This page is the gateway-focused quickstart. Point your SDK at https://api.inference.net/v1, add a couple of headers, and Catalyst captures every request with cost, latency, and full request/response payloads. If you’d rather see the higher-level Get Started flow, start with Record your first LLM call.
The example below uses OpenAI. For other providers (Anthropic, Vertex AI, Gemini, OpenRouter, Cerebras, Groq, LangChain, ElevenLabs), see the Gateway overview.
Choose a setup path
Installing with AI is the quickest. Use the manual flow if you want to wire it up yourself.
- Install with AI
- Install manually
Use the Inference CLI to launch a coding agent such as Claude Code, OpenCode, or Codex. The agent scans your codebase, updates your LLM clients, and adds the routing headers.
Install the CLI and authenticate
Install the Inference CLI globally and log in. Your browser will open to authenticate.
Run gateway instrumentation in your project
From your project root, run instrumentation in gateway mode.
The command guides you through the following workflow:
- Select a coding agent: Claude Code, OpenCode, or Codex.
- Scan your codebase for LLM clients such as OpenAI, Anthropic, and LangChain.
- Redirect base URLs to the Catalyst Gateway.
- Add routing headers so requests are authenticated, forwarded, and tagged.
- Add task IDs so each call site is grouped automatically in the dashboard.
- Review the generated changes before applying them.
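As a rough illustration of what a redirected client configuration could look like after these changes, here is a minimal Python sketch. Only the base URL comes from this page. The header names `x-example-provider-api-key` and `x-example-task-id`, the helper `gateway_client_kwargs`, and the choice of which key is passed as `api_key` are hypothetical placeholders, not Catalyst's actual names; the Gateway overview lists the real routing headers.

```python
GATEWAY_BASE_URL = "https://api.inference.net/v1"  # base URL given on this page

# Hypothetical header names, used only for illustration. The real routing
# header names are documented in the Gateway overview.
PROVIDER_KEY_HEADER = "x-example-provider-api-key"
TASK_ID_HEADER = "x-example-task-id"


def gateway_client_kwargs(gateway_api_key: str, provider_api_key: str, task_id: str) -> dict:
    """Return keyword arguments for an OpenAI-style client routed through the gateway.

    Assumption: the gateway key is sent as the client's api_key and the
    provider key travels in a routing header. The actual split may differ.
    """
    if not gateway_api_key or not provider_api_key:
        raise ValueError("both API keys are required")
    return {
        "base_url": GATEWAY_BASE_URL,      # redirect the client to the gateway
        "api_key": gateway_api_key,
        "default_headers": {
            PROVIDER_KEY_HEADER: provider_api_key,  # lets the gateway forward the call
            TASK_ID_HEADER: task_id,                # groups this call site in the dashboard
        },
    }


kwargs = gateway_client_kwargs("gateway-key", "provider-key", "summarize-ticket")
```

With the OpenAI Python SDK, a dictionary like this could be passed as `OpenAI(**kwargs)`, since `base_url`, `api_key`, and `default_headers` are constructor parameters of that client.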
Run your app
Run your application as you normally would. Requests now flow through the Gateway and appear in the dashboard.
View your results
Open the dashboard to see request details and analytics.
Want the full canonical guide for this workflow? See Install with AI.
What gets captured
Once traffic is flowing, Catalyst records:
- The full request and response payloads
- Cost per call and aggregate spend
- Latency, including time to first token (TTFT) and tokens per second
- Token counts (input and output)
- Error rates and status codes
- Model and provider
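To make the latency metrics in this list concrete, the sketch below computes time to first token and tokens per second from timestamps of a single streamed call. This page does not define how Catalyst calculates these values, so the formulas here follow one common convention and are an assumption; `CallTiming` and both functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CallTiming:
    started_at: float      # seconds, when the request was sent
    first_token_at: float  # seconds, when the first streamed token arrived
    finished_at: float     # seconds, when the last token arrived
    output_tokens: int     # number of tokens in the response


def time_to_first_token(t: CallTiming) -> float:
    """TTFT: delay between sending the request and receiving the first token."""
    return t.first_token_at - t.started_at


def tokens_per_second(t: CallTiming) -> float:
    """Output tokens divided by generation time (assumed: first token to finish)."""
    generation_time = t.finished_at - t.first_token_at
    if generation_time <= 0:
        return 0.0  # avoid dividing by zero for non-streamed or empty responses
    return t.output_tokens / generation_time


timing = CallTiming(started_at=0.0, first_token_at=0.5, finished_at=2.5, output_tokens=100)
ttft = time_to_first_token(timing)   # 0.5 seconds
tps = tokens_per_second(timing)      # 50.0 tokens per second
```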
Where to find your data
- Metrics Explorer for cost, latency, errors, and usage across all your LLM calls
- Inference Viewer to browse and filter individual requests and responses
Next steps
Gateway overview
Routing headers, supported providers, and the full set of OpenAI-compatible base URLs.
Connect more providers
Set up Anthropic, Vertex AI, Gemini, OpenRouter, Cerebras, Groq, and more.
Organize with tasks
Group LLM calls by feature or objective to track metrics separately.
Build a dataset
Turn captured traffic into datasets for evals and training.