The Inference CLI is the fastest way to connect your app to Catalyst. It scans your codebase, finds your LLM clients, and then routes them through the gateway, adds trace collection, or does both, guided by an AI coding agent (Claude Code, OpenCode, or Codex).

Want your AI coding assistant to query Catalyst resources directly? Configure the MCP server.

Install with AI works with OpenAI, Anthropic, Gemini, Vertex AI, Groq, Cerebras, OpenRouter, LangChain, and more.

What it sets up

When you run inf instrument, you’ll be asked which Catalyst product to set up:
Mode | What it does
gateway | Routes every LLM call through the Catalyst proxy so cost, latency, and usage telemetry land in Observability.
tracing | Adds the Catalyst tracing SDK and instruments your agents, tools, and provider calls with span trees.
both | Sets up gateway routing first, then trace collection, in a single agent session.
Pass --mode gateway, --mode tracing, or --mode both to skip the prompt (useful in CI or scripted runs).
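For scripted runs, it can help to validate the mode before the CLI is called, so a typo fails fast instead of falling back to the interactive prompt. The following is a minimal sketch, not part of the CLI: `run_instrument` and the `INF_BIN` override are hypothetical names introduced here, and it assumes that `--mode` can be combined with the `--dry-run` flag mentioned later on this page.

```shell
# run_instrument is a hypothetical wrapper, not something the CLI ships.
# It accepts only the three documented modes, then forwards any extra
# arguments (for example --dry-run) to `inf instrument`.
run_instrument() {
  local mode="$1"
  shift
  case "$mode" in
    gateway|tracing|both) ;;
    *)
      echo "unknown mode: $mode (expected gateway, tracing, or both)" >&2
      return 2
      ;;
  esac
  # INF_BIN is an assumed override so the wrapper can be exercised
  # without the real binary; it defaults to `inf`.
  "${INF_BIN:-inf}" instrument --mode "$mode" "$@"
}
```

A pipeline step could then call `run_instrument both --dry-run` to preview the changes, and drop `--dry-run` once the preview looks right.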
1. Install the CLI

Install the Inference CLI globally.
npm install -g @inference/cli
2. Sign in

Sign in with your Inference account. Your browser will open to authenticate.
inf auth login
Running in CI or another headless environment? Use inf auth set-key instead of browser login.
3. Run instrumentation in your project

Navigate to your project root and run instrumentation.
cd /path/to/your/project && inf instrument
The command guides you through the following workflow:
  • Pick what to instrument: gateway, tracing, or both.
  • Select a coding agent to use: Claude Code, OpenCode, or Codex.
  • Scan your codebase for LLM clients such as OpenAI, Anthropic, and LangChain.
  • For gateway: redirect base URLs to the proxy, add routing and task ID headers.
  • For tracing: install the Catalyst tracing SDK, initialize it at the app entrypoint, and add spans around agents, tools, and provider calls.
  • Review the generated changes before applying them.
Run inf instrument --dry-run to preview changes without modifying any files.
4. Run your app

Run your application as you normally would so that it produces inference requests. With gateway routing in place, those requests now pass through the gateway and appear in the dashboard.
5. Verify it worked

Open the dashboard to see request details, traces, and analytics. You can also verify from the CLI:
inf inference list
Add INFERENCE_API_KEY to your .env file so the instrumentation works across environments. Find your key in the dashboard under API Keys.
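Before deploying to a new environment, a quick check that the key is actually present can save a debugging round trip. The sketch below assumes the common dotenv layout of one `NAME=value` pair per line (for example `INFERENCE_API_KEY=<your key>`); `has_inference_key` is a hypothetical helper, not a CLI command.

```shell
# has_inference_key is a hypothetical helper. It succeeds when the given env
# file (default: .env) defines a non-empty INFERENCE_API_KEY, and it never
# prints the key itself.
has_inference_key() {
  local env_file="${1:-.env}"
  [ -f "$env_file" ] &&
    grep -Eq '^(export[[:space:]]+)?INFERENCE_API_KEY=.+' "$env_file"
}
```

For instance, `has_inference_key .env || echo "INFERENCE_API_KEY is missing" >&2` could run as a pre-deploy step.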

Supported AI coding agents

Agent | Binary
Claude Code | claude
OpenCode | opencode
Codex | codex

Supported providers

  • Built-in: OpenAI, Anthropic
  • OpenAI-compatible via x-inference-provider-url: Google Gemini, Vertex AI, Together AI, Groq, Fireworks AI, Mistral AI, Cerebras, Perplexity, DeepSeek, OpenRouter, Azure OpenAI, and any OpenAI-compatible endpoint.
  • Native provider APIs: Vertex AI native Gemini and Anthropic-on-Vertex are supported through the manual gateway headers documented in the Vertex AI guide.
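To make the OpenAI-compatible case concrete, here is a rough sketch of building the `x-inference-provider-url` header by hand. It rests on an assumption this page does not confirm: that the header value is the upstream provider's OpenAI-compatible base URL. `provider_url_header` is a hypothetical helper, and the Groq URL in the usage note is only an illustration; check the gateway documentation for the exact value your provider needs.

```shell
# provider_url_header is a hypothetical helper. It ASSUMES the header value is
# the provider's OpenAI-compatible base URL and rejects anything that is not
# an https:// URL.
provider_url_header() {
  local base_url="$1"
  case "$base_url" in
    https://*)
      printf 'x-inference-provider-url: %s\n' "$base_url"
      ;;
    *)
      echo "expected an https:// base URL, got: $base_url" >&2
      return 1
      ;;
  esac
}
```

Calling `provider_url_header https://api.groq.com/openai/v1` prints a header line that can be passed to an HTTP client, for example as the argument to `curl -H`, alongside whatever gateway URL and credentials your setup uses.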