Connecting your app is a base URL swap. You keep your provider, your SDK, and your application logic. Catalyst sits in the middle, captures everything, and aggregates metrics across all your LLM providers in one place.

The quick change

Point your SDK at Catalyst. Your project API key authenticates the request; your provider's key goes in the x-inference-provider-api-key header (with x-inference-provider naming the provider) so the gateway can forward it upstream. The example below uses OpenAI; for other providers, see the Integrations guide.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
  defaultHeaders: {
    "x-inference-provider-api-key": process.env.OPENAI_API_KEY,
    "x-inference-provider": "openai",
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "Hello, world!" }],
});

That’s it. Every request now flows through Catalyst and gets captured.
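The routing in the snippet above comes down to two headers. A small helper, hypothetical and not part of any SDK, makes the pattern explicit if you switch between providers:

```typescript
// Hypothetical helper: builds the two gateway headers used in the example above.
// The header names come from the snippet; the helper itself is illustrative.
function gatewayHeaders(
  provider: string,
  providerApiKey: string
): Record<string, string> {
  return {
    "x-inference-provider": provider, // which upstream the gateway forwards to
    "x-inference-provider-api-key": providerApiKey, // your key for that upstream
  };
}

// Pass the result as defaultHeaders when constructing the client.
const headers = gatewayHeaders("openai", process.env.OPENAI_API_KEY ?? "");
```

Centralizing the headers this way keeps the gateway wiring in one place if you later point the same app at a second provider.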

Prefer AI-assisted setup

If you'd rather not edit code by hand, use Install with AI. It runs inf instrument, scans your codebase, updates your LLM clients to point at the gateway, and adds task IDs automatically.

What gets captured

Once traffic is flowing, every request records:
  • Full request and response payloads
  • Cost (per call and aggregate)
  • Latency (end-to-end and time to first token)
  • Token counts (input and output)
  • Cache hit rates
  • Error rates and status codes
  • Model and provider information
  • Function/tool call details
  • Whether the request includes images
  • Any task tags you set
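Per-call cost follows directly from the token counts above and per-token prices. Here is a hedged sketch of that aggregation; the price table uses placeholder numbers, not real rates, and the record shape is invented for illustration:

```typescript
// Placeholder prices in USD per 1M tokens -- illustrative only, not real rates.
const PRICES: Record<string, { input: number; output: number }> = {
  "gpt-4.1": { input: 2.0, output: 8.0 },
};

// Invented shape for a captured call; Catalyst's actual record has more fields.
interface Call {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Cost of one call: token counts scaled by the per-million-token price.
function callCost(c: Call): number {
  const p = PRICES[c.model];
  return (c.inputTokens * p.input + c.outputTokens * p.output) / 1_000_000;
}

// Aggregate cost across a batch of captured calls.
function totalCost(calls: Call[]): number {
  return calls.reduce((sum, c) => sum + callCost(c), 0);
}
```

The same fold works for any of the per-call metrics in the list: swap the cost function for latency or token counts to get the corresponding aggregate.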
Catalyst works with any OpenAI-compatible provider, including Anthropic, Groq, Cerebras, and OpenRouter. See the Integrations guide for provider-specific setup.

What happens next

Your data shows up immediately in two places:

Metrics Explorer

Dashboards for cost, latency, errors, and usage aggregated across all your providers.

Inference Viewer

Browse and filter individual LLM requests and responses.

Already connected? Set up tasks to group calls by objective. This is what powers per-feature metrics, evals, and training.