Gateway

Catalyst Gateway is a transparent proxy that sits between your app and your LLM provider. Point your SDK at https://api.inference.net/v1, add a couple of headers, and every request is captured with cost, latency, and full request/response payloads. You keep your existing provider API keys. Gateway adds roughly 10ms of overhead and forwards requests as-is. Use Gateway for inference observability and training. Once traffic is flowing, you can:

Watch cost, latency, error rates, and token usage in Metrics Explorer
Browse individual requests in the Inference Viewer
Build datasets from traffic and run evals
Fine-tune a task-specific model on your captured data and deploy it on a dedicated GPU

Gateway and Catalyst Tracing are independent. Gateway captures one record per LLM request through a proxy. Tracing captures the full agent hierarchy from inside your code. Use them on their own or together. For traces, see the Tracing overview.

Getting Started

The fastest path is the Inference CLI:

inf instrument --mode gateway

gateway mode points your existing LLM clients at the Catalyst Gateway. both mode does that plus installs the tracing SDK in one pass. The command launches your choice of coding agent (Claude Code, OpenCode, or Codex) to make the edits. See Install with AI for the full flow. Prefer to wire it up by hand? Start with the Gateway quickstart or pick your provider below.

Supported Providers

OpenAI

Chat Completions and Responses API, including tool calls and structured outputs.

Anthropic

Messages API with tool use, prompt caching, and streaming.

Vertex AI

Google Cloud Vertex AI for Gemini and Anthropic models on GCP.

Google Gemini

Native Gemini API and the OpenAI-compatible Gemini endpoint.

OpenRouter

Route across many models through OpenRouter’s OpenAI-compatible API.

Cerebras

Cerebras inference with full request capture.

Groq

Groq’s low-latency inference endpoint.

LangChain

Use Gateway from LangChain by setting the base URL on the chat model.

ElevenLabs

Route ElevenLabs Agents’ LLM calls through Catalyst.

Don’t see your provider? Any OpenAI-compatible endpoint works through the x-inference-provider-url header. See supported OpenAI-compatible providers below.

Routing Headers

Header	Required	Description
`Authorization`	Yes	`Bearer <your-project-api-key>` authenticates the request to the gateway and links it to your project. For OpenAI-compatible SDKs, set this as the SDK’s `apiKey`.
`x-inference-provider-api-key`	Yes	Your provider API key or token, such as OpenAI, Groq, Gemini, or a Google Cloud Vertex credential. The gateway forwards it downstream. For Anthropic’s native SDK, use `x-api-key` instead.
`x-inference-provider`	No	Forces routing to a specific provider, such as `openai`, `anthropic`, `gemini`, `vertex-ai`, or `cerebras`. Usually inferred from the SDK, path, or `x-inference-provider-url`; set it only to override that inference.
`x-inference-environment`	No	Tags requests with an environment, such as `production` or `staging`.
`x-inference-task-id`	No	Groups requests under a logical task for filtering and analytics.
`x-inference-provider-url`	No	Routes to any OpenAI-compatible provider by specifying its base URL. For Vertex native APIs, set this to the global or regional `aiplatform.googleapis.com` base URL.

Supported OpenAI-compatible Provider URLs

Any OpenAI-compatible provider can be used via the x-inference-provider-url header, even when it does not have a dedicated guide in the catalog yet.

Provider	Base URL
OpenAI	`https://api.openai.com/v1`
OpenRouter	`https://openrouter.ai/api`
Anthropic	`https://api.anthropic.com/v1`
Google Gemini	`https://generativelanguage.googleapis.com` for native Gemini paths; `https://generativelanguage.googleapis.com/v1beta/openai` for OpenAI-compatible calls
Vertex AI	`https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/endpoints/openapi`
Azure OpenAI	`https://{resource}.openai.azure.com/openai/deployments/{deployment}`
Groq	`https://api.groq.com/openai/v1`
Together AI	`https://api.together.xyz/v1`
Fireworks AI	`https://api.fireworks.ai/inference/v1`
Perplexity	`https://api.perplexity.ai`
Mistral	`https://api.mistral.ai/v1`
DeepSeek	`https://api.deepseek.com/v1`
Cerebras	`https://api.cerebras.ai/v1`
Inference.net	`https://api.inference.net/v1`

For regional Vertex AI endpoints, use https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/endpoints/openapi instead of the global host.

What Gets Captured

Once traffic is flowing through Gateway, Catalyst records:

The full request and response payloads
Cost per call and aggregate spend
Latency, including time to first token (TTFT) and tokens per second
Token counts (input and output)
Error rates and status codes
Model and provider

These metrics are captured automatically without any changes to your request logic. TTFT and tokens-per-second are specifically the metrics that are invisible to application code; only a proxy can measure them.

Next Steps

Quickstart

Install with AI or wire it up by hand and capture your first LLM call.

Record your first LLM call

The higher-level Get Started flow with the same setup paths.

Metrics Explorer

Watch cost, latency, errors, and token usage across all your calls.

Tasks

Group LLM calls by feature or objective with x-inference-task-id.

Build a dataset

Turn captured traffic into datasets for evals and training.

Run an eval

Compare models side by side on your captured traffic.

Integrations

Traces

Gateway

Getting Started

Supported Providers

OpenAI

Anthropic

Vertex AI

Google Gemini

OpenRouter

Cerebras

Groq

LangChain

ElevenLabs

Routing Headers

Supported OpenAI-compatible Provider URLs

What Gets Captured

Next Steps

Quickstart

Record your first LLM call

Metrics Explorer

Tasks

Build a dataset

Run an eval

Integrations

Gateway

Traces

Documentation Index

​Getting Started

​Supported Providers

OpenAI

Anthropic

Vertex AI

Google Gemini

OpenRouter

Cerebras

Groq

LangChain

ElevenLabs

​Routing Headers

​Supported OpenAI-compatible Provider URLs

​What Gets Captured

​Next Steps

Quickstart

Record your first LLM call

Metrics Explorer

Tasks

Build a dataset

Run an eval

Getting Started

Supported Providers

Routing Headers

Supported OpenAI-compatible Provider URLs

What Gets Captured

Next Steps