> ## Documentation Index
> Fetch the complete documentation index at: https://docs.inference.net/llms.txt
> Use this file to discover all available pages before exploring further.

# Gateway

> Route LLM traffic through the Catalyst Gateway with a one-line base URL change. Capture requests, watch usage, build datasets, and run evals on the same traffic.

Catalyst Gateway is a transparent proxy that sits between your app and your LLM provider. Point your SDK at `https://api.inference.net/v1`, add a couple of headers, and every request is captured with cost, latency, and full request/response payloads. You keep your existing provider API keys. Gateway adds roughly 10ms of overhead and forwards requests as-is.

Use Gateway for inference observability and training. Once traffic is flowing, you can:

* Watch cost, latency, error rates, and token usage in [Metrics Explorer](/platform/gateway/metrics-explorer)
* Browse individual requests in the [Inference Viewer](/platform/gateway/inference-viewer)
* Build [datasets from traffic](/platform/datasets/build-from-traffic) and run [evals](/platform/eval/overview)
* [Fine-tune a task-specific model](/platform/train/overview) on your captured data and [deploy](/platform/deploy/overview) it on a dedicated GPU

<Info>
  Gateway and Catalyst Tracing are independent. Gateway captures one record per LLM request through a proxy. Tracing captures the full agent hierarchy from inside your code. Use them on their own or together. For traces, see the [Tracing overview](/integrations/traces/overview).
</Info>

## Getting Started

The fastest path is the [Inference CLI](/cli/overview):

<Metadata text="integrations/gateway/overview-inf-instrument" />

```bash theme={"system"}
inf instrument --mode gateway
```

`gateway` mode points your existing LLM clients at the Catalyst Gateway. `both` mode does that plus installs the tracing SDK in one pass. The command launches your choice of coding agent (Claude Code, OpenCode, or Codex) to make the edits. See [Install with AI](/integrations/install-with-ai) for the full flow.

Prefer to wire it up by hand? Start with the [Gateway quickstart](/integrations/gateway/quickstart) or pick your provider below.

## Officially Supported Providers

These providers have dedicated guides with copy-paste setup. Any other OpenAI-compatible provider works too via the `x-inference-provider-url` header. See [supported OpenAI-compatible providers](#supported-openai-compatible-provider-urls) below.

* [OpenAI](/integrations/model-providers/openai): Chat Completions and Responses API, including tool calls and structured outputs.
* [Anthropic](/integrations/model-providers/anthropic): Messages API with tool use, prompt caching, and streaming.
* [Vertex AI](/integrations/model-providers/vertex-ai): Google Cloud Vertex AI for Gemini and Anthropic models on GCP.
* [Google Gemini](/integrations/model-providers/google-gemini): Native Gemini API and the OpenAI-compatible Gemini endpoint.
* [OpenRouter](/integrations/model-providers/openrouter): Route across many models through OpenRouter's OpenAI-compatible API.
* [Cerebras](/integrations/model-providers/cerebras): Cerebras inference with full request capture.
* [Groq](/integrations/model-providers/groq): Groq's low-latency inference endpoint.
* [LangChain](/integrations/frameworks/langchain): Use Gateway from LangChain by setting the base URL on the chat model.
* [ElevenLabs](/integrations/agent-platforms/elevenlabs): Route ElevenLabs Agents' LLM calls through Catalyst.

## Routing Headers

| Header                         | Required | Description                                                                                                                                                                                                             |
| ------------------------------ | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Authorization`                | Yes      | `Bearer <your-project-api-key>` authenticates the request to the gateway and links it to your project. For OpenAI-compatible SDKs, set this as the SDK's `apiKey`.                                                      |
| `x-inference-provider-api-key` | Yes      | Your provider API key or token, such as OpenAI, Groq, Gemini, or a Google Cloud Vertex credential. The gateway forwards it downstream. For Anthropic's native SDK, use `x-api-key` instead.                             |
| `x-inference-provider`         | No       | Forces routing to a specific provider, such as `openai`, `anthropic`, `gemini`, `vertex-ai`, or `cerebras`. Usually inferred from the SDK, path, or `x-inference-provider-url`; set it only to override that inference. |
| `x-inference-environment`      | No       | Tags requests with an environment, such as `production` or `staging`.                                                                                                                                                   |
| `x-inference-task-id`          | No       | Groups requests under a logical task for filtering and analytics.                                                                                                                                                       |
| `x-inference-provider-url`     | No       | Routes to any OpenAI-compatible provider by specifying its base URL. For Vertex native APIs, set this to the global or regional `aiplatform.googleapis.com` base URL.                                                   |

## Supported OpenAI-compatible Provider URLs

Any OpenAI-compatible provider can be used via the `x-inference-provider-url` header, even when it does not have a dedicated guide in the catalog yet.

| Provider      | Base URL                                                                                                                                                   |
| ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| OpenAI        | `https://api.openai.com/v1`                                                                                                                                |
| OpenRouter    | `https://openrouter.ai/api`                                                                                                                                |
| Anthropic     | `https://api.anthropic.com/v1`                                                                                                                             |
| Google Gemini | `https://generativelanguage.googleapis.com` for native Gemini paths; `https://generativelanguage.googleapis.com/v1beta/openai` for OpenAI-compatible calls |
| Vertex AI     | `https://aiplatform.googleapis.com/v1/projects/{project}/locations/global/endpoints/openapi`                                                               |
| Azure OpenAI  | `https://{resource}.openai.azure.com/openai/deployments/{deployment}`                                                                                      |
| Groq          | `https://api.groq.com/openai/v1`                                                                                                                           |
| Together AI   | `https://api.together.xyz/v1`                                                                                                                              |
| Fireworks AI  | `https://api.fireworks.ai/inference/v1`                                                                                                                    |
| Perplexity    | `https://api.perplexity.ai`                                                                                                                                |
| Mistral       | `https://api.mistral.ai/v1`                                                                                                                                |
| DeepSeek      | `https://api.deepseek.com/v1`                                                                                                                              |
| Cerebras      | `https://api.cerebras.ai/v1`                                                                                                                               |
| Inference.net | `https://api.inference.net/v1`                                                                                                                             |

For regional Vertex AI endpoints, use `https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/endpoints/openapi` instead of the global host.

## What Gets Captured

Once traffic is flowing through Gateway, Catalyst records:

* The full request and response payloads
* Cost per call and aggregate spend
* Latency, including time to first token (TTFT) and tokens per second
* Token counts (input and output)
* Error rates and status codes
* Model and provider

These metrics are captured automatically without any changes to your request logic. TTFT and tokens-per-second are specifically the metrics that are invisible to application code; only a proxy can measure them.

## Next Steps

<CardGroup cols={2}>
  <Card title="Quickstart" icon="rocket" href="/integrations/gateway/quickstart">
    Install with AI or wire it up by hand and capture your first LLM call.
  </Card>

  <Card title="Record your first LLM call" icon="bolt" href="/get-started/record-first-call">
    The higher-level Get Started flow with the same setup paths.
  </Card>

  <Card title="Metrics Explorer" icon="chart-line" href="/platform/gateway/metrics-explorer">
    Watch cost, latency, errors, and token usage across all your calls.
  </Card>

  <Card title="Tasks" icon="bullseye" href="/platform/gateway/tasks">
    Group LLM calls by feature or objective with `x-inference-task-id`.
  </Card>

  <Card title="Build a dataset" icon="database" href="/platform/datasets/build-from-traffic">
    Turn captured traffic into datasets for evals and training.
  </Card>

  <Card title="Run an eval" icon="flask" href="/platform/eval/overview">
    Compare models side by side on your captured traffic.
  </Card>
</CardGroup>
