
ElevenLabs Agents lets you point an agent at a Custom LLM that speaks the OpenAI Chat Completions API. The Inference Catalyst gateway accepts that exact format and can forward to any supported provider, so you can run an ElevenLabs agent on Anthropic Claude (or any other model in the catalog) while keeping cost tracking, latency monitoring, and trace capture in one place.
This guide configures an ElevenLabs Agent to call Claude Sonnet 4.6 through the gateway.
ElevenLabs always sends OpenAI Chat Completions requests to the Custom LLM URL. To reach Claude, we route through Anthropic's OpenAI-compatible endpoint using the x-inference-provider-url header; no schema translation is needed on either side.
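Concretely, the override is simple string composition on the gateway side. The sketch below is our own illustration of the behavior this guide describes (bare host from x-inference-provider-url, plus the path the gateway appends); it is not gateway source code, and the variable names are ours:

```shell
# Illustrative sketch only: how the gateway derives the upstream URL
# from the x-inference-provider-url header. Not gateway source code.
PROVIDER_URL="https://api.anthropic.com"        # value of x-inference-provider-url
UPSTREAM="${PROVIDER_URL}/v1/chat/completions"  # path the gateway appends
echo "$UPSTREAM"   # prints https://api.anthropic.com/v1/chat/completions
```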

Setup

1. Get your API keys

You need two keys:

- An Inference Catalyst project API key, which ElevenLabs will send to the gateway as the Authorization bearer token.
- An Anthropic API key, which the gateway forwards upstream via a request header.
2. Open the agent's LLM settings

In the ElevenLabs dashboard, open your agent and click into the LLM panel. Set the LLM to Custom LLM.
3. Configure the Server URL

Under Server URL, leave the endpoint dropdown on Chat Completions and set the base URL to:
https://api.inference.net/v1
ElevenLabs appends /chat/completions automatically, so the final request goes to https://api.inference.net/v1/chat/completions.
4. Set the Model ID

In Model ID, enter the upstream model you want the gateway to route to:
claude-sonnet-4-6
You can swap this for any other supported model (e.g. gpt-4.1, claude-opus-4-6) without changing anything else.
5. Add your gateway API key

Under API Key, choose Secret → Add new secret and paste your Inference Catalyst project API key. ElevenLabs sends this as Authorization: Bearer <key>, which is how the gateway authenticates the request.
Do not paste your Anthropic key here. The Anthropic key is forwarded separately as a request header in the next step.
6. Add the routing headers

Under Request headers, click Add header and add the following entries. x-inference-provider-url tells the gateway to forward the chat-completions request to Anthropic’s OpenAI-compatible endpoint, and x-inference-provider-api-key carries the Anthropic key it should use upstream.
Header                        Value
x-inference-provider-url      https://api.anthropic.com
x-inference-provider-api-key  your Anthropic API key
x-inference-environment       production
For the Anthropic key, click the {{ }} icon and reference an environment variable (e.g. {{ ANTHROPIC_API_KEY }}) rather than pasting the secret inline.
The provider URL must be the bare host (https://api.anthropic.com) — not https://api.anthropic.com/v1. The gateway appends /v1/chat/completions itself; including /v1 yields a 404.
Do not set x-inference-provider: anthropic here. That value targets Anthropic’s native /v1/messages shape, but ElevenLabs sends chat completions — the override URL above is what makes this work.
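If you want to sanity-check the value before pasting it, a trivial local check (our own helper, not part of the gateway or ElevenLabs) catches the trailing /v1 mistake:

```shell
# Local sanity check (not gateway code): the provider URL must be a
# bare host, because the gateway appends /v1/chat/completions itself.
PROVIDER_URL="https://api.anthropic.com"
case "$PROVIDER_URL" in
  */v1 | */v1/ ) RESULT="bad: drop the /v1 suffix" ;;
  * )            RESULT="ok" ;;
esac
echo "$RESULT"   # prints ok
```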
7. Tag requests with a Task ID

Add one more header so this agent’s calls are grouped under a dedicated task in your Catalyst dashboard. Tasks let you split out cost, latency, and traces per use case (e.g. one agent per task, or one task per tool flow).
Header               Value
x-inference-task-id  voice-agent
Pick any stable string — it just needs to stay consistent across requests you want grouped together. If the task ID doesn’t already exist in your project, the gateway creates it on first use. To split traffic further (e.g. staging vs. production agents), use distinct IDs like voice-agent-prod and voice-agent-staging and filter by them in the dashboard.
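One way to keep those IDs consistent is to derive them from a single deployment variable. DEPLOY_ENV below is an assumed name used for illustration, not an ElevenLabs or Catalyst setting:

```shell
# Sketch: derive per-environment task IDs from one variable so staging
# and production traffic land in separate tasks in the dashboard.
# DEPLOY_ENV is an assumed variable name for this example.
DEPLOY_ENV="${DEPLOY_ENV:-staging}"
TASK_ID="voice-agent-${DEPLOY_ENV}"
echo "$TASK_ID"
```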
8. Capture ElevenLabs system variables as metadata

ElevenLabs exposes per-conversation system variables (agent ID, conversation ID, caller ID, etc.) that can be injected into request headers. The Catalyst gateway treats any header prefixed with x-inference-metadata- as searchable metadata: the prefix is stripped and the rest becomes a key you can filter on in the dashboard. Combine the two and every LLM call is tagged with the ElevenLabs context it came from.

For each variable you want to capture, click Add header, switch the Type tab to Variable, and configure:
Header name                              Dynamic variable
x-inference-metadata-agent-id            system__agent_id
x-inference-metadata-conversation-id     system__conversation_id
x-inference-metadata-caller-id           system__caller_id
x-inference-metadata-called-number       system__called_number
x-inference-metadata-call-sid            system__call_sid
x-inference-metadata-call-duration-secs  system__call_duration_secs
ElevenLabs substitutes the variable value into the header on every request, and the gateway stores the stripped key (agent-id, conversation-id, …) against the inference. In the Catalyst dashboard you can then filter by any of these — e.g. pull every LLM call from a single conversation by conversation-id, or cost-per-caller by caller-id — and use the same keys to slice datasets you build off the traffic.
The metadata key is just <header name minus the prefix>. Keep it short and lowercase-with-dashes — that’s how it’ll show up in the dashboard’s filter UI. Add only the variables you actually plan to filter or group by; extra metadata is free to send but adds noise.
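The prefix-stripping rule is easy to check yourself. In shell, the same transformation (our own one-liner illustrating the documented rule, not gateway code) looks like this:

```shell
# The dashboard filter key is the header name minus the
# x-inference-metadata- prefix (our illustration of the documented rule).
HEADER_NAME="x-inference-metadata-conversation-id"
FILTER_KEY="${HEADER_NAME#x-inference-metadata-}"
echo "$FILTER_KEY"   # prints conversation-id
```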
9. Test the connection

Click Test Connection at the bottom of the LLM panel. ElevenLabs will issue a probe request through the gateway to Anthropic. A green status confirms the agent can reach Claude Sonnet 4.6 via Catalyst. Save the agent and start a session — the call will appear in your dashboard within a few seconds.

Equivalent cURL

If you want to verify the gateway path outside ElevenLabs, this is the same request the agent issues:
curl https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "x-inference-provider-url: https://api.anthropic.com" \
  -H "x-inference-provider-api-key: $ANTHROPIC_API_KEY" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: voice-agent" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
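To exercise the metadata path from the command line as well, here is the same request with example x-inference-metadata- headers hardcoded in place of the ElevenLabs system variables. The agent_demo and conv_demo values are made up for illustration; in the agent they come from the dynamic variables configured above:

```shell
# Same request as above, plus example metadata headers. Requires real
# $INFERENCE_API_KEY and $ANTHROPIC_API_KEY values to run.
curl https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "x-inference-provider-url: https://api.anthropic.com" \
  -H "x-inference-provider-api-key: $ANTHROPIC_API_KEY" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: voice-agent" \
  -H "x-inference-metadata-agent-id: agent_demo" \
  -H "x-inference-metadata-conversation-id: conv_demo" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

The resulting inference should appear in the dashboard filterable by agent-id and conversation-id.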