ElevenLabs Agents lets you point an agent at a Custom LLM that speaks the OpenAI Chat Completions API. The Inference Catalyst gateway accepts that exact format and can forward to any supported provider, so you can run an ElevenLabs agent on Anthropic Claude (or any other model in the catalog) while keeping cost tracking, latency monitoring, and trace capture in one place.
ElevenLabs always sends OpenAI Chat Completions requests to the Custom LLM URL. To reach Claude, the gateway routes them to Anthropic's OpenAI-compatible endpoint, selected with the x-inference-provider-url header, so no schema translation is needed on either side.
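On the wire, nothing changes shape: the agent's turns arrive as ordinary Chat Completions requests. A rough sketch of what such a request looks like when pointed at the gateway (the key and model ID are placeholders, and the routing headers that actually send it to Anthropic are configured in the Setup steps below):

```python
import requests

# An OpenAI Chat Completions payload like the one ElevenLabs builds each turn.
payload = {
    "model": "claude-sonnet-4-6",
    "messages": [
        {"role": "system", "content": "You are a helpful voice assistant."},
        {"role": "user", "content": "What are your opening hours today?"},
    ],
}

# The gateway accepts the request at its OpenAI-compatible endpoint.
# Without the x-inference-* routing headers added below, it is not yet
# routed to Anthropic; this only illustrates the request format.
resp = requests.post(
    "https://api.inference.net/v1/chat/completions",
    headers={"Authorization": "Bearer <your Inference Catalyst project API key>"},
    json=payload,
    timeout=30,
)
print(resp.status_code)
```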
Setup

Get your API keys
You need two keys:
- Inference Catalyst project API key — from your dashboard under API Keys
- Anthropic API key — from your Anthropic console
Open the agent's LLM settings
In the ElevenLabs dashboard, open your agent and click into the LLM panel. Set the LLM to Custom LLM.
Configure the Server URL
Under Server URL, leave the endpoint dropdown on Chat Completions and set the base URL to:

https://api.inference.net/v1

ElevenLabs appends /chat/completions automatically, so the final request goes to https://api.inference.net/v1/chat/completions.
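As a quick sketch of that split (plain Python, nothing here is an ElevenLabs or Catalyst API):

```python
# What you paste into the Server URL field: the base URL only.
base_url = "https://api.inference.net/v1"

# ElevenLabs adds the Chat Completions path itself, so don't append it yourself.
final_endpoint = f"{base_url}/chat/completions"

assert final_endpoint == "https://api.inference.net/v1/chat/completions"
```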
Set the Model ID

In Model ID, enter the upstream model you want the gateway to route to:

claude-sonnet-4-6

You can swap this for any other supported model (e.g. gpt-4.1, claude-opus-4-6) without changing anything else.
Add your gateway API key

Under API Key, choose Secret → Add new secret and paste your Inference Catalyst project API key. ElevenLabs sends this as Authorization: Bearer <key>, which is how the gateway authenticates the request.
Add the routing headers

Under Request headers, click Add header and add the following entries. For the Anthropic key, click the {{ }} icon and reference an environment variable (e.g. {{ ANTHROPIC_API_KEY }}) rather than pasting the secret inline.

| Header | Value |
|---|---|
| x-inference-provider-url | https://api.anthropic.com |
| x-inference-provider-api-key | your Anthropic API key |
| x-inference-environment | production |

x-inference-provider-url tells the gateway to forward the chat-completions request to Anthropic's OpenAI-compatible endpoint, and x-inference-provider-api-key carries the Anthropic key it should use upstream.

The provider URL must be the bare host (https://api.anthropic.com), not https://api.anthropic.com/v1. The gateway appends /v1/chat/completions itself; including /v1 yields a 404.
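To sanity-check the routing outside of ElevenLabs, you can send the same headers from any OpenAI-compatible client. A minimal sketch with the OpenAI Python SDK, assuming the claude-sonnet-4-6 model ID from earlier and placeholder keys:

```python
from openai import OpenAI

# Point the standard OpenAI client at the Catalyst gateway. The default
# headers mirror exactly what the ElevenLabs agent is configured to send.
client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key="<your Inference Catalyst project API key>",  # sent as Authorization: Bearer <key>
    default_headers={
        "x-inference-provider-url": "https://api.anthropic.com",  # bare host, no /v1
        "x-inference-provider-api-key": "<your Anthropic API key>",
        "x-inference-environment": "production",
    },
)

# An OpenAI-format request; the gateway forwards it to Anthropic's
# OpenAI-compatible endpoint and returns the response unchanged.
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```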
Tag requests with a Task ID

Add one more header so this agent's calls are grouped under a dedicated task in your Catalyst dashboard. Tasks let you split out cost, latency, and traces per use case (e.g. one agent per task, or one task per tool flow).

| Header | Value |
|---|---|
| x-inference-task-id | voice-agent |

Pick any stable string; it just needs to stay consistent across requests you want grouped together. If the task ID doesn't already exist in your project, the gateway creates it on first use. To split traffic further (e.g. staging vs. production agents), use distinct IDs like voice-agent-prod and voice-agent-staging and filter by them in the dashboard.
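In code terms, this is just one more entry alongside the routing headers from the sketch above (the value is whatever stable ID you pick):

```python
default_headers = {
    "x-inference-provider-url": "https://api.anthropic.com",
    "x-inference-provider-api-key": "<your Anthropic API key>",
    "x-inference-environment": "production",
    # Groups this agent's traffic under one task in the Catalyst dashboard;
    # the task is created on first use if it doesn't exist yet.
    "x-inference-task-id": "voice-agent",
}
```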
Capture ElevenLabs system variables as metadata

ElevenLabs exposes per-conversation system variables (agent ID, conversation ID, caller ID, etc.) that can be injected into request headers. The Catalyst gateway treats any header prefixed with x-inference-metadata- as searchable metadata: the prefix is stripped and the rest becomes a key you can filter on in the dashboard. Combine the two and every LLM call is tagged with the ElevenLabs context it came from.

For each variable you want to capture, click Add header, switch the Type tab to Variable, and configure:

| Header name | Dynamic variable |
|---|---|
| x-inference-metadata-agent-id | system__agent_id |
| x-inference-metadata-conversation-id | system__conversation_id |
| x-inference-metadata-caller-id | system__caller_id |
| x-inference-metadata-called-number | system__called_number |
| x-inference-metadata-call-sid | system__call_sid |
| x-inference-metadata-call-duration-secs | system__call_duration_secs |

ElevenLabs substitutes the variable value into the header on every request, and the gateway stores the stripped key (agent-id, conversation-id, …) against the inference. In the Catalyst dashboard you can then filter by any of these (e.g. pull every LLM call from a single conversation by conversation-id, or cost per caller by caller-id) and use the same keys to slice datasets you build off the traffic.
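For illustration only, once ElevenLabs has substituted the variables, the extra headers on a single call might look like this (the values are made-up placeholders; the comments show the stripped keys the gateway stores):

```python
# Hypothetical metadata headers as they'd arrive at the gateway for one call.
metadata_headers = {
    "x-inference-metadata-agent-id": "agent_abc123",           # stored as agent-id
    "x-inference-metadata-conversation-id": "conv_xyz789",     # stored as conversation-id
    "x-inference-metadata-caller-id": "+15551230000",          # stored as caller-id
    "x-inference-metadata-call-duration-secs": "42",           # stored as call-duration-secs
}
```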
Test the connection

Click Test Connection at the bottom of the LLM panel. ElevenLabs will issue a probe request through the gateway to Anthropic. A green status confirms the agent can reach Claude Sonnet 4.6 via Catalyst. Save the agent and start a session; the call will appear in your dashboard within a few seconds.