# ElevenLabs

> Route an ElevenLabs Agent's LLM calls through Inference Catalyst to any supported model — including Anthropic Claude — for full observability.

ElevenLabs Agents lets you point an agent at a Custom LLM that speaks the OpenAI Chat Completions API. The Inference Catalyst gateway accepts that exact format and can forward to any supported provider, so you can run an ElevenLabs agent on Anthropic Claude (or any other model in the catalog) while keeping cost tracking, latency monitoring, and trace capture in one place.

<Frame>
  <iframe style={{ width: "100%", aspectRatio: "16 / 9", border: 0, display: "block" }} src="https://www.loom.com/embed/14fd2b514de846aba4723350cc7bb345" title="ElevenLabs + Inference Catalyst walkthrough" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowFullScreen />
</Frame>

This guide configures an ElevenLabs Agent to call **Claude Sonnet 4.6** through the gateway.

<Info>ElevenLabs always sends OpenAI Chat Completions to the Custom LLM URL. To reach Claude, we route through Anthropic's [OpenAI-compatible endpoint](https://platform.claude.com/docs/en/api/openai-sdk) using the `x-inference-provider-url` header — no schema translation needed on either side.</Info>

## Setup

<Steps>
  <Step title="Get your API keys">
    You need two keys:

    * **Inference Catalyst project API key** — from your [dashboard](https://inference.net/dashboard) under **API Keys**
    * **Anthropic API key** — from your [Anthropic console](https://console.anthropic.com/settings/keys)
  </Step>

  <Step title="Open the agent's LLM settings">
    In the ElevenLabs dashboard, open your agent and click into the **LLM** panel. Set the LLM to **Custom LLM**.
  </Step>

  <Step title="Configure the Server URL">
    Under **Server URL**, leave the endpoint dropdown on **Chat Completions** and set the base URL to:

    ```
    https://api.inference.net/v1
    ```

    ElevenLabs appends `/chat/completions` automatically, so the final request goes to `https://api.inference.net/v1/chat/completions`.
  </Step>

  <Step title="Set the Model ID">
    In **Model ID**, enter the upstream model you want the gateway to route to:

    ```
    claude-sonnet-4-6
    ```

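    You can swap this for another supported model. Another Claude model (e.g. `claude-opus-4-6`) needs no other changes. A model from a different provider (e.g. `gpt-4.1`) also requires changing the provider routing headers set in a later step, since the ones in this guide point at Anthropic.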
  </Step>

  <Step title="Add your gateway API key">
    Under **API Key**, choose **Secret** → **Add new secret** and paste your **Inference Catalyst project API key**. ElevenLabs sends this as `Authorization: Bearer <key>`, which is how the gateway authenticates the request.

    <Warning>Do not paste your Anthropic key here. The Anthropic key is forwarded separately as a request header in the next step.</Warning>
  </Step>

  <Step title="Add the routing headers">
    Under **Request headers**, click **Add header** and add the following entries. `x-inference-provider-url` tells the gateway to forward the chat-completions request to Anthropic's OpenAI-compatible endpoint, and `x-inference-provider-api-key` carries the Anthropic key it should use upstream.

    | Header                         | Value                       |
    | ------------------------------ | --------------------------- |
    | `x-inference-provider-url`     | `https://api.anthropic.com` |
    | `x-inference-provider-api-key` | *your Anthropic API key*    |
    | `x-inference-environment`      | `production`                |

    For the Anthropic key, click the `{{ }}` icon and reference an environment variable (e.g. `{{ ANTHROPIC_API_KEY }}`) rather than pasting the secret inline.

    <Note>The provider URL must be the **bare host** (`https://api.anthropic.com`) — not `https://api.anthropic.com/v1`. The gateway appends `/v1/chat/completions` itself; including `/v1` yields a 404.</Note>
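
    The path-joining rule in the Note can be sketched in a few lines of shell. `upstream_url` is a hypothetical helper written for illustration, not part of the gateway; it only mirrors the behavior described above, where `/v1/chat/completions` is appended to whatever provider URL you supply.

    ```bash
    # Hypothetical illustration: append the chat-completions path to the
    # provider URL exactly as given, as the Note describes the gateway doing.
    upstream_url() {
      printf '%s/v1/chat/completions\n' "$1"
    }

    upstream_url "https://api.anthropic.com"      # bare host: the intended path
    upstream_url "https://api.anthropic.com/v1"   # extra /v1: the path is doubled
    ```

    With the bare host the result is `https://api.anthropic.com/v1/chat/completions`. With `/v1` included it becomes `https://api.anthropic.com/v1/v1/chat/completions`, which would explain the 404.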

    <Warning>Do **not** set `x-inference-provider: anthropic` here. That value targets Anthropic's native `/v1/messages` shape, but ElevenLabs sends chat completions — the override URL above is what makes this work.</Warning>
  </Step>

  <Step title="Tag requests with a Task ID">
    Add one more header so this agent's calls are grouped under a dedicated task in your Catalyst dashboard. Tasks let you split out cost, latency, and traces per use case (e.g. one agent per task, or one task per tool flow).

    | Header                | Value         |
    | --------------------- | ------------- |
    | `x-inference-task-id` | `voice-agent` |

    Pick any stable string — it just needs to stay consistent across requests you want grouped together. If the task ID doesn't already exist in your project, the gateway creates it on first use. To split traffic further (e.g. staging vs. production agents), use distinct IDs like `voice-agent-prod` and `voice-agent-staging` and filter by them in the dashboard.
  </Step>

  <Step title="Capture ElevenLabs system variables as metadata">
    ElevenLabs exposes per-conversation system variables (agent ID, conversation ID, caller ID, etc.) that can be injected into request headers. The Catalyst gateway treats any header prefixed with `x-inference-metadata-` as searchable metadata: the prefix is stripped and the rest becomes a key you can filter on in the dashboard. Combine the two and every LLM call is tagged with the ElevenLabs context it came from.

    For each variable you want to capture, click **Add header**, switch the **Type** tab to **Variable**, and configure:

    | Header name                               | Dynamic variable             |
    | ----------------------------------------- | ---------------------------- |
    | `x-inference-metadata-agent-id`           | `system__agent_id`           |
    | `x-inference-metadata-conversation-id`    | `system__conversation_id`    |
    | `x-inference-metadata-caller-id`          | `system__caller_id`          |
    | `x-inference-metadata-called-number`      | `system__called_number`      |
    | `x-inference-metadata-call-sid`           | `system__call_sid`           |
    | `x-inference-metadata-call-duration-secs` | `system__call_duration_secs` |

    ElevenLabs substitutes the variable value into the header on every request, and the gateway stores the stripped key (`agent-id`, `conversation-id`, …) against the inference. In the Catalyst dashboard you can then filter by any of these keys: for example, pull every LLM call from a single conversation by `conversation-id`, or break down cost per caller by `caller-id`. You can use the same keys to slice datasets you build off the traffic.

    <Tip>The metadata key is just `<header name minus the prefix>`. Keep it short and lowercase-with-dashes — that's how it'll show up in the dashboard's filter UI. Add only the variables you actually plan to filter or group by; extra metadata is free to send but adds noise.</Tip>
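
    As a rough sketch of the naming rule above, the snippet below derives the metadata key from a header name by dropping the `x-inference-metadata-` prefix. `metadata_key` is a hypothetical helper for checking your header names locally; the gateway's actual implementation is not shown in this guide and may differ in details such as case handling.

    ```bash
    # Hypothetical helper: print the metadata key for a header name,
    # or return non-zero if the header does not carry the metadata prefix.
    metadata_key() {
      prefix="x-inference-metadata-"
      case "$1" in
        "$prefix"*) printf '%s\n' "${1#"$prefix"}" ;;
        *) return 1 ;;
      esac
    }

    metadata_key "x-inference-metadata-conversation-id"
    metadata_key "x-inference-metadata-caller-id"
    ```

    The two calls print `conversation-id` and `caller-id`, the keys you would filter on in the dashboard. A header without the prefix, such as `x-inference-task-id`, produces no key.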
  </Step>

  <Step title="Test the connection">
    Click **Test Connection** at the bottom of the LLM panel. ElevenLabs will issue a probe request through the gateway to Anthropic. A green status confirms the agent can reach Claude Sonnet 4.6 via Catalyst. Save the agent and start a session — the call will appear in your [dashboard](https://inference.net/dashboard) within a few seconds.
  </Step>
</Steps>

## Equivalent cURL

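To verify the gateway path outside ElevenLabs, send the equivalent request with cURL. It uses the same URL, credentials, and routing headers as the agent, without the per-conversation metadata headers: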

```bash
curl https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "x-inference-provider-url: https://api.anthropic.com" \
  -H "x-inference-provider-api-key: $ANTHROPIC_API_KEY" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: voice-agent" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
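
To also exercise metadata capture from the command line, one option is to generate the header set in a small function and add a metadata header by hand. This is a minimal sketch: `gateway_headers` and the `CONVERSATION_ID` variable are hypothetical names, and the fallback value `test-conversation` is a made-up placeholder standing in for what ElevenLabs would substitute at call time.

```bash
# Hypothetical helper: print the request headers one per line.
# Expects INFERENCE_API_KEY and ANTHROPIC_API_KEY to be set in the shell.
gateway_headers() {
  printf 'Authorization: Bearer %s\n' "$INFERENCE_API_KEY"
  printf 'x-inference-provider-url: https://api.anthropic.com\n'
  printf 'x-inference-provider-api-key: %s\n' "$ANTHROPIC_API_KEY"
  printf 'x-inference-environment: production\n'
  printf 'x-inference-task-id: voice-agent\n'
  # Stand-in for the value ElevenLabs injects from system__conversation_id.
  printf 'x-inference-metadata-conversation-id: %s\n' "${CONVERSATION_ID:-test-conversation}"
  printf 'Content-Type: application/json\n'
}
```

With both keys set, the output can be piped into the request, e.g. `gateway_headers | curl https://api.inference.net/v1/chat/completions -H @- -d '{"model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "Hello"}]}'`. Reading headers from standard input with `-H @-` assumes curl 7.55.0 or newer. If the request succeeds, the call should be filterable by `conversation-id` in the dashboard.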
