The Inference MCP server lets compatible AI coding assistants query and operate Catalyst resources from your project. Use it to inspect projects, models, datasets, rubrics, evals, training jobs, deployments, inferences, traces, spans, and HALO agent-trace reports without leaving your MCP client. The most common workflow is applying HALO fixes: ask your assistant to apply the suggested fixes for an agent, and it pulls the HALO report and edits your code directly. See Example prompts.

How your API key scopes access

MCP authentication uses a project API key. The key is scoped to one or more projects within a team, and the MCP server forwards it to Catalyst APIs, where permissions are enforced. Authentication is by API key only. Because the key already identifies its project, you rarely need to name one. Every tool that takes a project treats it as optional and resolves it from the key automatically, so prompts like “list recent inferences” or “show the latest HALO runs” work without specifying a project.
  • Single-project key (most common): all tools target that project. You never pass a project.
  • Multi-project key: tools use the key’s default project, if it has one, unless you name a specific project. Mention the project in your prompt to switch.
  • Switching projects: to work in a project the key isn’t scoped to, use a different key; there is no in-session project switch. If you regularly work across projects, add the MCP server multiple times under different names, each with its own key (e.g. inference-prod, inference-staging), so all projects stay available at once.
Call the whoami tool at any time to see the current team, the projects the key can access, and the project it resolves to by default.
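
The resolution rules above can be sketched in a few lines of Python. This is an illustration only, not the server's implementation: KeyScope and resolve_project are hypothetical names, and the error strings are borrowed from the Troubleshooting table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyScope:
    """Hypothetical view of what a project API key can reach."""
    projects: tuple[str, ...]              # projects the key is scoped to
    default_project: Optional[str] = None  # may be unset on multi-project keys


def resolve_project(scope: KeyScope, requested: Optional[str] = None) -> str:
    """Pick the project a tool call targets, following the rules described above."""
    if requested is not None:
        # A named project must be inside the key's scope.
        if requested not in scope.projects:
            raise LookupError("Project not found")
        return requested
    if len(scope.projects) == 1:
        # Single-project key: the only project is always the target.
        return scope.projects[0]
    if scope.default_project is not None:
        # Multi-project key: fall back to the key's default project.
        return scope.default_project
    # Multi-project key with no default: the caller has to name a project.
    raise ValueError("Project ID is required")


single = KeyScope(projects=("support-bot",))
multi = KeyScope(projects=("prod", "staging"), default_project="prod")
print(resolve_project(single))            # support-bot
print(resolve_project(multi))             # prod
print(resolve_project(multi, "staging"))  # staging
```

In practice you do not need to reproduce this logic; whoami reports the project the server actually resolves.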
Read vs. write keys. A read-only key is the safer default: it can browse and read everything but cannot change anything, so the MCP server can never modify your resources. Use a key with write access only when you want MCP tools to make changes, such as creating datasets, running evals, launching training jobs, or changing deployments. Read and write scopes are set per project when you create the key.

Configure your client

1. Create or choose a project API key

Open API Keys in the dashboard and choose a key scoped to the project you want your MCP client to access.

2. Add the MCP server to your tool

Use https://mcp.inference.net/mcp as the server URL and pass your key in the Authorization header. Examples for common clients are below.

3. Reload and test

Restart or reload your MCP client, then ask it to run whoami to confirm the key is forwarded correctly and to see which project it resolves to.

Claude Code

Set INFERENCE_API_KEY in your shell, then add the hosted MCP server:
export INFERENCE_API_KEY="<your-project-api-key>"

claude mcp add --transport http inference https://mcp.inference.net/mcp \
  --header "Authorization: Bearer $INFERENCE_API_KEY"

Cursor

Add the server to .cursor/mcp.json for one project, or ~/.cursor/mcp.json for all projects.
{
  "mcpServers": {
    "inference": {
      "url": "https://mcp.inference.net/mcp",
      "headers": {
        "Authorization": "Bearer ${env:INFERENCE_API_KEY}"
      }
    }
  }
}

VS Code

Create or edit .vscode/mcp.json. VS Code will prompt for the key the first time it starts the server and store it securely.
{
  "inputs": [
    {
      "type": "promptString",
      "id": "inference-api-key",
      "description": "Inference API key",
      "password": true
    }
  ],
  "servers": {
    "inference": {
      "type": "http",
      "url": "https://mcp.inference.net/mcp",
      "headers": {
        "Authorization": "Bearer ${input:inference-api-key}"
      }
    }
  }
}

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json and add the server:
{
  "mcpServers": {
    "inference": {
      "serverUrl": "https://mcp.inference.net/mcp",
      "headers": {
        "Authorization": "Bearer ${env:INFERENCE_API_KEY}"
      }
    }
  }
}

Codex

Set INFERENCE_API_KEY, then add the server with the Codex CLI:
export INFERENCE_API_KEY="<your-project-api-key>"

codex mcp add inference \
  --url https://mcp.inference.net/mcp \
  --bearer-token-env-var INFERENCE_API_KEY

OpenCode

Add the server under mcp in your OpenCode config:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "inference": {
      "type": "remote",
      "url": "https://mcp.inference.net/mcp",
      "enabled": true,
      "headers": {
        "Authorization": "Bearer <your-project-api-key>"
      }
    }
  }
}

Gemini CLI

Add the server to ~/.gemini/settings.json or .gemini/settings.json:
{
  "mcpServers": {
    "inference": {
      "httpUrl": "https://mcp.inference.net/mcp",
      "headers": {
        "Authorization": "Bearer <your-project-api-key>"
      }
    }
  }
}

Other clients

Use Streamable HTTP with this URL and header. The client must support custom headers; OAuth-only or SSE-only clients are not supported.
URL: https://mcp.inference.net/mcp
Authorization: Bearer <your-project-api-key>
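
If you are wiring up a client by hand, the sketch below shows roughly what the first Streamable HTTP request looks like: a JSON-RPC initialize message sent as a POST with the Authorization header. It only builds and inspects the request, so it runs offline. The protocol version string, the client name, and the placeholder key are assumptions for illustration; check what your client and the server negotiate.

```python
import json
import os
import urllib.request

MCP_URL = "https://mcp.inference.net/mcp"


def build_initialize_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) an MCP initialize request over Streamable HTTP."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumption: a protocol revision that includes Streamable HTTP.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, used only for this example.
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


# Read the key from the environment; the fallback is a placeholder, not a real key.
key = os.environ.get("INFERENCE_API_KEY", "sk-inference-placeholder")
request = build_initialize_request(key)
print(request.get_method(), request.full_url)
print(json.loads(request.data)["method"])
```

To actually send it, pass the request to urllib.request.urlopen. A full session also needs the follow-up initialized notification and tool calls, which an MCP client library normally handles for you.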

Example prompts

You drive the MCP server in natural language. Your assistant picks the right tools and resolves the project from your key, so you can describe what you want rather than name tools or IDs.

Apply HALO fixes for an agent

HALO analyzes an agent’s traces and writes a report with suggested fixes. To apply them, name the agent in plain language:
Apply the HALO suggested fixes for the customer-support-agent.
Your assistant resolves the agent name to its identity, finds the most recent HALO conversation for it, pulls the report, and applies the changes in your codebase. If you don’t name an agent, as in:
Apply the latest HALO fixes.
your assistant lists recent HALO reports for the project and asks which one you mean before editing.

Run HALO on demand or on a schedule

If there’s no recent report, ask your assistant to run HALO now, then apply the result once it finishes:
Run HALO on the customer-support-agent over the last 24 hours, then apply the fixes.
You can also manage recurring HALO schedules:
Create a daily HALO schedule for the customer-support-agent at 9am UTC.
Pause the HALO schedule for the customer-support-agent.
Starting a HALO run and creating a schedule consume credits and compute, and require a write-scoped key. A schedule fires on its cadence until you pause or archive it. Your assistant should confirm the agent, time window, and prompt before starting.

More examples

What project is this key scoped to?
List the agents in this project and their execution counts.
Show me the latest HALO runs.
Summarize the errors in the last 50 inferences.
Find the slowest traces from today.
List my training jobs and their status.
Create an eval dataset from successful traffic over the last week.
Read actions (listing, summarizing, reading reports) work with a read-only key. Write actions (creating datasets, running evals, launching training jobs, changing deployments) need a key with write access.
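
As a rough mental model of the read/write split, the sketch below classifies actions by the scope they need. The action labels are illustrative, not real MCP tool names, and Catalyst enforces the actual permissions on the server side.

```python
# Illustrative action labels only; these are not the server's tool names.
WRITE_ACTIONS = frozenset({
    "create_dataset",
    "run_eval",
    "launch_training_job",
    "change_deployment",
    "start_halo_run",
    "create_halo_schedule",
})


def required_scope(action: str) -> str:
    """Return "write" for actions that change resources, otherwise "read"."""
    return "write" if action in WRITE_ACTIONS else "read"


def key_permits(action: str, granted_scopes: set[str]) -> bool:
    """True when the scopes granted to a key cover the given action."""
    return required_scope(action) in granted_scopes


read_only = {"read"}
print(key_permits("list_inferences", read_only))  # True
print(key_permits("run_eval", read_only))         # False
```

A request that fails this kind of check is what surfaces as a 403 from the server.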

Troubleshooting

| Error | What it means | How to fix |
| --- | --- | --- |
| 401 | The Authorization header is missing or the API key is invalid. | Check that the header is Authorization: Bearer <your-project-api-key> and that the key starts with sk-inference-. |
| 403 | The key is valid but does not have the permission needed for the tool. | Use a key with the required read or write scope for that action. |
| Project ID is required | The key is scoped to multiple projects and has no default, so a tool could not pick one. | Name the project in your prompt, or run whoami to see the key’s projects. |
| Project not found | The requested project is outside the key’s scope. | Use a key scoped to that project; a single key cannot reach projects it isn’t scoped to. |

Keep project API keys out of source control. Prefer your MCP client’s secret storage or environment-variable support when available.
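
Before debugging a 401 in the client, it can help to sanity-check the key itself. The following is a minimal sketch, assuming the key lives in INFERENCE_API_KEY as in the examples above; check_api_key is a hypothetical helper, and it only catches local formatting mistakes, not revoked or mis-scoped keys.

```python
import os
from typing import Optional

KEY_PREFIX = "sk-inference-"


def check_api_key(value: Optional[str]) -> list[str]:
    """Return a list of likely local problems with the key (empty if none found)."""
    if not value:
        return ["INFERENCE_API_KEY is not set"]
    problems = []
    stripped = value.strip()
    if stripped != value:
        problems.append("key has leading or trailing whitespace")
    if stripped.startswith("Bearer "):
        # The client config adds "Bearer "; the variable should hold only the key.
        problems.append("key includes the 'Bearer ' prefix; store only the key")
    elif not stripped.startswith(KEY_PREFIX):
        problems.append(f"key does not start with {KEY_PREFIX}")
    return problems


for problem in check_api_key(os.environ.get("INFERENCE_API_KEY")):
    print("warning:", problem)
```

An empty result does not prove the key is valid; run whoami through your MCP client for that.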