- Call an open-source model — run models hosted on Inference.net directly.
- Proxy through Catalyst — route requests to any provider (OpenAI, Anthropic, etc.) through the Catalyst gateway for observability, evals, and cost tracking.
- Call your custom model — hit a model you’ve fine-tuned and deployed on the platform.
Get an API Key
Create an account
Visit inference.net and create an account.
Create an API key
On the dashboard, go to the API Keys tab in the left sidebar. Create a new key or use the default key.
1. Call an Open-Source Model
Run open-source models hosted on Inference.net. No provider API key needed — just your Inference API key. Browse available models at inference.net/models.
2. Proxy Through Catalyst
Route requests to any LLM provider (OpenAI, Anthropic, Groq, etc.) through the Catalyst gateway. You keep your existing provider API key — the gateway adds observability, cost tracking, and eval-readiness with roughly 10ms of added latency. Your Inference project API key authenticates with the gateway. Your provider API key is forwarded to the provider via the x-inference-provider-api-key header.
For detailed setup guides per provider (Anthropic, Groq, Cerebras, OpenRouter, and more), see the Integrations docs.
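The header split described above can be sketched as follows. This is a minimal illustration, not an official snippet: the gateway base URL shown is an assumption (check your dashboard for the actual endpoint), and no request is actually sent.

```python
# Sketch: building headers for a request proxied through the Catalyst gateway.
# INFERENCE_BASE_URL is an assumed value -- verify the real endpoint in the docs.
import os

INFERENCE_BASE_URL = "https://api.inference.net/v1"  # assumption, not confirmed

def catalyst_headers(inference_key: str, provider: str, provider_key: str) -> dict:
    """The Inference project key authenticates with the gateway;
    the provider key is forwarded downstream to the named provider."""
    return {
        "Authorization": f"Bearer {inference_key}",
        "Content-Type": "application/json",
        "x-inference-provider": provider,              # e.g. openai, anthropic, groq
        "x-inference-provider-api-key": provider_key,  # forwarded to the provider
    }

# Example: headers for a proxied OpenAI request (nothing is sent here)
headers = catalyst_headers(
    os.environ.get("INFERENCE_API_KEY", "demo-inference-key"),
    "openai",
    os.environ.get("OPENAI_API_KEY", "demo-provider-key"),
)
```

You would pass these headers to any HTTP client (or an OpenAI-compatible SDK's default-headers option) pointed at the gateway URL.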
3. Call Your Custom Model
Hit a model you’ve fine-tuned and deployed on Inference.net. The model path is your team slug followed by the deployment name, shown on your deployment’s detail page in the dashboard. Learn more about deploying models in the Deploy docs.
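A sketch of the model path described above. The team slug, deployment name, and the slash separator are all illustrative assumptions — copy the exact path from your deployment’s detail page rather than constructing it by hand.

```python
# Hypothetical example: the model field for a custom deployment combines the
# team slug and deployment name. Both values below are made up, and the "/"
# separator is an assumption -- use the path shown in your dashboard.
team_slug = "acme"                   # illustrative
deployment_name = "support-bot-v2"   # illustrative

model = f"{team_slug}/{deployment_name}"

payload = {
    "model": model,
    "messages": [{"role": "user", "content": "Hello"}],
}
```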
Headers Reference
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <your-api-key> — authenticates the request. For OpenAI-compatible SDKs, set this as the SDK’s apiKey. |
| Content-Type | Yes | Must be application/json. |
| x-inference-provider | Proxy only | Routes the request to the correct provider: openai, anthropic, groq, cerebras, etc. |
| x-inference-provider-api-key | Proxy only | Your provider’s API key. The gateway forwards it downstream. For Anthropic’s native SDK, use x-api-key instead. |
| x-inference-provider-url | No | Routes to any OpenAI-compatible provider by base URL, even if it doesn’t have a dedicated integration. |
| x-inference-environment | No | Tags requests with an environment label, such as production or staging. |
| x-inference-task-id | No | Groups requests under a logical task for filtering and analytics in the dashboard. |
| x-inference-metadata-* | No | Attach arbitrary metadata to a request. The prefix is stripped to form the key — e.g., x-inference-metadata-chat-id: abc123 stores chat-id: abc123. You can filter inferences and create datasets based on these keys in the dashboard. |
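The optional tagging headers in the table above can be layered onto a base header set like this. A minimal sketch with made-up environment, task, and metadata values:

```python
# Sketch: adding the optional Catalyst tagging headers from the table above.
def tagged_headers(base: dict, environment: str, task_id: str, metadata: dict) -> dict:
    """Return a copy of `base` with environment, task, and metadata headers added."""
    headers = dict(base)
    headers["x-inference-environment"] = environment
    headers["x-inference-task-id"] = task_id
    for key, value in metadata.items():
        # the x-inference-metadata- prefix is stripped server-side to form the key
        headers[f"x-inference-metadata-{key}"] = value
    return headers

# Example values are illustrative
headers = tagged_headers(
    {"Content-Type": "application/json"},
    environment="production",
    task_id="summarize-tickets",
    metadata={"chat-id": "abc123"},
)
```

With these headers, the request above would appear in the dashboard tagged as production, grouped under summarize-tickets, and filterable by chat-id.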
Supported Request Parameters
The API supports the standard OpenAI chat completions parameters:
| Parameter | Type | Description |
|---|---|---|
| model | string | The model to use. |
| messages | array | The conversation messages. |
| stream | boolean | Whether to stream the response. |
| max_tokens | integer | Maximum number of tokens to generate. |
| temperature | number | Sampling temperature (0–2). |
| top_p | number | Nucleus sampling threshold. |
| frequency_penalty | number | Penalizes repeated tokens based on frequency. |
| presence_penalty | number | Penalizes tokens based on whether they’ve appeared. |
| response_format | object | Set to {"type": "json_object"} or a JSON schema for structured outputs. |
| tools | array | Tool/function definitions for function calling. |
Need a parameter that isn’t listed here? Contact us and we’ll add it.
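The parameters above combine into a standard chat completions request body. A sketch with an illustrative model name (not a guaranteed model ID — browse inference.net/models for real ones):

```python
# Sketch: a chat completions request body using the supported parameters.
# The model name is illustrative only.
payload = {
    "model": "meta-llama/llama-3.1-8b-instruct",  # illustrative model ID
    "messages": [
        {"role": "system", "content": "Reply in JSON."},
        {"role": "user", "content": "Name three primary colors."},
    ],
    "max_tokens": 256,
    "temperature": 0.2,
    "response_format": {"type": "json_object"},  # structured output
    "stream": False,
}
```

Serialized as JSON, this body is sent with the Authorization and Content-Type headers from the reference above.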
Next Steps
Integrations
Set up Catalyst with OpenAI, Anthropic, Groq, and other providers.
Structured Outputs
Get typed JSON responses from your API calls.
Batch Processing
Process up to 50,000 requests in a single batch job.
Browse Models
Explore all models available on Inference.net.