Inference.net exposes an OpenAI-compatible API for real-time chat completions, embeddings, structured outputs, batch workloads, and lower-cost asynchronous inference.
This page is the fastest way to orient yourself. Use the Quickstart for your first request, then jump into the feature-specific guides for endpoint details and examples.

What You Need

Base URL and Authentication

  • Base URL: https://api.inference.net/v1
  • Auth header: Authorization: Bearer $INFERENCE_API_KEY
  • SDK compatibility: OpenAI SDK
  • Fastest getting-started path: /quickstart
Use a dashboard API key for direct model calls. To trace requests sent to OpenAI, Google Gemini, Together AI, or other providers, follow the dashboard integration guide instead of calling the API directly.
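
The base URL and auth header above are enough to assemble a request by hand. The sketch below builds a chat-completions request in the OpenAI-compatible shape; the model name is a placeholder, not a real model ID, and no network call is made:

```python
import json
import os

# Read the key from the environment; "sk-example" is a placeholder fallback.
api_key = os.environ.get("INFERENCE_API_KEY", "sk-example")

base_url = "https://api.inference.net/v1"
headers = {
    "Authorization": f"Bearer {api_key}",  # auth header from the table above
    "Content-Type": "application/json",
}

# Request body in the OpenAI chat-completions schema.
body = {
    "model": "your-model-id",  # placeholder: substitute a model from your dashboard
    "messages": [{"role": "user", "content": "Hello!"}],
}

request_url = f"{base_url}/chat/completions"
payload = json.dumps(body)
print(request_url)
```

POST `payload` with those headers to `request_url` using any HTTP client, or point the OpenAI SDK at the base URL and let it handle the plumbing.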

Common API Surfaces

  • Real-time text generation: /chat/completions (docs: /quickstart)
  • Embeddings: /embeddings (docs: /features/embeddings)
  • Structured outputs: OpenAI-compatible JSON response formatting (docs: /features/structured-outputs)
  • Batch inference: Batch API (docs: /features/batch-api)
  • Lower-cost async jobs: Asynchronous Inference API (docs: /features/asynchronous-inference/overview)
  • Schema-guided extraction: Schematron models (docs: /workhorse-models/schematron)
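
As one example of the surfaces above, a structured-outputs request adds a `response_format` block to an ordinary chat-completions body. The sketch assumes the OpenAI-style JSON Schema `response_format`; the model name and schema are illustrative placeholders, and /features/structured-outputs is the authoritative reference:

```python
import json

# Chat-completions body with an OpenAI-style JSON Schema response format.
# "your-model-id" and the "city_extraction" schema are placeholders.
body = {
    "model": "your-model-id",
    "messages": [
        {"role": "user", "content": "Extract the city from: 'I live in Paris.'"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city_extraction",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
}

# Serialize for POSTing to /chat/completions.
print(json.dumps(body, indent=2))
```

The model's reply is then constrained to a JSON object matching the schema, which you can parse directly instead of scraping free-form text.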

Pick the Right Starting Point

  • Need your first key and a request you can paste into a terminal? Go to /quickstart.
  • Want to collect traces across providers before you optimize or fine-tune? Use the dashboard integration guide.
  • Ready to move from captured traffic to datasets, evals, and training jobs? Continue to /fine-tuning/e2e-guide.