The Inference.net API is OpenAI-compatible, so you can use the OpenAI SDK or plain HTTP to make requests. There are three ways to use it:
  1. Call an open-source model — run models hosted on Inference.net directly.
  2. Proxy through Catalyst — route requests to any provider (OpenAI, Anthropic, etc.) through the Catalyst gateway for observability, evals, and cost tracking.
  3. Call your custom model — hit a model you’ve fine-tuned and deployed on the platform.

Get an API Key

1. Create an account: visit inference.net and sign up.
2. Create an API key: on the dashboard, open the API Keys tab in the left sidebar and create a new key, or use the default key.
3. Set the environment variable:

```shell
export INFERENCE_API_KEY=<your-api-key>
```

1. Call an Open-Source Model

Run open-source models hosted on Inference.net. No provider API key needed — just your Inference API key. Browse available models at inference.net/models.
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "google/gemma-3-27b-instruct/bf-16",
  messages: [{ role: "user", content: "What is the meaning of life?" }],
  stream: true,
});

for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```

This works with any model on the platform, including our purpose-built Schematron models for structured data extraction.
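Because the API is OpenAI-compatible, you can also skip the SDK and make the same call over plain HTTP. A minimal sketch, assuming a hypothetical `buildChatRequest` helper (not part of any SDK) so the request shape is easy to see; this version is non-streaming for simplicity:

```typescript
// Build a chat-completions request for the OpenAI-compatible endpoint.
// buildChatRequest is a hypothetical helper, not part of any SDK.
function buildChatRequest(apiKey: string, model: string, prompt: string) {
  return {
    url: "https://api.inference.net/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (requires a valid INFERENCE_API_KEY):
// const { url, init } = buildChatRequest(
//   process.env.INFERENCE_API_KEY!,
//   "google/gemma-3-27b-instruct/bf-16",
//   "What is the meaning of life?",
// );
// const data = await (await fetch(url, init)).json();
// console.log(data.choices[0].message.content);
```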

2. Proxy Through Catalyst

Route requests to any LLM provider (OpenAI, Anthropic, Groq, etc.) through the Catalyst gateway. You keep your existing provider API key — the gateway adds observability, cost tracking, and eval-readiness with roughly 10ms of added latency. Your Inference project API key authenticates with the gateway. Your provider API key is forwarded to the provider via the x-inference-provider-api-key header.
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
  defaultHeaders: {
    "x-inference-provider-api-key": process.env.OPENAI_API_KEY,
    "x-inference-provider": "openai",
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "What is the meaning of life?" }],
});

console.log(response.choices[0].message.content);
```

For detailed setup guides per provider (Anthropic, Groq, Cerebras, OpenRouter, and more), see the Integrations docs.
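Switching Catalyst to a different provider only changes the two routing headers. A minimal sketch, assuming a hypothetical `catalystHeaders` helper (our name, not part of the Inference.net API or any SDK):

```typescript
// Build the two Catalyst routing headers for a given provider.
// catalystHeaders is a hypothetical convenience helper, not part of any SDK.
function catalystHeaders(provider: string, providerApiKey: string): Record<string, string> {
  return {
    "x-inference-provider": provider,
    "x-inference-provider-api-key": providerApiKey,
  };
}

// Usage: pass the result as defaultHeaders when constructing the client, e.g.
// new OpenAI({
//   baseURL: "https://api.inference.net/v1",
//   apiKey: process.env.INFERENCE_API_KEY,
//   defaultHeaders: catalystHeaders("anthropic", process.env.ANTHROPIC_API_KEY!),
// });
```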

3. Call Your Custom Model

Hit a model you’ve fine-tuned and deployed on Inference.net. The model path is your team slug followed by the deployment name, shown on your deployment’s detail page in the dashboard.
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "your-team/your-model",
  messages: [{ role: "user", content: "Hello, world!" }],
});

console.log(response.choices[0].message.content);
```

Learn more about deploying models in the Deploy docs.

Headers Reference

| Header | Required | Description |
| --- | --- | --- |
| `Authorization` | Yes | `Bearer <your-api-key>`: authenticates the request. For OpenAI-compatible SDKs, set this as the SDK's `apiKey`. |
| `Content-Type` | Yes | Must be `application/json`. |
| `x-inference-provider` | Proxy only | Routes the request to the correct provider: `openai`, `anthropic`, `groq`, `cerebras`, etc. |
| `x-inference-provider-api-key` | Proxy only | Your provider's API key. The gateway forwards it downstream. For Anthropic's native SDK, use `x-api-key` instead. |
| `x-inference-provider-url` | No | Routes to any OpenAI-compatible provider by base URL, even one without a dedicated integration. |
| `x-inference-environment` | No | Tags requests with an environment label, such as `production` or `staging`. |
| `x-inference-task-id` | No | Groups requests under a logical task for filtering and analytics in the dashboard. |
| `x-inference-metadata-*` | No | Attaches arbitrary metadata to a request. The prefix is stripped to form the key; e.g., `x-inference-metadata-chat-id: abc123` stores `chat-id: abc123`. You can filter inferences and create datasets based on these keys in the dashboard. |
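Since the metadata header key is just the prefix plus your own key, these headers can be generated from a plain object. A small sketch, assuming a hypothetical `metadataHeaders` helper (our name, not part of any SDK):

```typescript
// Turn { "chat-id": "abc123" } into { "x-inference-metadata-chat-id": "abc123" }.
// metadataHeaders is a hypothetical helper; the gateway strips the prefix
// on its side to recover the original key.
function metadataHeaders(metadata: Record<string, string>): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const [key, value] of Object.entries(metadata)) {
    headers[`x-inference-metadata-${key}`] = value;
  }
  return headers;
}

// Usage with the OpenAI SDK's per-request options:
// await client.chat.completions.create(params, {
//   headers: metadataHeaders({ "chat-id": "abc123" }),
// });
```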

Supported Request Parameters

The API supports the standard OpenAI chat completions parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | The model to use. |
| `messages` | array | The conversation messages. |
| `stream` | boolean | Whether to stream the response. |
| `max_tokens` | integer | Maximum number of tokens to generate. |
| `temperature` | number | Sampling temperature (0–2). |
| `top_p` | number | Nucleus sampling threshold. |
| `frequency_penalty` | number | Penalizes repeated tokens based on frequency. |
| `presence_penalty` | number | Penalizes tokens based on whether they've appeared. |
| `response_format` | object | Set to `{"type": "json_object"}` or a JSON schema for structured outputs. |
| `tools` | array | Tool/function definitions for function calling. |
Need a parameter that isn’t listed here? Contact us and we’ll add it.
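These parameters are plain request-body fields. A hedged sketch of a body that asks for JSON output via `response_format` (providers typically require the prompt itself to mention JSON when using `json_object` mode; the model ID matches the earlier example):

```typescript
// A non-streaming chat-completions request body requesting JSON output.
const body = {
  model: "google/gemma-3-27b-instruct/bf-16",
  messages: [
    {
      role: "user",
      content: 'List three colors as a JSON object with a "colors" array.',
    },
  ],
  response_format: { type: "json_object" },
  temperature: 0,
};

// Send it with the same client as above:
// const response = await client.chat.completions.create(body);
```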

Next Steps

- Integrations: set up Catalyst with OpenAI, Anthropic, Groq, and other providers.
- Structured Outputs: get typed JSON responses from your API calls.
- Batch Processing: process up to 50,000 requests in a single batch job.
- Browse Models: explore all models available on Inference.net.