
# API Quickstart

> Get started with the Inference.net API

The Inference.net API is OpenAI-compatible, so you can use the OpenAI SDK or plain HTTP to make requests. There are three ways to use it:

1. **Call an open-source model** — run models hosted on Inference.net directly.
2. **Proxy through Catalyst** — route requests to any provider (OpenAI, Anthropic, etc.) through the Catalyst gateway for observability, evals, and cost tracking.
3. **Call your custom model** — hit a model you've fine-tuned and deployed on the platform.

## Get an API Key

<Steps>
  <Step title="Create an account">
    Visit [inference.net](https://inference.net) and create an account.
  </Step>

  <Step title="Create an API key">
    On the dashboard, go to the **API Keys** tab in the left sidebar. Create a new key or use the default key.
  </Step>

  <Step title="Set the environment variable">
    ```bash theme={"system"}
    export INFERENCE_API_KEY="<your-api-key>"
    ```
  </Step>
</Steps>
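
If you want a script to fail fast when the variable is missing, a small check like the one below can help. This is a minimal sketch: `load_api_key` is a hypothetical helper name, not part of the OpenAI SDK or the Inference.net API.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Inference API key, or raise a clear error if it is unset.

    `load_api_key` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get("INFERENCE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "INFERENCE_API_KEY is not set; export it before running this script."
        )
    return key
```

The returned value can be passed as `api_key` when constructing the client in the examples that follow.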

***

## 1. Call an Open-Source Model

Run open-source models hosted on Inference.net. No provider API key needed — just your Inference API key. Browse available models at [inference.net/models](https://inference.net/models).

<CodeGroup>
  ```typescript TypeScript theme={"system"}
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://api.inference.net/v1",
    apiKey: process.env.INFERENCE_API_KEY,
  });

  const response = await client.chat.completions.create({
    model: "google/gemma-3-27b-instruct/bf-16",
    messages: [{ role: "user", content: "What is the meaning of life?" }],
    stream: true,
  });

  for await (const chunk of response) {
    process.stdout.write(chunk.choices[0]?.delta?.content || "");
  }
  ```

  ```python Python theme={"system"}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.inference.net/v1",
      api_key=os.environ["INFERENCE_API_KEY"],
  )

  response = client.chat.completions.create(
      model="google/gemma-3-27b-instruct/bf-16",
      messages=[{"role": "user", "content": "What is the meaning of life?"}],
      stream=True,
  )

  for chunk in response:
      # Skip chunks that carry no choices or no text delta.
      if chunk.choices and chunk.choices[0].delta.content is not None:
          print(chunk.choices[0].delta.content, end="", flush=True)
  ```

  ```bash cURL theme={"system"}
  curl -N https://api.inference.net/v1/chat/completions \
    -H "Authorization: Bearer $INFERENCE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "google/gemma-3-27b-instruct/bf-16",
      "messages": [
        {"role": "user", "content": "What is the meaning of life?"}
      ],
      "stream": true
    }'
  ```
</CodeGroup>

This works with any model on the platform, including our purpose-built [Schematron](/workhorse-models/schematron) models for structured data extraction.
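
The streaming examples above print each delta as it arrives. If you would rather assemble the full reply as one string, the deltas can be joined as sketched below. `collect_stream` and `fake_chunk` are hypothetical names, and the stand-in chunks only mimic the `choices[0].delta.content` shape used in the examples above so the snippet runs without a network call.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any]) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices; skip those.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        # A delta may also have no text (None or empty); skip it.
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streaming chunk (illustrative only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk("Hello"),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk(", world"),
]
print(collect_stream(demo))  # prints "Hello, world"
```

With the real client, the same function can be given the `response` object returned by a `stream=True` request.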

***

## 2. Proxy Through Catalyst

Route requests to any LLM provider (OpenAI, Anthropic, Groq, etc.) through the Catalyst gateway. You keep your existing provider API key, and the gateway adds observability, cost tracking, and eval-readiness with roughly 10 ms of added latency.

Your Inference project API key authenticates with the gateway. Pass your provider API key in the `x-inference-provider-api-key` header, and the gateway forwards it to the provider. The examples below read the provider key from the `OPENAI_API_KEY` environment variable and select the provider with the `x-inference-provider` header.

<CodeGroup>
  ```typescript TypeScript theme={"system"}
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://api.inference.net/v1",
    apiKey: process.env.INFERENCE_API_KEY,
    defaultHeaders: {
      "x-inference-provider-api-key": process.env.OPENAI_API_KEY,
      "x-inference-provider": "openai",
    },
  });

  const response = await client.chat.completions.create({
    model: "gpt-4.1",
    messages: [{ role: "user", content: "What is the meaning of life?" }],
  });

  console.log(response.choices[0].message.content);
  ```

  ```python Python theme={"system"}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.inference.net/v1",
      api_key=os.environ["INFERENCE_API_KEY"],
      default_headers={
          "x-inference-provider-api-key": os.environ["OPENAI_API_KEY"],
          "x-inference-provider": "openai",
      },
  )

  response = client.chat.completions.create(
      model="gpt-4.1",
      messages=[{"role": "user", "content": "What is the meaning of life?"}],
  )

  print(response.choices[0].message.content)
  ```

  ```bash cURL theme={"system"}
  curl https://api.inference.net/v1/chat/completions \
    -H "Authorization: Bearer $INFERENCE_API_KEY" \
    -H "x-inference-provider-api-key: $OPENAI_API_KEY" \
    -H "x-inference-provider: openai" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-4.1",
      "messages": [
        {"role": "user", "content": "What is the meaning of life?"}
      ]
    }'
  ```
</CodeGroup>

<Info>
  For detailed setup guides per provider (Anthropic, Groq, Cerebras, OpenRouter, and more), see the [Integrations](/integrations/overview) docs.
</Info>
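
If you switch between providers in one codebase, it may be convenient to build the two proxy headers in one place. The sketch below does that with the header names shown above. `catalyst_headers` is a hypothetical helper, not a function provided by Inference.net or the OpenAI SDK.

```python
from typing import Dict


def catalyst_headers(provider: str, provider_api_key: str) -> Dict[str, str]:
    """Build the proxy headers for a Catalyst request.

    Hypothetical helper: only the header names come from the docs above.
    """
    if not provider:
        raise ValueError("provider must be a non-empty string, e.g. 'openai'")
    if not provider_api_key:
        raise ValueError("provider_api_key must be a non-empty string")
    return {
        "x-inference-provider": provider,
        "x-inference-provider-api-key": provider_api_key,
    }
```

The result can be passed as `default_headers` when constructing the client, as in the Python example above, or per request through the OpenAI Python SDK's `extra_headers` argument to `client.chat.completions.create`.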

***

## 3. Call Your Custom Model

Call a model you've fine-tuned and deployed on Inference.net. The model path is your team slug followed by the deployment name (for example, `your-team/your-model`), as shown on your deployment's detail page in the dashboard.

<CodeGroup>
  ```typescript TypeScript theme={"system"}
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://api.inference.net/v1",
    apiKey: process.env.INFERENCE_API_KEY,
  });

  const response = await client.chat.completions.create({
    model: "your-team/your-model",
    messages: [{ role: "user", content: "Hello, world!" }],
  });

  console.log(response.choices[0].message.content);
  ```

  ```python Python theme={"system"}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.inference.net/v1",
      api_key=os.environ["INFERENCE_API_KEY"],
  )

  response = client.chat.completions.create(
      model="your-team/your-model",
      messages=[{"role": "user", "content": "Hello, world!"}],
  )

  print(response.choices[0].message.content)
  ```

  ```bash cURL theme={"system"}
  curl https://api.inference.net/v1/chat/completions \
    -H "Authorization: Bearer $INFERENCE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "your-team/your-model",
      "messages": [
        {"role": "user", "content": "Hello, world!"}
      ]
    }'
  ```
</CodeGroup>

<Info>
  Learn more about deploying models in the [Deploy](/platform/deploy/overview) docs.
</Info>

***

## Headers Reference

| Header                         | Required   | Description                                                                                                                                                                                                                                 |
| ------------------------------ | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Authorization`                | Yes        | `Bearer <your-api-key>` — authenticates the request. For OpenAI-compatible SDKs, set this as the SDK's `apiKey`.                                                                                                                            |
| `Content-Type`                 | Yes        | Must be `application/json`.                                                                                                                                                                                                                 |
| `x-inference-provider`         | Proxy only | Routes the request to the correct provider: `openai`, `anthropic`, `groq`, `cerebras`, etc.                                                                                                                                                 |
| `x-inference-provider-api-key` | Proxy only | Your provider's API key. The gateway forwards it downstream. For Anthropic's native SDK, use `x-api-key` instead.                                                                                                                           |
| `x-inference-provider-url`     | No         | Routes to any OpenAI-compatible provider by base URL, even if it doesn't have a dedicated integration.                                                                                                                                      |
| `x-inference-environment`      | No         | Tags requests with an environment label, such as `production` or `staging`.                                                                                                                                                                 |
| `x-inference-task-id`          | No         | Groups requests under a logical task for filtering and analytics in the dashboard.                                                                                                                                                          |
| `x-inference-metadata-*`       | No         | Attach arbitrary metadata to a request. The prefix is stripped to form the key — e.g., `x-inference-metadata-chat-id: abc123` stores `chat-id: abc123`. You can filter inferences and create datasets based on these keys in the dashboard. |
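
As an illustration of the optional tagging headers in the table, the sketch below turns a plain dictionary into `x-inference-metadata-*` headers and adds the environment and task labels. `tracking_headers` and `stored_metadata` are hypothetical helpers. The second one only mirrors the prefix stripping described in the table and is not the gateway's actual implementation.

```python
from typing import Dict, Mapping, Optional

METADATA_PREFIX = "x-inference-metadata-"


def tracking_headers(
    metadata: Mapping[str, str],
    environment: Optional[str] = None,
    task_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build optional tagging headers (hypothetical helper)."""
    # Each metadata key becomes its own prefixed header.
    headers = {f"{METADATA_PREFIX}{key}": str(value) for key, value in metadata.items()}
    if environment is not None:
        headers["x-inference-environment"] = environment
    if task_id is not None:
        headers["x-inference-task-id"] = task_id
    return headers


def stored_metadata(headers: Mapping[str, str]) -> Dict[str, str]:
    """Strip the prefix to show which keys the table says are stored (illustrative)."""
    return {
        name[len(METADATA_PREFIX):]: value
        for name, value in headers.items()
        if name.startswith(METADATA_PREFIX)
    }


headers = tracking_headers({"chat-id": "abc123"}, environment="staging", task_id="summarize")
print(headers)
print(stored_metadata(headers))  # prints {'chat-id': 'abc123'}
```

These headers can be merged with the proxy headers and sent on any request, for example through `default_headers` on the client.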

## Supported Request Parameters

The API supports the standard OpenAI chat completions parameters:

| Parameter           | Type      | Description                                                                                          |
| ------------------- | --------- | ---------------------------------------------------------------------------------------------------- |
| `model`             | `string`  | The model to use.                                                                                    |
| `messages`          | `array`   | The conversation messages.                                                                           |
| `stream`            | `boolean` | Whether to stream the response.                                                                      |
| `max_tokens`        | `integer` | Maximum number of tokens to generate.                                                                |
| `temperature`       | `number`  | Sampling temperature (0–2).                                                                          |
| `top_p`             | `number`  | Nucleus sampling threshold.                                                                          |
| `frequency_penalty` | `number`  | Penalizes repeated tokens based on frequency.                                                        |
| `presence_penalty`  | `number`  | Penalizes tokens based on whether they've appeared.                                                  |
| `response_format`   | `object`  | Set to `{"type": "json_object"}` or a JSON schema for [structured outputs](/api/structured-outputs). |
| `tools`             | `array`   | Tool/function definitions for [function calling](/api/function-calling).                             |

<Info>
  Need a parameter that isn't listed here? [Contact us](mailto:support@inference.net) and we'll add it.
</Info>
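
To show how several of the parameters in the table combine, here is a minimal sketch that assembles a request body. `build_chat_request` is a hypothetical helper, and its default values are arbitrary choices for illustration, not recommendations from Inference.net.

```python
import json
from typing import Any, Dict, List


def build_chat_request(
    model: str,
    messages: List[Dict[str, str]],
    *,
    temperature: float = 0.2,
    max_tokens: int = 256,
    json_mode: bool = False,
) -> Dict[str, Any]:
    """Assemble a chat completions request body (hypothetical helper)."""
    # The table lists 0-2 as the sampling temperature range.
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    body: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    if json_mode:
        body["response_format"] = {"type": "json_object"}
    return body


body = build_chat_request(
    "google/gemma-3-27b-instruct/bf-16",
    [{"role": "user", "content": "List three prime numbers as JSON."}],
    json_mode=True,
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a `POST` to `/v1/chat/completions`, or unpacked into the SDK call as `client.chat.completions.create(**body)`.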

## Next Steps

<CardGroup cols={2}>
  <Card title="Integrations" icon="puzzle-piece" href="/integrations/overview">
    Set up Catalyst with OpenAI, Anthropic, Groq, and other providers.
  </Card>

  <Card title="Structured Outputs" icon="brackets-curly" href="/api/structured-outputs">
    Get typed JSON responses from your API calls.
  </Card>

  <Card title="Batch Processing" icon="list-check" href="/api/async-inference/batch-api">
    Process up to 50,000 requests in a single batch job.
  </Card>

  <Card title="Browse Models" icon="microchip" href="https://inference.net/models">
    Explore all models available on Inference.net.
  </Card>
</CardGroup>
