This page is the fastest path to a working request against the direct Inference.net API.
If you already use OpenAI, Anthropic, or another provider and want observability on top of that traffic, start with /start-here/observe-quickstart instead.

1. Get an API key

  1. Visit inference.net and create an account.
  2. On the dashboard, open the API Keys tab in the left sidebar. Create an API key or use the default key.
  3. Copy the API key to your clipboard by clicking the copy icon to the right of the key.
  4. In your terminal, set the INFERENCE_API_KEY environment variable to the API key you copied.
export INFERENCE_API_KEY=<your-api-key>
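Before sending a request, you can confirm the variable is actually visible to your shell. A minimal POSIX-shell check (purely illustrative, not part of the official setup):

```shell
# Sanity check: confirm the key is exported before making requests
if [ -n "$INFERENCE_API_KEY" ]; then
  echo "INFERENCE_API_KEY is set"
else
  echo "INFERENCE_API_KEY is missing" >&2
fi
```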

2. Send a test request

Use a simple curl request to confirm your key works and the API is reachable.
curl -N https://api.inference.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct/fp-8",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ],
    "stream": true
  }'
The response should stream into your terminal as a series of `data:` lines (server-sent events).
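Each streamed line is an event of the form `data: {json}`, and the stream ends with `data: [DONE]`. As a minimal sketch of how to pull the generated text out of one event (pure standard library; `extract_delta` is a hypothetical helper, and the chunk shape assumed here follows the OpenAI-style streaming format):

```python
import json


def extract_delta(sse_line: str) -> str:
    """Return the text delta carried by one `data:` line, or '' if none.

    Hypothetical helper for illustration; assumes OpenAI-style
    streaming chunks like the curl output above.
    """
    line = sse_line.strip()
    if not line.startswith("data:"):
        return ""  # comments and keep-alives carry no text
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return ""  # end-of-stream sentinel
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content") or ""


# Example event, shaped like one line of the curl output above:
event = 'data: {"choices": [{"delta": {"content": "42"}}]}'
print(extract_delta(event))  # → 42
```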

3. Use the OpenAI SDK

Inference.net is compatible with the OpenAI Chat Completions API, so you can use the official OpenAI SDK to interact with it. Both streaming and non-streaming requests are supported, along with common parameters such as:
  • max_tokens
  • temperature
  • top_p
  • frequency_penalty
  • presence_penalty
If you need a parameter that is not listed here, email support or meet with our team.
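As a sketch, these tuning parameters slot straight into the request body; the values below are arbitrary illustrations, not recommended defaults:

```python
# Illustrative request body using the supported tuning parameters.
# Values are arbitrary examples, not recommendations.
request_body = {
    "model": "meta-llama/llama-3.1-8b-instruct/fp-8",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}],
    "max_tokens": 256,         # cap on the number of generated tokens
    "temperature": 0.7,        # higher = more random sampling
    "top_p": 0.9,              # nucleus-sampling probability cutoff
    "frequency_penalty": 0.0,  # discourage verbatim repetition
    "presence_penalty": 0.0,   # discourage reusing tokens already present
}
```

With the OpenAI SDK, the same names are passed as keyword arguments to `client.chat.completions.create(...)`.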
Make sure you export the INFERENCE_API_KEY environment variable before running the examples below.
import os
from openai import OpenAI

# Point the OpenAI SDK at the Inference.net endpoint
client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ.get("INFERENCE_API_KEY"),
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct/fp-8",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    stream=True,
)

# Print each token as it arrives
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

Next Steps

API Overview

Learn how the direct API fits into the broader platform and when to use background or batch paths.

Observe an existing app

Route traffic from another provider through Inference.net for tracing, analytics, datasets, and eval inputs.

Batch Processing

Process large asynchronous workloads offline when a user is not waiting for the answer.

View Models

Explore the models available on Inference.net.