Get an API key

  1. Visit inference.net and create an account.

  2. On the dashboard, open the API Keys tab in the left sidebar. Create a new API key or use the default one.

  3. Copy the API key to your clipboard by clicking the copy icon to the right of the key.

  4. In your terminal, set the INFERENCE_API_KEY environment variable to the API key you copied.

export INFERENCE_API_KEY=<your-api-key>

Test Request

Verify your setup with a simple curl request to the Inference.net API.

curl -N https://api.inference.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct/fp-8",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ],
    "stream": true
  }'

The response should stream token by token into your terminal.
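Each line of the stream is a Server-Sent Events `data:` record carrying a JSON chunk whose `delta` holds the newly generated text, with a final `data: [DONE]` sentinel. A minimal sketch of extracting the text from such lines, using made-up sample chunks rather than a live response:

```python
import json

# Hypothetical sample of streamed SSE lines; a real client would read
# these line by line from the open HTTP connection instead.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]

text = ""
for line in sample_lines:
    if not line.startswith("data: "):
        continue  # skip blank keep-alive lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    text += delta.get("content", "")

print(text)  # → Hello, world
```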

OpenAI SDK

Inference.net is compatible with the OpenAI Chat API. You can use the official OpenAI SDK to interact with the Inference.net API.

We support both streaming and non-streaming requests, as well as the following parameters:

  • max_tokens
  • temperature
  • top_p
  • frequency_penalty
  • presence_penalty

If you need a parameter that is not listed here, please contact us and we'll add it.
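These parameters map directly onto fields of the request body, whether you send it via curl or the SDK. A sketch of such a body, with purely illustrative values:

```python
import json

# Illustrative values only; tune them for your use case.
payload = {
    "model": "meta-llama/llama-3.1-8b-instruct/fp-8",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}],
    "max_tokens": 256,         # cap on generated tokens
    "temperature": 0.7,        # sampling randomness
    "top_p": 0.9,              # nucleus sampling cutoff
    "frequency_penalty": 0.0,  # penalize frequent repetition
    "presence_penalty": 0.0,   # encourage introducing new tokens
}

print(json.dumps(payload, indent=2))
```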

Note: Make sure you export the INFERENCE_API_KEY environment variable before running the code below.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ.get("INFERENCE_API_KEY"),
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct/fp-8",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end='', flush=True)

Next Steps