Get an API key
- Visit inference.net and create an account.
- On the dashboard, visit the API Keys tab on the left sidebar. Create an API key or use the default key.
- Copy the API key to your clipboard by clicking the copy icon to the right of the key.
- In your terminal, set the INFERENCE_API_KEY environment variable to the API key you copied:

export INFERENCE_API_KEY=<your-api-key>
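All of the examples below read the key from this environment variable. If you want to confirm it is visible to your code before continuing, a quick sanity check like the sketch below works; it is only an illustrative snippet, not part of the Inference Cloud tooling.

import os

# Fail early if the key was not exported in this shell session
api_key = os.environ.get("INFERENCE_API_KEY")
if not api_key:
    raise RuntimeError("INFERENCE_API_KEY is not set; run the export command above first")
print("API key found, length:", len(api_key))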
Test Request
Perform a simple curl request to the Inference Cloud API.
curl -N https://api.inference.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct/fp-8",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ],
    "stream": true
  }'
The response should stream into your terminal as it is generated.
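Because the API is OpenAI-compatible, the streamed response arrives as server-sent events: each line beginning with data: carries a JSON chunk, and the stream ends with data: [DONE]. If you are curious what curl is printing, a rough Python sketch that consumes the stream directly (without the SDK) might look like the following; it assumes the standard OpenAI chunk shape and uses the third-party requests library.

import json
import os

import requests  # third-party HTTP client: pip install requests

resp = requests.post(
    "https://api.inference.net/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['INFERENCE_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.1-8b-instruct/fp-8",
        "messages": [{"role": "user", "content": "What is the meaning of life?"}],
        "stream": True,
    },
    stream=True,  # keep the connection open and read events as they arrive
)
resp.raise_for_status()

for line in resp.iter_lines(decode_unicode=True):
    # Each event line looks like: data: {"choices":[{"delta":{"content":"..."}}]}
    if not line or not line.startswith("data: "):
        continue
    payload = line[len("data: "):]
    if payload == "[DONE]":  # sentinel marking the end of the stream
        break
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if choices:
        print(choices[0].get("delta", {}).get("content") or "", end="", flush=True)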
OpenAI SDK
Inference Cloud is compatible with the OpenAI Chat API. You can use the official OpenAI SDK to interact with the Inference Cloud API.
We support both streaming and non-streaming requests, as well as the following parameters (a non-streaming sketch that uses them follows the Python example below):
- max_tokens
- temperature
- top_p
- frequency_penalty
- presence_penalty
If you need parameters that are not listed here, please contact us and we’ll add them.
Note: Make sure you export the INFERENCE_API_KEY environment variable before running the code below.
Python
import os
from openai import OpenAI

# Point the OpenAI client at the Inference Cloud base URL and authenticate with your API key
client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ.get("INFERENCE_API_KEY"),
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct/fp-8",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    stream=True,
)

# Print each streamed token as it arrives
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end='', flush=True)
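If you prefer a single response instead of a token stream, omit stream=True and read message.content from the first choice. The sketch below also passes the sampling parameters listed above; the specific values are arbitrary and only meant to show where each parameter goes.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ.get("INFERENCE_API_KEY"),
)

# Non-streaming request using the supported sampling parameters (values are illustrative)
completion = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct/fp-8",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    max_tokens=256,          # cap the length of the generated reply
    temperature=0.7,         # higher values produce more varied output
    top_p=0.9,               # nucleus sampling cutoff
    frequency_penalty=0.0,   # penalize frequently repeated tokens
    presence_penalty=0.0,    # penalize tokens already present in the text
)

# The full reply is available once the request completes
print(completion.choices[0].message.content)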
Bun / Node.js
import OpenAI from "openai";

// Point the OpenAI client at the Inference Cloud base URL and authenticate with your API key
const openai = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

const completion = await openai.chat.completions.create({
  model: "meta-llama/llama-3.1-8b-instruct/fp-8",
  messages: [{ role: "user", content: "What is the meaning of life?" }],
  stream: true,
});

// Write each streamed token to stdout as it arrives
for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta.content || '');
}
Next Steps