Inference.net exposes an OpenAI-compatible API for real-time chat completions, embeddings, structured outputs, batch workloads, and lower-cost asynchronous inference.
This page is the fastest way to orient yourself. Use the Quickstart for your first request, then jump into the feature-specific guides for endpoint details and examples.

What You Need

Base URL and Authentication

  • Base URL: https://api.inference.net/v1
  • Auth header: Authorization: Bearer $INFERENCE_API_KEY
  • SDK compatibility: OpenAI SDK
  • Fastest getting-started path: /quickstart
Use a dashboard API key for direct model calls. To trace requests sent to OpenAI, Google Gemini, Together AI, or other providers, follow the dashboard integration guide instead of calling the API directly.
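
The base URL and auth header above are enough to assemble a request by hand. The sketch below builds a chat-completions request in the OpenAI-compatible shape; the model name is a placeholder, not a real model ID, and no network call is made:

```python
import json
import os

# Read the key from the environment; "sk-example" is a placeholder fallback.
api_key = os.environ.get("INFERENCE_API_KEY", "sk-example")

base_url = "https://api.inference.net/v1"
headers = {
    "Authorization": f"Bearer {api_key}",  # auth header from the table above
    "Content-Type": "application/json",
}

# Request body in the OpenAI chat-completions schema.
body = {
    "model": "your-model-id",  # placeholder: substitute a model from your dashboard
    "messages": [{"role": "user", "content": "Hello!"}],
}

request_url = f"{base_url}/chat/completions"
payload = json.dumps(body)
print(request_url)
```

POST `payload` with those headers to `request_url` using any HTTP client, or point the OpenAI SDK at the base URL and let it handle the plumbing.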

Common API Surfaces

  • Real-time text generation: /chat/completions (docs: /quickstart)
  • Embeddings: /embeddings (docs: /features/embeddings)
  • Structured outputs: OpenAI-compatible JSON response formatting (docs: /features/structured-outputs)
  • Batch inference: Batch API (docs: /features/batch-api)
  • Lower-cost async jobs: Asynchronous Inference API (docs: /features/asynchronous-inference/overview)
  • Schema-guided extraction: Schematron models (docs: /workhorse-models/schematron)
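
As one example of the surfaces above, a structured-outputs request adds a `response_format` block to an ordinary chat-completions body. The sketch assumes the OpenAI-style JSON Schema `response_format`; the model name and schema are illustrative placeholders, and /features/structured-outputs is the authoritative reference:

```python
import json

# Chat-completions body with an OpenAI-style JSON Schema response format.
# "your-model-id" and the "city_extraction" schema are placeholders.
body = {
    "model": "your-model-id",
    "messages": [
        {"role": "user", "content": "Extract the city from: 'I live in Paris.'"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city_extraction",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
}

# Serialize for POSTing to /chat/completions.
print(json.dumps(body, indent=2))
```

The model's reply is then constrained to a JSON object matching the schema, which you can parse directly instead of scraping free-form text.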

Pick the Right Starting Point

  • Need your first key and a request you can paste into a terminal? Go to /quickstart.
  • Want to collect traces across providers before you optimize or fine-tune? Use the dashboard integration guide.
  • Ready to move from captured traffic to datasets, evals, and training jobs? Continue to /fine-tuning/e2e-guide.