This page is the fastest way to orient yourself. Use the Quickstart for your first request, then jump into the feature-specific guides for endpoint details and examples.
What You Need
Get an API key
Create an account, open the dashboard, and copy your API key.
Make your first request
Start with the copy-paste curl request and SDK examples in Quickstart.
Base URL and Authentication
| Item | Value |
|---|---|
| Base URL | https://api.inference.net/v1 |
| Auth header | Authorization: Bearer $INFERENCE_API_KEY |
| SDK compatibility | OpenAI SDK |
| Fastest getting-started path | /quickstart |
Common API Surfaces
| Use case | Endpoint or surface | Documentation |
|---|---|---|
| Real-time text generation | /chat/completions | /quickstart |
| Embeddings | /embeddings | /features/embeddings |
| Structured outputs | OpenAI-compatible JSON response formatting | /features/structured-outputs |
| Batch inference | Batch API | /features/batch-api |
| Lower-cost async jobs | Asynchronous Inference API | /features/asynchronous-inference/overview |
| Schema-guided extraction | Schematron models | /workhorse-models/schematron |
Pick the Right Starting Point
Quickstart
Make your first API call with curl, Python, or JavaScript.
Schematron
Use schema-guided extraction for structured JSON from messy HTML.
Embeddings
Generate vectors for search, retrieval, clustering, and ranking.
Structured Outputs
Keep responses aligned to a JSON schema.
Batch API
Queue large workloads that do not need an immediate response.
Asynchronous Inference
Trade latency for lower-cost processing on long-running jobs.
Related Docs
- Need your first key and a request you can paste into a terminal? Go to /quickstart.
- Want to collect traces across providers before you optimize or fine-tune? Use the dashboard integration guide.
- Ready to move from captured traffic to datasets, evals, and training jobs? Continue to /fine-tuning/e2e-guide.