Use the direct API when:
- you want a hosted model response right away
- you are prototyping a new workflow
- you do not need to proxy another provider through Observe
- you want to use the same OpenAI SDK shape across text, vision, and embeddings
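For example, a minimal chat-completion call against the direct API can be built with nothing but the standard library. The base URL and model name below are assumptions; substitute the values from your dashboard:

```python
import json
import urllib.request

API_KEY = "inference-..."  # your Inference.net API key
BASE_URL = "https://api.inference.net/v1"  # assumed base URL; check your dashboard

def chat_request(messages, model="llama-3.1-8b-instruct"):
    """Build an OpenAI-shaped chat completion request (not yet sent)."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request([{"role": "user", "content": "Hello"}])
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape follows the OpenAI convention, the same payload works unchanged if you later swap in the official OpenAI SDK and point its `base_url` at the direct API.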
Use Observe instead when:
- you already call OpenAI, Anthropic, or another provider in production
- you want tracing, analytics, datasets, and eval inputs from real traffic
- you want to keep upstream provider choice separate from your analytics layer
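Keeping provider choice separate from the analytics layer usually comes down to which base URL the request goes to. The sketch below assumes both hosts (the Observe URL in particular is a hypothetical placeholder); the point is that the OpenAI-shaped payload itself does not change:

```python
import json
import urllib.request

# Both hosts are assumptions; use the URLs from your own dashboard.
DIRECT_BASE = "https://api.inference.net/v1"
OBSERVE_BASE = "https://observe.example.com/v1"  # hypothetical proxy endpoint

def completion_request(base_url, api_key, payload):
    """Same OpenAI-shaped request; only the base URL decides direct vs proxied."""
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

payload = {
    "model": "llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hi"}],
}
direct = completion_request(DIRECT_BASE, "key", payload)
proxied = completion_request(OBSERVE_BASE, "key", payload)
```

Routing through the proxy changes where the request lands, not what it contains, which is what lets tracing and analytics ride on top of production traffic.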
API surfaces
- API Quickstart: synchronous requests and streaming responses for interactive workloads.
- Embeddings: text embeddings through an OpenAI-compatible interface.
- Vision: vision-language requests for image understanding and multimodal workflows.
- Structured outputs: force responses into predictable JSON schemas.
- Background jobs: submit now, then collect the result later via polling or webhooks.
- Batch API: upload large offline workloads and process them asynchronously.
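As one concrete surface, a structured-output request body following the OpenAI `response_format` convention might look like this. The model name is a placeholder, and you should confirm the exact field names on the Structured outputs page before relying on them:

```python
import json

# Sketch of a structured-output request body using the OpenAI-style
# `response_format` with a JSON schema; field names are assumptions
# until verified against the Structured outputs page.
body = {
    "model": "llama-3.1-8b-instruct",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Extract the city from: 'I live in Oslo.'"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
}
encoded = json.dumps(body)  # ready to POST to the chat completions endpoint
```

Constraining the response to a schema like this is what makes downstream parsing predictable: the model is forced to return an object with a `city` string rather than free-form prose.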
Auth model
The direct API uses your standard Inference.net API key in the `Authorization: Bearer ...` header.
If you are not sure which key type you need, go to /reference/auth-and-key-types.
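Setting the header looks like the following. Reading the key from an environment variable is a common pattern, not a requirement, and the `/v1/models` path here is an assumption:

```python
import os
import urllib.request

# Key from the environment; the header name and Bearer scheme are standard.
api_key = os.environ.get("INFERENCE_API_KEY", "inf-example")
req = urllib.request.Request("https://api.inference.net/v1/models")  # path is an assumption
req.add_header("Authorization", f"Bearer {api_key}")
```

Keeping the key out of source code (environment variable, secrets manager) is the main thing to get right here; the header format itself is the same across every surface above.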
Next steps
- Start with /quickstart if you want a working request right now.
- Use /guides/choose-realtime-background-group-or-batch to choose between realtime, background, group, and batch paths.