Inference.net exposes an OpenAI-compatible API for direct model access. This is the fastest path when you want to call a hosted model immediately with an Inference.net API key.
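As a sketch of what a direct call looks like, the helper below builds an OpenAI-style chat completion request using only the standard library. The base URL and the model id are placeholders you should confirm against your dashboard and the model catalog.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.inference.net/v1"  # assumed base URL; confirm in your dashboard

def build_chat_request(model: str, messages: list, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the direct API."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Usage (requires a valid key and network access):
# req = build_chat_request(
#     "<model-id>",
#     [{"role": "user", "content": "Hello"}],
#     os.environ["INFERENCE_API_KEY"],
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, the official OpenAI SDKs also work once you point them at the Inference.net base URL and API key.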

Use the direct API when

  • you want a hosted model response right away
  • you are prototyping a new workflow
  • you do not need to proxy another provider through Observe
  • you want to use the same OpenAI SDK shape across text, vision, and embeddings

Use Observe instead when

  • you already call OpenAI, Anthropic, or another provider in production
  • you want tracing, analytics, datasets, and eval inputs from real traffic
  • you want to keep upstream provider choice separate from your analytics layer
Start with /start-here/observe-quickstart if that is your use case.

API surfaces

API Quickstart

Synchronous requests and streaming responses for interactive workloads.
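For the streaming path, responses arrive as OpenAI-style server-sent events, one `data: {...}` line per chunk, terminated by `data: [DONE]`. A minimal sketch of the payload and the per-line parsing:

```python
import json

def chat_payload(model: str, messages: list, stream: bool = True) -> dict:
    """Chat completion payload with streaming enabled (OpenAI-style SSE)."""
    return {"model": model, "messages": messages, "stream": stream}

def delta_from_sse_line(line: str) -> str:
    """Extract the text delta from one 'data: {...}' server-sent-event line.

    Returns an empty string for non-data lines and the final [DONE] sentinel.
    """
    if not line.startswith("data: ") or line.strip() == "data: [DONE]":
        return ""
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content", "") or ""
```

In a real client you would iterate over the response body line by line and feed each line through `delta_from_sse_line`, printing or accumulating the deltas as they arrive.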

Embeddings

Text embeddings through an OpenAI-compatible interface.
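The embeddings endpoint takes the familiar OpenAI request shape: a model id and a string or list of strings as `input`. A sketch of the payload, plus the cosine-similarity comparison most embedding workflows end with:

```python
import math

def embeddings_payload(model: str, texts: list) -> dict:
    """OpenAI-style embeddings request body."""
    return {"model": model, "input": texts}

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

The response mirrors OpenAI's: each vector is at `response["data"][i]["embedding"]`, in the same order as the inputs.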

Vision

Vision-language requests for image understanding and multimodal workflows.
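Vision requests use the OpenAI multimodal message format: the user message's `content` is a list of parts mixing `text` and `image_url` entries, where an image can be inlined as a base64 data URL. A sketch:

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """OpenAI-style multimodal user message with an inline base64 data URL."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

A plain `https://...` URL also works in place of the data URL when the image is publicly reachable.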

Structured outputs

Force responses into predictable JSON schemas.
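In the OpenAI-compatible shape, structured outputs are requested through the `response_format` field with a named JSON schema. A sketch of the payload, assuming the same `json_schema` format OpenAI uses:

```python
def structured_payload(model: str, messages: list, schema_name: str, schema: dict) -> dict:
    """Chat completion payload constrained to a JSON schema (OpenAI-style)."""
    return {
        "model": model,
        "messages": messages,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": schema_name, "strict": True, "schema": schema},
        },
    }
```

The model's reply then arrives as a JSON string matching the schema, ready to parse with `json.loads`.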

Background jobs

Submit a request now and collect the result later, via polling or webhooks.
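On the polling side, the client loop is the same regardless of the exact endpoint: fetch the job's status, back off, repeat until a terminal state. A generic sketch, where `fetch_status` wraps whatever status endpoint the background-jobs page documents:

```python
import time

def poll_until_done(fetch_status, interval: float = 2.0,
                    max_interval: float = 30.0, timeout: float = 600.0) -> dict:
    """Generic polling loop with exponential backoff.

    fetch_status is any callable returning a dict with a 'status' field;
    the terminal state names here are illustrative, not Inference.net-specific.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed", "cancelled"):
            return job
        time.sleep(interval)
        interval = min(interval * 2, max_interval)  # back off between checks
    raise TimeoutError("job did not finish within the timeout")
```

Webhooks avoid the loop entirely: the API calls your endpoint when the job finishes, so prefer them for long-running work.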

Batch API

Upload large offline workloads and process them asynchronously.
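Batch workloads are typically submitted as a JSONL file, one request per line. Assuming the OpenAI batch-file format applies here (confirm on the Batch API page), each line looks like this:

```python
import json

def batch_line(custom_id: str, model: str, messages: list,
               url: str = "/v1/chat/completions") -> str:
    """One JSONL line in the OpenAI batch-file format (assumed to apply here)."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": url,
        "body": {"model": model, "messages": messages},
    })

# Writing a batch file is then one line per request:
# with open("batch.jsonl", "w") as f:
#     for i, msgs in enumerate(all_message_lists):
#         f.write(batch_line(f"req-{i}", "<model-id>", msgs) + "\n")
```

The `custom_id` is what lets you match each result in the output file back to its original request.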

Auth model

The direct API uses your standard Inference.net API key in the Authorization: Bearer ... header. If you are not sure which key type you need, go to /reference/auth-and-key-types.
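Concretely, every direct-API request carries the same two headers:

```python
def auth_headers(api_key: str) -> dict:
    """Standard bearer-token headers for the direct API."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keep the key out of source control; reading it from an environment variable is the usual pattern.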

Next steps