How they work
- send the request to the asynchronous path instead of the normal realtime path
- get a generation identifier back immediately
- poll for the result later, or use a webhook
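The submit-then-poll flow above can be sketched as a generic polling loop with capped exponential backoff. This is a minimal sketch, not the client used by Inference.net. The status check is passed in as a function because the polling endpoint and its response format are not described here. PollResult, PollOptions and pollUntilDone are hypothetical names used only for illustration.

```typescript
// Hypothetical result of one status check; the real polling response format is not specified here.
type PollResult<T> = { done: false } | { done: true; value: T };

interface PollOptions {
  initialDelayMs: number; // wait after the first unfinished check
  maxDelayMs: number; // upper bound for the backoff delay
  maxAttempts: number; // total number of checks before giving up
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(() => resolve(), ms));

// Calls `check` until it reports completion, doubling the wait between
// checks (capped at maxDelayMs) so the loop does not hammer the API.
async function pollUntilDone<T>(
  check: () => Promise<PollResult<T>>,
  opts: PollOptions,
): Promise<T> {
  let delay = opts.initialDelayMs;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const result = await check();
    if (result.done) return result.value;
    // Wait before the next check, except after the final attempt.
    if (attempt < opts.maxAttempts) {
      await sleep(delay);
      delay = Math.min(delay * 2, opts.maxDelayMs);
    }
  }
  throw new Error(`generation not finished after ${opts.maxAttempts} attempts`);
}
```

In a real client, check would look up the generation by its identifier and map the response to PollResult. The backoff keeps polling cheap, but for most background workloads a webhook avoids the loop entirely.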
Source-backed curl example
A background request uses the shape exercised by the async test helpers in inference/apps/relay/tests/e2e/utils/inference-api.ts and is sent to the route mounted at /v1/slow/chat/completions.
The response comes back immediately with an id. That is the generation identifier you use for polling.
The completed result you retrieve uses the same GenerationForClient shape that completion webhooks deliver in their data field.
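A background submission can be sketched in TypeScript as a request builder plus a helper that reads the returned id. This is a sketch under stated assumptions: the route path comes from the text above, but the OpenAI-style body (model, messages), the Bearer Authorization header and the base URL passed in are assumptions, and buildSlowChatRequest and extractGenerationId are hypothetical helpers.

```typescript
// Assumption: the body follows the OpenAI-style chat completions shape.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface SlowRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a background request against the slow route.
function buildSlowChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): SlowRequest {
  return {
    url: new URL("/v1/slow/chat/completions", baseUrl).toString(),
    init: {
      method: "POST",
      // Assumption: Bearer-token auth, as in OpenAI-compatible APIs.
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Reads the generation identifier from the immediate response body.
function extractGenerationId(responseBody: unknown): string {
  const id = (responseBody as { id?: unknown } | null)?.id;
  if (typeof id !== "string") throw new Error("response has no string id field");
  return id;
}
```

Sending it is then a matter of passing url and init to fetch and handing the parsed JSON to extractGenerationId; the resulting id is what you poll with or match against incoming webhooks.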
Canonical path
Inference.net supports the slow background path. In some parts of the product and codebase you may also see async used as an alias.
Best fit
Use background jobs for:
- cost-sensitive inference
- non-interactive generation
- longer-running requests
- workflows that can finish later and notify your system with a webhook
Result retrieval
After submission, retrieve the completed result using the generation identifier. Background results preserve the original request and the final response, which makes them useful for later inspection and dataset curation.
Webhooks
For async flows, webhooks are usually better than tight polling loops. Relevant events include:
- generation.completed
- async-embedding.completed
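A webhook receiver mostly needs to route each delivery by event name. The following is a minimal sketch, assuming the payload is JSON with the event name in an event field and the result in data; only the data field and the two event names come from this page. dispatchWebhook and Handlers are hypothetical, and the sketch does not authenticate the sender.

```typescript
type WebhookEvent = "generation.completed" | "async-embedding.completed";

// Assumption: the event name arrives in `event`; the result arrives in `data`.
interface WebhookPayload {
  event: string;
  data: unknown;
}

type Handlers = Partial<Record<WebhookEvent, (data: unknown) => void>>;

// Routes a raw webhook body to the handler registered for its event.
// Returns false when no handler is registered, so the caller can still
// acknowledge deliveries it chooses to ignore.
function dispatchWebhook(rawBody: string, handlers: Handlers): boolean {
  const payload = JSON.parse(rawBody) as WebhookPayload;
  const handler = handlers[payload.event as WebhookEvent];
  if (!handler) return false;
  handler(payload.data);
  return true;
}
```

Keeping the handler map separate from the HTTP server makes the routing easy to test and lets a new event type be added without touching the request parsing.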
To tie a request to a webhook, set webhook_id in the request metadata.
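Attaching a webhook_id to a background request body might look like the sketch below. It assumes metadata is a top-level object of string values in the request body, which this page does not confirm. SlowChatBody and withWebhook are hypothetical names.

```typescript
// Assumption: `metadata` is a top-level field holding string values.
interface SlowChatBody {
  model: string;
  messages: { role: string; content: string }[];
  metadata?: Record<string, string>;
}

// Returns a copy of the body with the webhook attached, keeping any
// metadata keys that were already set.
function withWebhook(body: SlowChatBody, webhookId: string): SlowChatBody {
  return { ...body, metadata: { ...body.metadata, webhook_id: webhookId } };
}
```

Returning a new object rather than mutating the input keeps the original body reusable, for example when the same prompt is submitted with different webhooks.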
When to use another mode instead
- use /guides/choose-realtime-background-group-or-batch when you need help choosing between background jobs, group jobs, and batch
- use /api/batch for large offline file-driven workloads
- use /quickstart when the caller is waiting on the answer