Background Jobs

Use background jobs when you want the normal inference request shape but do not need the response immediately.

How they work

send the request to the asynchronous path instead of the normal realtime path
get a generation identifier back immediately
poll for the result later, or use a webhook
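
As a rough illustration of the first two steps, the sketch below sends a normal chat-completions body to the /v1/slow/chat/completions route documented on this page and reads the id from the response. It uses only the Python standard library. The helper names build_background_request and submit_background_job are hypothetical, not part of any official SDK, and error handling is omitted.

```python
import json
import urllib.request
from typing import Any, Dict

# Route taken from this page; the helpers below are illustrative only.
SLOW_CHAT_URL = "https://api.inference.net/v1/slow/chat/completions"


def build_background_request(payload: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Build a POST carrying the usual chat-completions body, aimed at the slow route."""
    return urllib.request.Request(
        SLOW_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit_background_job(payload: Dict[str, Any], api_key: str) -> str:
    """Send the request and return the generation identifier from the response."""
    request = build_background_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    # The async response includes an id; that is the value to poll with later.
    return body["id"]
```

Splitting request construction from sending keeps the first helper free of network access, so the request shape can be inspected or unit-tested on its own.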

Source-backed curl example

The request shape below matches the async test helpers in inference/apps/relay/tests/e2e/utils/inference-api.ts and the route mounted at /v1/slow/chat/completions.

curl https://api.inference.net/v1/slow/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -d '{
    "model": "meta-llama/llama-3.2-3b-instruct/fp-16",
    "messages": [
      {"role": "system", "content": "You do exactly what the user asks."},
      {"role": "user", "content": "Respond with a single word of your choice and nothing else."}
    ],
    "max_tokens": 10,
    "stream": false
  }'

The async response includes an id. That is the generation identifier you use for polling.

curl https://api.inference.net/v1/generation/$GENERATION_ID \
  -H "Authorization: Bearer $INFERENCE_API_KEY"

Once the generation is complete, the response payload has the same GenerationForClient shape that completion webhooks deliver in their data field.
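
If you do poll, one common way to avoid hammering the API is exponential backoff with an overall deadline. The sketch below is a minimal version under stated assumptions: this page does not say which field marks a generation as complete, so the caller supplies an is_done predicate. The names fetch_generation and poll_generation, and the delay and timeout values, are hypothetical choices rather than documented defaults.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Route taken from the curl command above.
GENERATION_URL = "https://api.inference.net/v1/generation"


def fetch_generation(generation_id: str, api_key: str) -> Dict[str, Any]:
    """GET the generation by id, mirroring the curl command above."""
    request = urllib.request.Request(
        f"{GENERATION_URL}/{generation_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_generation(
    generation_id: str,
    fetch: Callable[[str], Dict[str, Any]],
    is_done: Callable[[Dict[str, Any]], bool],
    *,
    initial_delay: float = 2.0,
    max_delay: float = 60.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Fetch until is_done(...) is true, doubling the wait between attempts."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        generation = fetch(generation_id)
        if is_done(generation):
            return generation
        # Give up rather than sleep past the overall deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"generation {generation_id} not finished after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

A caller would pass fetch=lambda gid: fetch_generation(gid, api_key) together with a predicate that matches the real payload. Taking fetch and sleep as parameters also lets the loop be exercised in tests without network access or real waiting.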

Canonical path

Inference.net's canonical background path is slow, the segment used in /v1/slow/chat/completions. In some parts of the product and codebase you may also see async used as an alias for it.

Best fit

Use background jobs for:

cost-sensitive inference
non-interactive generation
longer-running requests
workflows that can finish later and notify your system with a webhook

Result retrieval

After submission, retrieve the completed result using the generation identifier. Background results preserve the original request and the final response, which makes them useful for later inspection and dataset curation.
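
For dataset curation, retrieved results could be flattened into JSON Lines rows, as in the sketch below. The key names are assumptions: it supposes that the original request sits under "request" and an OpenAI-style chat response under "response", neither of which this page confirms. Check the real GenerationForClient shape before relying on them. The function names to_chat_example and to_jsonl are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def to_chat_example(generation: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Turn one retrieved generation into a {"messages": [...]} training row.

    ASSUMPTION: "request" holds the original request body and "response" holds
    a chat-completions style response. These key names are placeholders.
    """
    prompt_messages = list(generation["request"]["messages"])
    reply = generation["response"]["choices"][0]["message"]
    return {"messages": prompt_messages + [reply]}


def to_jsonl(generations: Iterable[Dict[str, Any]]) -> str:
    """Serialize generations as JSON Lines, one training row per line."""
    return "\n".join(json.dumps(to_chat_example(g)) for g in generations)
```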

Webhooks

For async flows, webhooks are usually better than tight polling loops. Relevant events include:

generation.completed
async-embedding.completed

To receive a webhook instead of polling, include a webhook_id in the request metadata:

{
  "model": "meta-llama/llama-3.2-3b-instruct/fp-16",
  "messages": [{"role": "user", "content": "Hello, webhook test!"}],
  "stream": false,
  "n": 1,
  "temperature": 1,
  "top_p": 1,
  "metadata": {
    "webhook_id": "wh_123"
  }
}

See /reference/webhooks.
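
On the receiving side, a handler typically parses the delivery and routes it by event name. The sketch below shows that routing step only, under a loud assumption: it supposes the event name arrives in a top-level "event" field. Only the data field is described on this page, so the "event" key and the dispatch_webhook name are hypothetical; consult /reference/webhooks for the actual envelope and any signature checks.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], None]


def dispatch_webhook(raw_body: bytes, handlers: Dict[str, Handler]) -> bool:
    """Route one webhook delivery to the handler registered for its event name.

    ASSUMPTION: the event name is in a top-level "event" field (placeholder key).
    Returns True if a handler ran, False if the delivery was ignored.
    """
    delivery = json.loads(raw_body)
    if not isinstance(delivery, dict):
        return False
    handler = handlers.get(delivery.get("event"))
    if handler is None:
        return False
    # The data field carries the completed generation payload.
    handler(delivery.get("data", {}))
    return True
```

Registering handlers in a dictionary keyed by event name, for example "generation.completed" and "async-embedding.completed", keeps unknown events from raising and makes it easy to add new event types later.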

When to use another mode instead

use /guides/choose-realtime-background-group-or-batch when you need help choosing between background jobs, group jobs, and batch
use /api/batch for large offline file-driven workloads
use /quickstart when the caller is waiting on the answer
