Execution mode is not just a technical detail. It determines latency, user experience, webhook strategy, and how much operational complexity your app takes on.

Use this guide when

  • you already know the task you want to run
  • you are deciding whether a user is waiting on the answer
  • you are deciding between synchronous and offline processing

Realtime inference

Choose realtime when:
  • a person or blocking application flow is waiting
  • you want the simplest request path
  • you need streaming UX
Best starting point: /quickstart

Background jobs

Choose background jobs when:
  • the request can finish later
  • you want to poll by generation ID or use a webhook
  • a single request may take longer than your app should wait
Best starting point: /api/background-jobs
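
If you poll instead of using a webhook, the loop might look like the minimal sketch below. It does not use the real API surface: `fetch_status` is a hypothetical callable you would supply (for example, a wrapper around whatever status request the background jobs API exposes), and the status names in `TERMINAL_STATES` are assumptions, not documented values.

```python
import time
from typing import Callable

# Assumed terminal status names; replace with the values the API actually returns.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_done(
    fetch_status: Callable[[str], str],
    generation_id: str,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job by generation ID with exponential backoff.

    fetch_status is a hypothetical caller-supplied function that takes a
    generation ID and returns the job's current status string.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        status = fetch_status(generation_id)
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job {generation_id} still '{status}' after {waited:.0f}s")
        sleep(delay)
        waited += delay
        # Double the wait between polls, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

Backoff keeps polling cheap for long jobs. A webhook avoids the loop entirely, at the cost of exposing and securing an endpoint.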

Group jobs

Choose group jobs when:
  • you have a small related bundle of requests
  • you want one group ID and one completion event
  • building a JSONL batch file would be unnecessary overhead
In most cases, group jobs are a more structured version of the async path. If you are not sure, start with background jobs and switch to group jobs only when you need one group ID and one completion event for the whole bundle.

Batch

Choose batch when:
  • the workload is large and offline
  • file-based submission is acceptable
  • you care more about throughput than immediate completion
Best starting point: /api/batch
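
File-based submission usually means writing one request per line as JSON (JSONL). The sketch below shows only that file-writing step. The fields inside each request (`custom_id`, `input`) are hypothetical placeholders, not the documented batch schema, so check /api/batch for the real line format.

```python
import json
from typing import Any, Iterable, Mapping


def write_jsonl(path: str, requests: Iterable[Mapping[str, Any]]) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for request in requests:
            # json.dumps escapes embedded newlines, so each request stays on one line.
            f.write(json.dumps(request, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical request shape, for illustration only.
    example_requests = [
        {"custom_id": "row-1", "input": "first prompt"},
        {"custom_id": "row-2", "input": "second prompt"},
    ]
    write_jsonl("batch_input.jsonl", example_requests)
```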

Serverless vs deployment

This is a separate decision from execution mode.
  • choose serverless when you want the fastest hosted path
  • choose deployment when the workload needs dedicated capacity or you are serving a trained model
Best starting point: /guides/promote-a-trained-model-to-deployment

Rule of thumb

  • if a human is waiting, start with realtime
  • if a system is waiting and can pick up the result later, start with background jobs
  • if you have a small related bundle, consider group jobs
  • if you already have a large file of work, use batch
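
The rules above can be written as a small decision helper. This is an illustrative sketch, not part of any SDK: the `Workload` fields and the order in which the checks run are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload; every field name is illustrative."""
    human_waiting: bool = False      # a person or blocking flow needs the answer now
    needs_streaming: bool = False    # the UI shows output as it is generated
    related_bundle: bool = False     # a small set of requests that belong together
    has_large_file: bool = False     # the work already exists as a large file


def choose_execution_mode(w: Workload) -> str:
    """Apply the rule of thumb; realtime takes priority when someone is waiting."""
    if w.human_waiting or w.needs_streaming:
        return "realtime"
    if w.has_large_file:
        return "batch"
    if w.related_bundle:
        return "group"
    # Default async path when nothing more specific applies.
    return "background"


if __name__ == "__main__":
    print(choose_execution_mode(Workload(human_waiting=True)))
    print(choose_execution_mode(Workload(related_bundle=True)))
```

Falling through to background jobs mirrors the advice to start there when you are unsure. Serverless versus deployment is deliberately left out because it is a separate decision.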