Use this guide when:
- you already know the task you want to run
- you are deciding whether a user is waiting on the answer
- you are deciding between synchronous and offline processing
Realtime inference
Choose realtime when:
- a person or blocking application flow is waiting
- you want the simplest request path
- you need streaming UX
Background jobs
Choose background jobs when:
- the request can finish later
- you want to poll by generation ID or use a webhook
- a single request may take longer than your app should wait
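A minimal sketch of the polling option, assuming the service exposes some way to look up a job by its generation ID. The `fetch_status` callable, the `status` field, and the `"succeeded"` / `"failed"` values are hypothetical placeholders, not names taken from any specific API; substitute whatever your client library provides.

```python
import time
from typing import Callable, Dict


def wait_for_generation(
    generation_id: str,
    fetch_status: Callable[[str], Dict],  # hypothetical: returns the job record as a dict
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the job reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(generation_id)
        # Terminal status names are assumptions made for this sketch.
        if job.get("status") in ("succeeded", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"generation {generation_id} still pending after {timeout_s}s"
            )
        sleep(interval_s)
```

A webhook avoids the loop entirely: the application returns immediately and handles the completion event when it arrives, which is usually the better fit when jobs run for minutes rather than seconds.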
Group jobs
Choose group jobs when:
- you have a small related bundle of requests
- you want one group ID and one completion event
- building a JSONL batch file would be unnecessary overhead
Batch
Choose batch when:
- the workload is large and offline
- file-based submission is acceptable
- you care more about throughput than immediate completion
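To make "file-based submission" concrete, here is a small sketch that writes a JSONL file: one JSON object per line. The record fields (`custom_id`, `input`) are illustrative assumptions, not a documented schema; check the batch reference for the fields the service actually expects.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def write_batch_file(requests: Iterable[Dict], path: str) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for req in requests:
            # json.dumps never emits raw newlines, so each record stays on one line.
            f.write(json.dumps(req, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical record shape, for illustration only.
    work = ({"custom_id": f"item-{i}", "input": f"document {i}"} for i in range(1000))
    write_batch_file(work, "batch.jsonl")
```

Because the function accepts any iterable, a large workload can be streamed to disk without holding every request in memory.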
Serverless vs deployment
This is a separate decision from execution mode.
- choose serverless when you want the fastest hosted path
- choose deployment when the workload warrants dedicated capacity or needs a serving path for a trained model
Rule of thumb
- if a human is waiting, start with realtime
- if a system is waiting, start with background jobs
- if you have a small related bundle, consider group jobs
- if you already have a large file of work, use batch
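The rule of thumb can be read as a small decision function. This is one possible interpretation, not an official selector: the names `Mode` and `pick_mode`, the parameter names, and the order in which the checks run are all assumptions made for illustration. The guide does not define how many requests count as "small", so the sketch treats any bundle of more than one related request as a group candidate.

```python
from enum import Enum


class Mode(Enum):
    REALTIME = "realtime"
    BACKGROUND = "background"
    GROUP = "group"
    BATCH = "batch"


def pick_mode(
    human_waiting: bool,
    bundle_size: int = 1,
    large_offline_file: bool = False,
) -> Mode:
    """Return a starting execution mode based on the rule of thumb."""
    if human_waiting:
        # A person is blocked on the answer.
        return Mode.REALTIME
    if large_offline_file:
        # The work already exists as a large file and can finish offline.
        return Mode.BATCH
    if bundle_size > 1:
        # A small set of related requests that should complete together.
        return Mode.GROUP
    # A system is waiting, and the request can finish later.
    return Mode.BACKGROUND
```

Treat the result as a starting point; the serverless versus deployment choice is made separately and is not modeled here.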