Best fit
Use the Batch API when:
- you already have a large queue of requests to run
- immediate responses are not required
- you want very high throughput without managing thousands of synchronous calls
- file upload plus later retrieval is a good fit for the workflow
Batch API basics
- base URL: https://batch.inference.net/v1
- upload a JSONL input file first
- create a batch from that file
- poll the batch status or use a webhook
- download the output file when processing completes
Source-backed JSONL example
The line shapes below come directly from inference/apps/relay/tests/e2e/utils/batch-test.utils.ts.
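The referenced test file is not reproduced here, so the sketch below assumes the common OpenAI-compatible batch line shape (`custom_id`, `method`, `url`, `body`); the helper name, the model ID, and the exact field names are illustrative assumptions, not the relay's confirmed schema.

```typescript
// Hypothetical helper that builds one line of a batch JSONL input file.
// Field names assume the OpenAI-compatible batch format; verify them
// against batch-test.utils.ts before relying on this shape.
interface BatchLine {
  custom_id: string; // your identifier, echoed back in the output file
  method: "POST";
  url: string; // the per-request endpoint, e.g. /v1/chat/completions
  body: {
    model: string;
    messages: { role: string; content: string }[];
  };
}

function toJsonlLine(customId: string, model: string, prompt: string): string {
  const line: BatchLine = {
    custom_id: customId,
    method: "POST",
    url: "/v1/chat/completions",
    body: {
      model,
      messages: [{ role: "user", content: prompt }],
    },
  };
  return JSON.stringify(line);
}

// A JSONL input file is just newline-joined request lines:
const jsonl = [
  toJsonlLine("req-1", "example-model", "Summarize this record."),
  toJsonlLine("req-2", "example-model", "Translate this record."),
].join("\n");
```

Each line must be a complete, independent JSON object; the `custom_id` is how you match output lines back to input requests, since results are not guaranteed to arrive in submission order.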
Source-backed TypeScript example
This is the same flow exercised in the batch e2e tests: upload a JSONL file, create a batch, then track the batch ID.

Common use cases
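The e2e test code itself is not shown in this section, so the following is a minimal sketch of that flow against the base URL above, assuming an OpenAI-compatible surface. The endpoint paths (`/files`, `/batches`), request fields, and status values are assumptions to be checked against batch-test.utils.ts; the API key is a placeholder.

```typescript
// Sketch of the batch lifecycle: upload JSONL -> create batch -> poll status.
// Requires a runtime with global fetch/FormData/Blob (Node 18+, Deno, browsers).
const BASE_URL = "https://batch.inference.net/v1";
const API_KEY = "YOUR_API_KEY"; // placeholder; supply your real key

function batchUrl(path: string): string {
  return `${BASE_URL}${path}`;
}

// Upload the JSONL input file and return its file ID (assumed /files endpoint).
async function uploadInputFile(jsonl: string): Promise<string> {
  const form = new FormData();
  form.append("purpose", "batch");
  form.append("file", new Blob([jsonl], { type: "application/jsonl" }), "input.jsonl");
  const res = await fetch(batchUrl("/files"), {
    method: "POST",
    headers: { Authorization: `Bearer ${API_KEY}` },
    body: form,
  });
  const file = await res.json();
  return file.id;
}

// Create a batch from the uploaded file and return the batch ID to track.
async function createBatch(inputFileId: string): Promise<string> {
  const res = await fetch(batchUrl("/batches"), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input_file_id: inputFileId,
      endpoint: "/v1/chat/completions",
      completion_window: "24h",
    }),
  });
  const batch = await res.json();
  return batch.id;
}

// Poll until the batch reaches a terminal status (assumed status names),
// then the output file referenced by the batch object can be downloaded.
async function pollUntilDone(batchId: string, intervalMs = 30_000): Promise<unknown> {
  for (;;) {
    const res = await fetch(batchUrl(`/batches/${batchId}`), {
      headers: { Authorization: `Bearer ${API_KEY}` },
    });
    const batch = await res.json();
    if (["completed", "failed", "expired", "cancelled"].includes(batch.status)) {
      return batch;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

If a webhook is configured, the polling loop can be dropped entirely and the batch ID stored until the completion notification arrives.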
- structured extraction over a large corpus
- mass translation
- synthetic data generation
- summarizing large collections of records
- large-scale captioning or tagging jobs
Use batch instead of group jobs when
- the workload is much larger than a small request bundle
- file-based submission is acceptable
- you want the clearest operational separation between request preparation and result retrieval