The Batch API is Inference.net's path for large-scale offline processing.

Best fit

Use the Batch API when:
  • you already have a large queue of requests to run
  • immediate responses are not required
  • you want very high throughput without managing thousands of synchronous calls
  • file upload plus later retrieval is a good fit for the workflow

Batch API basics

  • base URL: https://batch.inference.net/v1
  • upload a JSONL input file first
  • create a batch from that file
  • poll the batch status or use a webhook
  • download the output file when processing completes

Source-backed JSONL example

The line shapes below come directly from inference/apps/relay/tests/e2e/utils/batch-test.utils.ts.
{"custom_id":"chat-1","method":"POST","body":{"model":"meta-llama/llama-3.2-3b-instruct/fp-16","n":null,"temperature":null,"top_p":null,"messages":[{"role":"user","content":"What is the capital of France?"}]}}
{"custom_id":"embed-1","method":"POST","body":{"model":"qwen/qwen3-embedding-4b","input":"What is the capital of France?","encoding_format":"float"}}

Source-backed TypeScript example

This is the same flow exercised in the batch e2e tests: upload a JSONL file, create a batch, then track the batch ID.
import OpenAI from "openai";

// The Batch API speaks the OpenAI wire format, so the standard SDK
// works once baseURL points at the batch endpoint.
const client = new OpenAI({
  apiKey: process.env.INFERENCE_API_KEY,
  baseURL: "https://batch.inference.net/v1",
});

// Build the JSONL input in memory: one request object per line.
const inputFile = new File(
  [
    '{"custom_id":"chat-1","method":"POST","body":{"model":"meta-llama/llama-3.2-3b-instruct/fp-16","n":null,"temperature":null,"top_p":null,"messages":[{"role":"user","content":"What is the capital of France?"}]}}\n',
  ],
  "batch-input.jsonl",
  { type: "application/jsonl" },
);

// Upload the input file with purpose "batch" so it can back a batch job.
const uploadedFile = await client.files.create({
  file: inputFile,
  purpose: "batch",
});

// Create the batch from the uploaded file's ID.
const batch = await client.batches.create({
  input_file_id: uploadedFile.id,
  endpoint: "/v1/completions",
  completion_window: "24h",
});

console.log(batch.id, batch.status);

Common use cases

  • structured extraction over a large corpus
  • mass translation
  • synthetic data generation
  • summarizing large collections of records
  • large-scale captioning or tagging jobs
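As an illustration, a mass-translation job reuses the chat line shape shown earlier, one line per document; the prompt and custom_id below are illustrative, not from the test fixtures:

```jsonl
{"custom_id":"translate-1","method":"POST","body":{"model":"meta-llama/llama-3.2-3b-instruct/fp-16","messages":[{"role":"user","content":"Translate to English: Bonjour le monde."}]}}
```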

Use the Batch API instead of group jobs when

  • the workload is much larger than a small request bundle
  • file-based submission is acceptable
  • you want the clearest operational separation between request preparation and result retrieval