The Batch API is Inference.net's path for large-scale offline processing.

Best fit

Use the Batch API when:
  • you already have a large queue of requests to run
  • immediate responses are not required
  • you want very high throughput without managing thousands of synchronous calls
  • file upload plus later retrieval is a good fit for the workflow

Batch API basics

  • base URL: https://batch.inference.net/v1
  • upload a JSONL input file first
  • create a batch from that file
  • poll the batch status or use a webhook
  • download the output file when processing completes

Source-backed JSONL example

The line shapes below come directly from inference/apps/relay/tests/e2e/utils/batch-test.utils.ts.
{"custom_id":"chat-1","method":"POST","body":{"model":"meta-llama/llama-3.2-3b-instruct/fp-16","n":null,"temperature":null,"top_p":null,"messages":[{"role":"user","content":"What is the capital of France?"}]}}
{"custom_id":"embed-1","method":"POST","body":{"model":"qwen/qwen3-embedding-4b","input":"What is the capital of France?","encoding_format":"float"}}

Source-backed TypeScript example

This is the same flow exercised in the batch e2e tests: upload a JSONL file, create a batch, then track the batch ID.
import OpenAI from "openai";

// The Batch API speaks the OpenAI wire format, so the standard SDK
// works once baseURL points at the batch endpoint.
const client = new OpenAI({
  apiKey: process.env.INFERENCE_API_KEY,
  baseURL: "https://batch.inference.net/v1",
});

// Build the JSONL input in memory: one request object per line.
const inputFile = new File(
  [
    '{"custom_id":"chat-1","method":"POST","body":{"model":"meta-llama/llama-3.2-3b-instruct/fp-16","n":null,"temperature":null,"top_p":null,"messages":[{"role":"user","content":"What is the capital of France?"}]}}\n',
  ],
  "batch-input.jsonl",
  { type: "application/jsonl" },
);

// Upload the input file with purpose "batch" so it can back a batch job.
const uploadedFile = await client.files.create({
  file: inputFile,
  purpose: "batch",
});

// Create the batch from the uploaded file's ID.
const batch = await client.batches.create({
  input_file_id: uploadedFile.id,
  endpoint: "/v1/completions",
  completion_window: "24h",
});

console.log(batch.id, batch.status);

Common use cases

  • structured extraction over a large corpus
  • mass translation
  • synthetic data generation
  • summarizing large collections of records
  • large-scale captioning or tagging jobs
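As an illustration, a mass-translation job reuses the chat line shape shown earlier, one line per document; the prompt and custom_id below are illustrative, not from the test fixtures:

```jsonl
{"custom_id":"translate-1","method":"POST","body":{"model":"meta-llama/llama-3.2-3b-instruct/fp-16","messages":[{"role":"user","content":"Translate to English: Bonjour le monde."}]}}
```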

Use the Batch API instead of group jobs when

  • the workload is much larger than a small request bundle
  • file-based submission is acceptable
  • you want the clearest operational separation between request preparation and result retrieval