# Batch API

> Process jobs asynchronously with Batch API.

Learn how to use our OpenAI-compatible Batch API to send asynchronous groups of inference requests to Inference.net, with substantially higher rate limits and fast completion times. The service is ideal for processing a large number of jobs that don't require immediate responses.

Batch API is currently compatible with all the [models](https://inference.net/models) we offer.

Use [https://batch.inference.net/v1](https://batch.inference.net/v1) for all Batch API requests (including Files and Batches). Do not use [https://api.inference.net/v1](https://api.inference.net/v1) for batch jobs.

## Overview

Some use cases require synchronous requests, but many do not need an immediate response, and in others rate limits prevent you from executing a large number of queries quickly. Batch processing jobs are often helpful in use cases like:

1. Extracting structured data from a large number of documents.
2. Generating synthetic data for training.
3. Translating a large number of documents into other languages.
4. Summarizing a large number of customer interactions.

Inference.net's Batch API offers a straightforward set of endpoints that allow you to upload a batch of requests, kick off a batch processing job, query for the status of the batch, and eventually retrieve the collected results when the batch is complete.

Compared to using standard endpoints directly, Batch API has:

1. **Higher rate limits:** Substantially more headroom compared to the [synchronous APIs](/api/rate-limits).
2. **Fast completion times:** Each batch completes within 24 hours (and often much more quickly).

## Getting Started

You'll need an Inference.net account and API key to use the Batch API.
See our [Quick Start Guide](/api/api-quickstart) for instructions on how to create an account and get an API key.

Install the [OpenAI SDK](https://platform.openai.com/docs/libraries) for your language of choice. To connect to Inference.net's Batch API using the OpenAI SDK, set the base URL to `https://batch.inference.net/v1`. In this example, we are reading the API key from the environment variable `INFERENCE_API_KEY`.

```typescript TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://batch.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});
```

```python Python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://batch.inference.net/v1",
    api_key=os.environ["INFERENCE_API_KEY"],
)
```

```bash cURL
export INFERENCE_API_KEY="<your-api-key>"

# All Batch API requests use https://batch.inference.net/v1
```

## Running A Batch Processing Job

### 1. Preparing Your Batch File

Prepare a `.jsonl` file where each line is a separate JSON object that represents an individual request. Each JSON object must be on a single line and cannot contain any line breaks. Each JSON object must include the following fields:

* `custom_id`: A unique identifier for the request. This is used to reference the request's results after completion. It must be unique for each request in the file.
* `method`: The HTTP method to use for the request. Currently, only `POST` is supported.
* `url`: The URL to send the request to. Currently, only `/v1/chat/completions` and `/v1/completions` are supported.
* `body`: The request body, which contains the input for the inference request.

The parameters in each line's `body` field are the same as the parameters for the underlying endpoint specified by the `url` field. See this [example](/quickstart#test-request) for more details.

Here's an example of an input file with 2 requests using the `/v1/chat/completions` endpoint.
```jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "google/gemma-3-27b-instruct/bf-16", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}], "max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "google/gemma-3-27b-instruct/bf-16", "messages": [{"role": "system", "content": "You are an unhelpful assistant."}, {"role": "user", "content": "What is the capital of Belgium?"}], "max_tokens": 1000}}
```

And here is an example of an input file using the `/v1/completions` endpoint:

```jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/completions", "body": {"model": "google/gemma-3-27b-instruct/bf-16", "prompt": "What is the capital of France?", "max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/completions", "body": {"model": "google/gemma-3-27b-instruct/bf-16", "prompt": "What is the capital of Belgium?", "max_tokens": 1000}}
```

### 2. Uploading Your Batch Input File

In order to create a Batch Processing job, you must first upload your input file.

```typescript TypeScript
import fs from "fs";

const batchInputFile = await client.files.create({
  file: fs.createReadStream("batchinput.jsonl"),
  purpose: "batch",
});

console.log(batchInputFile);
```

```python Python
batch_input_file = client.files.create(
    file=open("batchinput.jsonl", "rb"),
    purpose="batch",
)

print(batch_input_file)
```

```bash cURL
curl https://batch.inference.net/v1/files \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -F purpose="batch" \
  -F file="@batchinput.jsonl"
```

The response will look similar to this, depending on the language you are using:

```json JSON
{
  "id": "file-abc123"
}
```

### 3. Starting the Batch Processing Job

Once you've successfully uploaded your input file, you can use the ID of the file to create a batch. In this case, let's assume the file ID is `file-abc123`. For now, the completion window can only be set to `24h`.

To associate custom metadata with the batch, you can provide an optional `metadata` parameter. Inference.net does not use this metadata to complete requests, but returns it whenever you retrieve the status of the batch.

> **Note:** The Batch Processing job will begin processing immediately after creation.

Create the batch:

```typescript TypeScript
import type { BatchCreateParams } from "openai/resources/batches";

const batch = await client.batches.create({
  input_file_id: batchInputFile.id,
  endpoint: "/v1/chat/completions",
  completion_window: "24h",
  metadata: {
    description: "nightly eval job",
  },
  // Optional. Must be HTTPS.
  webhook_url: "https://example.com/my_webhook",
} as BatchCreateParams);

console.log(batch);
```

```python Python
batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "description": "nightly eval job",
    },
    # Optional. Must be HTTPS.
    extra_body={
        "webhook_url": "https://example.com/my_webhook",
    },
)

print(batch)
```

```bash cURL
curl https://batch.inference.net/v1/batches \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file-abc123",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h",
    "metadata": {
      "description": "nightly eval job"
    },
    "webhook_url": "https://example.com/webhook"
  }'
```

This request will return a batch object with metadata about your batch:

```json JSON
{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file-abc123",
  "completion_window": "24h",
  "status": "validating",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1714508499,
  "in_progress_at": 1714508500,
  "expires_at": 1714536634,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "request_counts": {
    "total": 2,
    "completed": 0,
    "failed": 0
  },
  "metadata": {
    "description": "nightly eval job"
  }
}
```

Inference.net supports a `webhook_url` that you can set to receive a webhook notification when the batch is complete. The `webhook_url` must be an HTTPS URL that can receive POST requests.

Your webhook will receive a POST request with a JSON body that looks like this:

```json JSON
{
  "batch_id": "batch_abc123",
  "status": "completed",
  "metadata": {
    "description": "nightly eval job"
  }
}
```

If no metadata was provided when the batch was created, the `metadata` field will be null.

The `webhook_url` parameter is not part of the official OpenAI SDK types. In TypeScript, cast the params as `BatchCreateParams` to avoid type errors. In Python, pass it via `extra_body`.

### 4. Checking the Status of a Batch

You can check the status of a batch at any time by retrieving it with the Batch ID that Inference.net assigned to it (represented here by `batch_abc123`). The response is also a Batch object.
```typescript TypeScript
const retrievedBatch = await client.batches.retrieve(batch.id);

console.log(retrievedBatch);
```

```python Python
retrieved_batch = client.batches.retrieve(batch.id)

print(retrieved_batch)
```

```bash cURL
curl https://batch.inference.net/v1/batches/batch_abc123 \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "Content-Type: application/json"
```

The status of a given Batch object can be any of the following:

| Status       | Description                                                                    |
| ------------ | ------------------------------------------------------------------------------ |
| validating   | the input file is being validated before the batch can begin                   |
| failed       | the input file has failed the validation process                               |
| in\_progress | the input file was successfully validated and the batch is currently being run |
| finalizing   | the batch has completed and the results are being prepared                     |
| completed    | the batch has been completed and the results are ready                         |
| expired      | the batch was not able to be completed within the 24-hour time window          |
| cancelling   | the batch is being cancelled (may take up to 10 minutes)                       |
| cancelled    | the batch was cancelled                                                        |

### 5. Retrieving the Results

You will receive an email notification when the batch is complete.

Once the batch is complete, you can download the output by making a request against the Files API using the `output_file_id` field from the Batch object. Similarly, you can retrieve the error file (containing all failed requests) by making a request against the Files API using the `error_file_id` field from the Batch object.
```typescript TypeScript
// Use a Batch object retrieved after the batch reached `completed`;
// `output_file_id` is null on the object returned at creation.
const fileResponse = await client.files.content(retrievedBatch.output_file_id!);
const fileContents = await fileResponse.text();

console.log(fileContents);
```

```python Python
# Use a Batch object retrieved after the batch reached `completed`;
# `output_file_id` is null on the object returned at creation.
file_response = client.files.content(retrieved_batch.output_file_id)

print(file_response.text)
```

```bash cURL
curl https://batch.inference.net/v1/files/output-file-id/content \
  -H "Authorization: Bearer $INFERENCE_API_KEY" > batch_output.jsonl
```

The output `.jsonl` file will have one response line for every successful request line in the input file. Any failed requests in the batch will have their error information written to an error file that can be found via the batch's `error_file_id`.

> Note that the output line order **may not match** the input line order. Instead of relying on order to process your results, use the `custom_id` field, which is present in each line of your output file and lets you map requests in your input to results in your output.

```jsonl
{"id": "batch_req_123", "custom_id": "request-2", "response": {"status_code": 200, "request_id": "req_123", "body": {"id": "chatcmpl-123", "object": "chat.completion", "created": 1711652795, "model": "google/gemma-3-27b-instruct/bf-16", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello."}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 22, "completion_tokens": 2, "total_tokens": 24}, "system_fingerprint": "fp_123"}}, "error": null}
{"id": "batch_req_456", "custom_id": "request-1", "response": {"status_code": 200, "request_id": "req_789", "body": {"id": "chatcmpl-abc", "object": "chat.completion", "created": 1711652789, "model": "google/gemma-3-27b-instruct/bf-16", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello! How can I assist you today?"}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29}, "system_fingerprint": "fp_3ba"}}, "error": null}
```

## Listing All Batches

You can list all of your batches at any time. If you have many batches, use the `limit` and `after` parameters to paginate the results. When an `after` parameter is provided, the list returns the batches that come after the specified batch ID.

```typescript TypeScript
const list = await client.batches.list({
  limit: 10,
  after: batch.id,
});

for await (const b of list) {
  console.log(b);
}
```

```python Python
batches = client.batches.list(limit=10, after=batch.id)

for b in batches:
    print(b)
```

```bash cURL
curl 'https://batch.inference.net/v1/batches?limit=10&after=batch_abc123' \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "Content-Type: application/json"
```

## Batch Expiration

Batches that do not complete in time eventually move to an `expired` state; unfinished requests within that batch are cancelled, and any responses to completed requests are made available via the batch's output file. You will only be charged for tokens consumed from any completed requests.

Expired requests will be written to your error file with the message shown below. You can use the `custom_id` to retrieve the request data for expired requests.

```jsonl
{"id": "batch_req_123", "custom_id": "request-3", "response": null, "error": {"code": "batch_expired", "message": "This request could not be executed before the completion window expired."}}
{"id": "batch_req_123", "custom_id": "request-7", "response": null, "error": {"code": "batch_expired", "message": "This request could not be executed before the completion window expired."}}
```

## Rate Limits

Batch API rate limits are separate from existing per-model rate limits.
A single batch may include up to 50,000 requests, and a batch input file can be up to 200 MB in size. If you need higher rate limits, please contact us at [support@inference.net](mailto:support@inference.net).

## Compatibility Notes

### 1. Batch Cancellation

Although the OpenAI SDK supports cancelling an in-progress batch, Inference.net does not currently support batch cancellation. This is under development and will be available soon.

### 2. Model Availability

Inference.net's Batch Processing is compatible with all of Inference.net's supported models. See our [supported models](https://inference.net/models) page for a complete list.
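
## Example: Checking Input and Matching Results

The input-file rules, the per-batch limits, and the advice to match results by `custom_id` can be combined in a few small helpers. The following is a minimal Python sketch using only the standard library, not part of any SDK. The function names (`build_chat_requests`, `to_jsonl`, `index_by_custom_id`) are hypothetical, and reading "200 MB" as 200,000,000 bytes is an assumption.

```python
import json

# Values taken from this guide; the byte interpretation of "200 MB" is an assumption.
MAX_REQUESTS_PER_BATCH = 50_000
MAX_FILE_BYTES = 200 * 1000 * 1000
SUPPORTED_URLS = {"/v1/chat/completions", "/v1/completions"}


def build_chat_requests(prompts, model, max_tokens=1000):
    """Turn a {custom_id: prompt} dict into chat-completion request objects."""
    return [
        {
            "custom_id": custom_id,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }
        for custom_id, prompt in prompts.items()
    ]


def to_jsonl(batch_requests):
    """Check requests against the rules in this guide and serialize them as JSONL."""
    if len(batch_requests) > MAX_REQUESTS_PER_BATCH:
        raise ValueError("too many requests for a single batch")
    seen = set()
    lines = []
    for request in batch_requests:
        custom_id = request["custom_id"]
        if custom_id in seen:
            raise ValueError(f"duplicate custom_id: {custom_id}")
        seen.add(custom_id)
        if request["method"] != "POST" or request["url"] not in SUPPORTED_URLS:
            raise ValueError(f"unsupported method or url for {custom_id}")
        # json.dumps escapes line breaks inside strings, so each object stays on one line.
        lines.append(json.dumps(request))
    payload = "\n".join(lines) + "\n"
    if len(payload.encode("utf-8")) > MAX_FILE_BYTES:
        raise ValueError("batch input file exceeds the size limit")
    return payload


def index_by_custom_id(jsonl_text):
    """Parse output or error JSONL into {custom_id: record}; line order is not relied on."""
    records = {}
    for line in jsonl_text.split("\n"):
        if line.strip():
            record = json.loads(line)
            records[record["custom_id"]] = record
    return records
```

The text returned by `to_jsonl(build_chat_requests(...))` can be written to `batchinput.jsonl` before uploading. Once a batch completes, passing the downloaded output text (for example `file_response.text`) to `index_by_custom_id` gives a dictionary in which each result can be looked up by the `custom_id` of its original request.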