The Batch API is currently compatible with all the models we offer.
Overview
While some uses require you to send synchronous requests, there are many cases where requests do not need an immediate response, or where rate limits prevent you from executing a large number of queries quickly. Batch processing jobs are often helpful in use cases like:
- Extracting structured data from a large number of documents.
- Generating synthetic data for training.
- Translating a large number of documents into other languages.
- Summarizing a large number of customer interactions.
In addition to asynchronous processing, the Batch API offers:
- Higher rate limits: Substantially more headroom compared to the synchronous APIs.
- Fast completion times: Each batch completes within 24 hours (and often much more quickly).
Getting Started
You’ll need an Inference.net account and API key to use the Batch API. See our Quick Start Guide for instructions on how to create an account and get an API key. Install the OpenAI SDK for your language of choice. To connect to Inference.net using the OpenAI SDK, you will need to set the base URL to https://api.inference.net/v1.
In this example, we are reading the API key from the environment variable INFERENCE_API_KEY.
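A minimal setup sketch, assuming the official OpenAI Node SDK with TypeScript:

```typescript
import OpenAI from "openai";

// Point the OpenAI SDK at Inference.net and read the key from the environment.
const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});
```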
Running a Batch Processing Job
1. Preparing Your Batch File
Prepare a .jsonl file where each line is a separate JSON object that represents an individual request.
Each JSON object must be on a single line and cannot contain any line breaks.
Each JSON object must include the following fields:
- custom_id: A unique identifier for the request. This is used to reference the request’s results after completion. It must be unique for each request in the file.
- method: The HTTP method to use for the request. Currently, only POST is supported.
- url: The URL to send the request to. Currently, only /v1/chat/completions and /v1/completions are supported.
- body: The request body, which contains the input for the inference request. The parameters in each line’s body field are the same as the parameters for the underlying endpoint specified by the url field. See the examples below for more details.
Here is an example of a batch input file with requests to the /v1/chat/completions endpoint.
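This is an illustrative sketch; the model IDs are placeholders rather than real model names:

```jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "<model-id>", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "<model-id>", "messages": [{"role": "user", "content": "Summarize the plot of Hamlet in one sentence."}]}}
```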
And here is an example of a batch input file with a request to the /v1/completions endpoint.
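Again an illustrative sketch with a placeholder model ID:

```jsonl
{"custom_id": "request-3", "method": "POST", "url": "/v1/completions", "body": {"model": "<model-id>", "prompt": "Once upon a time,", "max_tokens": 64}}
```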
2. Uploading Your Batch Input File
In order to create a Batch Processing job, you must first upload your input file.
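A sketch of the upload step, assuming the OpenAI Node SDK and an input file named batchinput.jsonl (the filename is illustrative):

```typescript
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

// Upload the .jsonl input file; the "batch" purpose marks it for batch processing.
const inputFile = await client.files.create({
  file: fs.createReadStream("batchinput.jsonl"), // illustrative filename
  purpose: "batch",
});

console.log(inputFile.id); // use this ID when creating the batch
```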
3. Starting the Batch Processing Job
Once you’ve successfully uploaded your input file, you can use the ID of the file to create a batch. In this case, let’s assume the file ID is file-abc123.
For now, the completion window can only be set to 24h.
To associate custom metadata with the batch, you can provide an optional metadata parameter.
This metadata is not used by Inference.net to complete requests, but it is returned when you retrieve the status of the batch.
Note: The Batch Processing job will begin processing immediately after creation.

Create the Batch
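A sketch of batch creation, again assuming the OpenAI Node SDK; the metadata values are illustrative:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

// Create the batch from the uploaded input file.
const batch = await client.batches.create({
  input_file_id: "file-abc123",
  endpoint: "/v1/chat/completions",
  completion_window: "24h", // currently the only supported value
  metadata: {
    batch_description: "nightly document extraction job", // illustrative
  },
});

console.log(batch.id); // e.g. batch_abc123
```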
We also support an optional webhook_url parameter that you can set to receive a webhook notification when the batch is complete.
The webhook_url must be an HTTPS URL that can receive POST requests.
If no metadata is provided when the batch is created, the metadata field will be null.
Your webhook will receive a POST with a request JSON body that looks like this:
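The exact payload is defined by Inference.net; as a rough sketch built from the Batch fields referenced elsewhere in this guide (other fields may be present):

```json
{
  "id": "batch_abc123",
  "status": "completed",
  "output_file_id": "output-file-id",
  "error_file_id": "error-file-id",
  "metadata": null
}
```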
If you are using TypeScript, including webhook_url in the request body will result in a type error because it is not an officially supported parameter. You can safely ignore this error by casting the body as type BatchCreateParams, like this:
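A sketch assuming the openai Node SDK, where BatchCreateParams is exported from openai/resources/batches; the webhook URL is illustrative:

```typescript
import OpenAI from "openai";
import type { BatchCreateParams } from "openai/resources/batches";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

const batch = await client.batches.create({
  input_file_id: "file-abc123",
  endpoint: "/v1/chat/completions",
  completion_window: "24h",
  webhook_url: "https://example.com/batch-webhook", // illustrative URL
} as BatchCreateParams); // the cast silences the excess-property error
```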
4. Checking the Status of a Batch
You can check the status of a batch at any time, which will also return a Batch object. Check the status of a batch by retrieving it using the Batch ID assigned to it by Inference.net (represented here by batch_abc123).
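A sketch of the status check with the OpenAI Node SDK:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

const batch = await client.batches.retrieve("batch_abc123");
console.log(batch.status); // one of the statuses in the table below
```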
| Status | Description |
| --- | --- |
| validating | the input file is being validated before the batch can begin |
| failed | the input file has failed the validation process |
| in_progress | the input file was successfully validated and the batch is currently being run |
| finalizing | the batch has completed and the results are being prepared |
| completed | the batch has been completed and the results are ready |
| expired | the batch was not able to be completed within the 24-hour time window |
| cancelling | the batch is being cancelled (may take up to 10 minutes) |
| cancelled | the batch was cancelled |
5. Retrieving the Results
You will receive an email notification when the batch is complete. Once the batch is complete, you can download the output by making a request against the Files API using the output_file_id field from the Batch object.
Similarly, you can retrieve the error file (containing all failed requests) by making a request against the Files API using the error_file_id field from the Batch object.
Supposing the output file ID is output-file-id in the following example:
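A sketch of downloading the output with the OpenAI Node SDK:

```typescript
import fs from "node:fs/promises";
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

// files.content returns a fetch Response; read it as raw JSONL text.
const response = await client.files.content("output-file-id");
await fs.writeFile("batch_output.jsonl", await response.text());
```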
The output .jsonl file will have one response line for every successful request line in the input file. Any failed requests in the batch will have their error information written to an error file that can be found via the batch’s error_file_id.
Note that the output line order may not match the input line order. Instead of relying on order to process your results, use the custom_id field, which will be present in each line of your output file and allows you to map requests in your input to results in your output.
Listing All Batches
At any time, you can see all your batches. For users with many batches, you can use the limit and after parameters to paginate your results.
If an after parameter is provided, the list will return batches after the specified batch ID.
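A pagination sketch with the OpenAI Node SDK:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

// Fetch up to 10 batches created after batch_abc123.
const page = await client.batches.list({ limit: 10, after: "batch_abc123" });
for (const batch of page.data) {
  console.log(batch.id, batch.status);
}
```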
Batch Expiration
Batches that do not complete in time eventually move to an expired state; unfinished requests within that batch are cancelled, and any responses to completed requests are made available via the batch’s output file. You will only be charged for tokens consumed from any completed requests.
Expired requests will be written to your error file with the message as shown below. You can use the custom_id to retrieve the request data for expired requests.
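The exact error code and message text are defined by Inference.net; an illustrative sketch of such an error line:

```jsonl
{"custom_id": "request-7", "error": {"code": "batch_expired", "message": "This request could not be executed before the completion window expired."}}
```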