Process jobs asynchronously with the Batch API.

The base URL for all Batch API requests is `https://api.inference.net/v1`.

In this example, the API key is read from the environment variable `INFERENCE_API_KEY`.
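As a minimal setup sketch (assuming a Node.js environment; this helper is illustrative and not part of an official SDK):

```typescript
// Minimal client setup sketch (illustrative, not an official SDK).
const BASE_URL = "https://api.inference.net/v1";

// Read the API key from the environment, failing fast if it is missing.
function getApiKey(): string {
  const key = process.env.INFERENCE_API_KEY;
  if (!key) throw new Error("INFERENCE_API_KEY is not set");
  return key;
}
```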
A batch starts from a `.jsonl` file in which each line is a separate JSON object that represents an individual request. Each JSON object must be on a single line and cannot contain any line breaks. Each JSON object must include the following fields:
- `custom_id`: A unique identifier for the request, used to reference the request's results after completion. It must be unique for each request in the file.
- `method`: The HTTP method to use for the request. Currently, only `POST` is supported.
- `url`: The URL to send the request to. Currently, only `/v1/chat/completions` and `/v1/completions` are supported.
- `body`: The request body, which contains the input for the inference request. The parameters in each line's `body` field are the same as the parameters of the underlying endpoint specified by the `url` field.

See the documentation for the `/v1/chat/completions` and `/v1/completions` endpoints for the parameters accepted in `body`.
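For illustration, a two-line input file covering both endpoints could be built like this (the model ID and message contents are placeholders, not values from these docs):

```typescript
// Sketch: build a batch input .jsonl file (one request per line).
import { writeFileSync } from "node:fs";

const requests = [
  {
    custom_id: "request-1",
    method: "POST",
    url: "/v1/chat/completions",
    body: {
      model: "example/model-id", // placeholder model ID
      messages: [{ role: "user", content: "Hello!" }],
    },
  },
  {
    custom_id: "request-2",
    method: "POST",
    url: "/v1/completions",
    body: {
      model: "example/model-id", // placeholder model ID
      prompt: "Say hello.",
    },
  },
];

// Each JSON object must occupy exactly one line, so serialize compactly.
const jsonl = requests.map((r) => JSON.stringify(r)).join("\n");
writeFileSync("batchinput.jsonl", jsonl);
```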
Upload the `.jsonl` file via the Files API; the returned file object has an ID such as `file-abc123`, which you will reference when creating the batch.
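Before uploading, it can help to check the file locally against the rules above (one JSON object per line, unique `custom_id`s, `POST` only, supported `url`s). This validator is a local convenience sketch, not part of the API:

```typescript
// Local sanity check mirroring the batch file rules. Not an official validator.
const SUPPORTED_URLS = new Set(["/v1/chat/completions", "/v1/completions"]);

function validateBatchLines(jsonl: string): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  jsonl.split("\n").forEach((line, i) => {
    let req: any;
    try {
      req = JSON.parse(line); // each line must be a complete JSON object
    } catch {
      errors.push(`line ${i + 1}: not valid JSON`);
      return;
    }
    if (typeof req.custom_id !== "string") {
      errors.push(`line ${i + 1}: missing custom_id`);
    } else if (seen.has(req.custom_id)) {
      errors.push(`line ${i + 1}: duplicate custom_id`);
    } else {
      seen.add(req.custom_id);
    }
    if (req.method !== "POST") errors.push(`line ${i + 1}: method must be POST`);
    if (!SUPPORTED_URLS.has(req.url)) errors.push(`line ${i + 1}: unsupported url`);
    if (req.body == null) errors.push(`line ${i + 1}: missing body`);
  });
  return errors;
}
```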
For now, the completion window can only be set to `24h`.

To associate custom metadata with the batch, you can provide an optional `metadata` parameter. This metadata is not used by Inference.net to complete requests, but it is included when you retrieve the status of the batch, so you can attach your own bookkeeping information.
Note: the batch job will begin processing immediately after creation.

## Create the Batch
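The creation parameters can be sketched as follows. The `endpoint` field is an assumption borrowed from OpenAI-style batch APIs and is not documented on this page; `input_file_id`, `completion_window`, and `metadata` come from the text above:

```typescript
// Illustrative helper building batch-creation parameters.
// endpoint is assumed (OpenAI-style); it is not documented on this page.
function buildBatchCreateParams(
  inputFileId: string,
  metadata?: Record<string, string>,
) {
  return {
    input_file_id: inputFileId,
    endpoint: "/v1/chat/completions",
    completion_window: "24h", // currently the only supported value
    ...(metadata ? { metadata } : {}),
  };
}
```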
You can also provide an optional `webhook_url` parameter to receive a notification when the batch is complete. The `webhook_url` must be an HTTPS URL that can receive POST requests. When the batch finishes, your webhook receives a POST request with a JSON body describing the completed batch.
Note that passing `webhook_url` in the request body will result in a type error, because it is not an officially supported parameter. You can safely ignore this error by casting the body to the `BatchCreateParams` type.
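A sketch of the cast, using a local stand-in interface rather than the real SDK import (in real code you would import `BatchCreateParams` from your SDK):

```typescript
// Stand-in for the SDK's BatchCreateParams type, for illustration only.
interface BatchCreateParams {
  input_file_id: string;
  completion_window: string;
}

// webhook_url is not in the official type, so cast the object:
const createBody = {
  input_file_id: "file-abc123",
  completion_window: "24h",
  webhook_url: "https://example.com/hooks/batch-complete", // hypothetical URL
} as BatchCreateParams;
// client.batches.create(createBody)  // hypothetical SDK call
```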
Creating the batch returns a batch object with an ID (for example, `batch_abc123`). Use this ID to check the batch's status, which will be one of the following:
| Status | Description |
| --- | --- |
| `validating` | The input file is being validated before the batch can begin. |
| `failed` | The input file has failed the validation process. |
| `in_progress` | The input file was successfully validated and the batch is currently being run. |
| `finalizing` | The batch has completed and the results are being prepared. |
| `completed` | The batch has been completed and the results are ready. |
| `expired` | The batch was not able to be completed within the 24-hour time window. |
| `cancelling` | The batch is being cancelled (this may take up to 10 minutes). |
| `cancelled` | The batch was cancelled. |
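When polling, it helps to distinguish terminal statuses from in-flight ones; a small helper based on the table above:

```typescript
// Terminal statuses from the table above: no further polling needed.
const TERMINAL_STATUSES = new Set(["completed", "failed", "expired", "cancelled"]);

function isTerminal(status: string): boolean {
  return TERMINAL_STATUSES.has(status);
}
```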
Once the batch is complete, you can retrieve the output file (containing all successful requests) by making a request to the Files API with the `output_file_id` field from the Batch object. Similarly, you can retrieve the error file (containing all failed requests) by making a request to the Files API with the `error_file_id` field from the Batch object. For example, suppose the output file ID is `output-file-id`.
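A download sketch; the `/files/{id}/content` path is an assumption modeled on OpenAI-style Files APIs, so check the Files API reference for the exact route:

```typescript
// Build the (assumed) Files API content URL for a given file ID.
function fileContentUrl(fileId: string): string {
  return `https://api.inference.net/v1/files/${fileId}/content`;
}

// Download the file contents (requires Node 18+ for global fetch).
async function downloadFileContent(fileId: string): Promise<string> {
  const res = await fetch(fileContentUrl(fileId), {
    headers: { Authorization: `Bearer ${process.env.INFERENCE_API_KEY}` },
  });
  if (!res.ok) throw new Error(`download failed: ${res.status}`);
  return res.text();
}
```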
The output `.jsonl` file will have one response line for every successful request line in the input file. Any failed requests in the batch will have their error information written to an error file, which can be found via the batch's `error_file_id`.

Note that the output line order may not match the input line order. Instead of relying on order to process your results, use the `custom_id` field, which is present in each line of your output file and lets you map requests in your input to results in your output.
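Per the note above, index the output by `custom_id` rather than by position. Only the presence of `custom_id` on each line is relied on here, so the rest of each entry is left untyped:

```typescript
// Index output lines by custom_id, since output order is not guaranteed
// to match input order.
function indexByCustomId(outputJsonl: string): Map<string, any> {
  const byId = new Map<string, any>();
  for (const line of outputJsonl.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const entry = JSON.parse(line);
    byId.set(entry.custom_id, entry);
  }
  return byId;
}
```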
When listing your batches, use the `limit` and `after` parameters to paginate the results. If an `after` parameter is provided, the list returns batches after the specified batch ID.
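The pagination loop can be sketched generically; `fetchPage` stands in for a real list call (a hypothetical callback, not an SDK function):

```typescript
// Generic pagination sketch over the limit / after parameters.
// fetchPage stands in for a real list call (e.g. a GET on the batches endpoint).
type BatchSummary = { id: string };

function listAllBatches(
  fetchPage: (limit: number, after?: string) => BatchSummary[],
  limit = 100,
): BatchSummary[] {
  const all: BatchSummary[] = [];
  let after: string | undefined;
  while (true) {
    const page = fetchPage(limit, after);
    all.push(...page);
    if (page.length < limit) break; // short page means we reached the end
    after = page[page.length - 1].id; // resume after the last batch ID seen
  }
  return all;
}
```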
Batches that do not finish within the 24-hour window move to the `expired` state; unfinished requests within that batch are cancelled, and any responses to completed requests are made available via the batch's output file. You will only be charged for tokens consumed from any completed requests.

Expired requests are written to your error file with a message indicating that they expired. You can use the `custom_id` to retrieve the request data for expired requests.
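To retry expired work, you can collect the `custom_id`s from the error file and match them back to your input; a sketch:

```typescript
// Collect custom_ids from the error file so the corresponding input
// requests can be found and resubmitted in a new batch.
function failedCustomIds(errorJsonl: string): string[] {
  return errorJsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line).custom_id);
}
```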