Make cost-effective inference requests with flexible completion times.
Use the `/v1/slow` prefix instead of `/v1/` in your API calls to access this feature.
Background inference is cheaper and easier to build with when your application isn't serving real-time inference.
Background inference currently supports `/chat/completions` calls; support for `/completions` will come later. To switch an existing integration, change `/v1/` to `/v1/slow/` in the request path. The API maintains full compatibility with the OpenAI SDK.
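As a concrete sketch of the path change, the snippet below builds a chat-completion request against the `/v1/slow` prefix using only the Python standard library. The host, model name, and API key are placeholders, not real values from this documentation:

```python
import json
import urllib.request

# Placeholder values -- substitute your provider's actual host, model,
# and credentials. Note the /slow prefix in the base URL.
BASE_URL = "https://api.example.com/v1/slow"
API_KEY = "YOUR_API_KEY"

def build_slow_request(payload: dict) -> urllib.request.Request:
    """Build a POST request against the background-inference endpoint.

    Constructing the Request object does not send anything; pass it to
    urllib.request.urlopen() to actually submit the job for queuing.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_slow_request({
    "model": "example-model",
    "messages": [{"role": "user", "content": "Summarize this report."}],
})
# urllib.request.urlopen(req) would submit the request for queued processing.
```

Because the request body is unchanged from a normal chat-completion call, the only migration step is the URL prefix.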
| Status | Description |
|---|---|
| Queued | Request received and queued for processing |
| In Progress | Request is currently being processed |
| Success | Request completed successfully, results available |
| Failed | Request failed due to an error |
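The statuses above suggest a simple polling loop: keep checking until the request reaches a terminal state. This documentation doesn't specify a status-check endpoint, so in the sketch below `fetch_status` is a caller-supplied placeholder you would implement against the real API:

```python
import time
from typing import Callable

# Terminal statuses from the table above.
TERMINAL = {"Success", "Failed"}

def wait_for_result(fetch_status: Callable[[], str],
                    poll_interval: float = 5.0,
                    timeout: float = 3600.0) -> str:
    """Poll until the background request reaches a terminal status.

    fetch_status is a caller-supplied function returning one of
    "Queued", "In Progress", "Success", or "Failed"; how it talks to
    the API is up to you (this endpoint is not documented here).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("background request did not finish in time")
```

Since completion times are flexible, use a generous `timeout` and a polling interval measured in seconds rather than milliseconds.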
Supported endpoints:

- `/v1/slow/chat/completions`
- `/v1/slow/completions`

Replace `/v1/` with `/v1/slow/` in your existing code to use asynchronous processing.
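Because the change is purely mechanical, it can be captured in a one-line helper. This function is illustrative only (not part of any SDK), assuming endpoint URLs of the usual `https://host/v1/...` shape:

```python
def to_slow(url: str) -> str:
    """Rewrite a standard /v1/ endpoint to its /v1/slow/ equivalent.

    Only the first occurrence is replaced, so any later path segments
    are left untouched.
    """
    return url.replace("/v1/", "/v1/slow/", 1)

print(to_slow("https://api.example.com/v1/chat/completions"))
# -> https://api.example.com/v1/slow/chat/completions
```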