Calling a serverless deployment
Serverless deployments work exactly like any other model on the OpenAI-compatible API. Use the deployment’s model path as themodel:
Billing
Serverless deployments are priced in USD per 1M tokens, with separate input and output rates set per deployment. Usage is billed to the calling team’s credit balance like any other serverless inference:- Requests are authorized against your credit balance up front; if the
balance can’t cover the estimated cost, the API responds with
402. - The actual charge is settled when the inference completes, from the real token counts reported by the serving engine.
- Failed inferences are never billed.
- Charges appear in your usage dashboard under the deployment’s model path.