Once a deployment exists, the dashboard shows a public model identifier for that deployment. You use that identifier as the model value in normal Inference.net API requests.

What stays the same

Calling a deployed model uses the same core API shape as the direct serverless API:
  • base URL: https://api.inference.net/v1
  • auth: Authorization: Bearer ...
  • standard request bodies for chat completions and related endpoints
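The shared shape above can be sketched as a small request builder. This is a stdlib-only sketch; the helper name and signature are illustrative, not part of any SDK:

```python
import json

BASE_URL = "https://api.inference.net/v1"  # same base URL as the serverless API


def build_request(api_key: str, model: str, messages: list) -> tuple:
    """Assemble the endpoint URL, headers, and JSON body for a chat completion call."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # same bearer-token auth scheme
    }
    body = json.dumps({"model": model, "messages": messages}).encode()
    return url, headers, body
```

Nothing here is deployment-specific; the same builder serves catalog and deployed models alike.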

What changes

The main difference is the model name. Instead of a catalog model identifier, you use the deployment’s public model identifier from the deployment overview page.
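Concretely, switching a request from a catalog model to a deployment is a one-field change. Both identifiers below are placeholders:

```python
# Both model identifiers are placeholders, not real catalog or deployment names.
catalog_request = {
    "model": "some-org/some-catalog-model",
    "messages": [{"role": "user", "content": "ping"}],
}

# Same body, pointed at the deployment's public model identifier instead.
deployment_request = {
    **catalog_request,
    "model": "your-team/my-production-deployment-a1b2c3",
}
```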

Source-backed API examples

The dashboard generates these examples in inference/apps/web/src/components/deployments/DeploymentApiExamples.tsx, using the same snippet generator as the direct-API quickstart:
curl -N https://api.inference.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -d '{
    "messages": [
      {
        "content": "You are a helpful assistant.",
        "role": "system"
      },
      {
        "content": "What is the meaning of life?",
        "role": "user"
      }
    ],
    "model": "your-team/my-production-deployment-a1b2c3",
    "stream": true
  }'
Verifying a deployment

  1. open the deployment overview
  2. copy the public model identifier or API example
  3. send a smoke test against the normal API endpoint using that model value
  4. inspect the request in the deployment’s inferences tab
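The smoke-test step can be made mechanical by checking the response shape before declaring the deployment healthy. A sketch, assuming a standard chat-completion response body (the checks chosen here are illustrative):

```python
def check_smoke_response(status_code: int, body: dict) -> list:
    """Return a list of problems found in a smoke-test response; empty means pass."""
    problems = []
    if status_code != 200:
        problems.append(f"unexpected status {status_code}")
    choices = body.get("choices") or []
    if not choices:
        problems.append("no choices in response")
    elif not (choices[0].get("message") or {}).get("content"):
        problems.append("empty completion content")
    if not body.get("model"):
        problems.append("missing model field")
    return problems
```

Run it against the parsed JSON of the smoke-test call, then confirm the same request appears in the deployment’s inferences tab.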

Best practices

  • keep a simple smoke test for each deployment
  • verify both correctness and latency before routing real traffic
  • continue observing production traffic after rollout
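The latency check in the second bullet can be sketched as a nearest-rank p95 gate. The 2-second threshold is an assumption for illustration, not a documented SLO:

```python
import math


def p95_latency_ms(samples_ms: list) -> float:
    """Nearest-rank p95 over measured request latencies, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def ok_to_route(samples_ms: list, threshold_ms: float = 2000.0) -> bool:
    """Gate rollout on p95 latency staying under an assumed threshold."""
    return p95_latency_ms(samples_ms) <= threshold_ms
```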