Skip to main content
This guide is the handoff from model improvement to production validation.

What you’ll have when you finish

  • one deployment for the trained model
  • one public model identifier
  • one successful smoke test against the deployment endpoint

Before you start

Step 1: review the completed training job

Before you create a deployment, inspect the training job detail page for:
  • final status
  • external job ID
  • base model
  • current or final loss
  • checkpoint evals and average scores
  • final model reference / weights
Do not promote a model you cannot explain.

Step 2: create the deployment

In the deployment create flow, you choose:
  • deployment name
  • model
  • speed target
  • instance count
The dashboard also generates a public deployment identifier in the shape teamSlug/name-randomId unless you override it.

Step 3: copy the public model identifier

Once the deployment exists, copy the deployment’s public model identifier from the overview page. You will use that as the model value in a normal API request.

Step 4: send a smoke test

Use the exact deployment-specific API example on /deploy/call-a-deployed-model and verify that:
  • the request completes successfully
  • the output is correct enough for the workflow
  • the request shows up in the deployment inferences view

Step 5: watch the deployed traffic

After rollout, inspect:
  • deployment overview
  • instances
  • recent deployment inferences
  • Observe analytics for the surrounding workflow
The goal is not just “deployment succeeded.” The goal is “the model behaves correctly under real traffic.”

Verify it worked

You should now have:
  • one live deployment
  • one public model identifier
  • one successful deployment request visible in the deployment inferences tab

What to do next

Observe Overview

Keep routing real traffic through Inference.net so the next eval and training cycle stays grounded in production behavior.

Choose Realtime, Background, Group, or Batch

Decide whether the deployment should serve interactive traffic or a non-interactive workflow.