Manage and Monitor - Inference.net Documentation

[Figure: Deployments list with per-deployment metrics]

Lifecycle operations

Start — bring a stopped deployment back online
Stop — take the deployment offline
Delete — remove the deployment entirely
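
As a rough mental model, the three operations above can be read as transitions between deployment states. The sketch below is illustrative only and is not an Inference.net API: the state names (`running`, `stopped`, `deleted`), the `apply` helper, and the choice to allow Delete from either a running or a stopped deployment are all assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names; the documentation does not name them.
    RUNNING = "running"
    STOPPED = "stopped"
    DELETED = "deleted"


# Transitions read off the three lifecycle operations.
# Assumption: Delete is allowed whether the deployment is running or stopped.
TRANSITIONS = {
    ("start", State.STOPPED): State.RUNNING,
    ("stop", State.RUNNING): State.STOPPED,
    ("delete", State.RUNNING): State.DELETED,
    ("delete", State.STOPPED): State.DELETED,
}


def apply(operation: str, state: State) -> State:
    """Return the state after `operation`, or raise if it does not apply."""
    try:
        return TRANSITIONS[(operation, state)]
    except KeyError:
        raise ValueError(
            f"cannot {operation} a deployment that is {state.value}"
        ) from None
```

In this model a deleted deployment has no outgoing transitions, which matches the description of Delete as removing the deployment entirely.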

Scaling

By default, your model deploys on a single dedicated GPU. If you need additional GPUs or auto-scaling, reach out to our team.

Monitoring

Once your deployment is live, open it from the Deployments page to see its metrics and individual inference calls. The view is the same Observe experience (latency, error rates, token usage, and full request/response payloads), scoped to that deployment.
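
To make the metrics concrete, here is a minimal sketch of how latency, error rate, and token usage could be summarized from a list of call records. It is a local illustration, not how the Deployments page computes its numbers: the `Call` record, its field names, and the rule that any status code of 400 or above counts as an error are assumptions for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Call:
    # Hypothetical record shape; field names are assumptions.
    latency_ms: float
    status_code: int
    prompt_tokens: int
    completion_tokens: int


def summarize(calls: list[Call]) -> dict[str, float]:
    """Aggregate per-call records into deployment-level metrics."""
    if not calls:
        return {
            "calls": 0,
            "median_latency_ms": 0.0,
            "error_rate": 0.0,
            "total_tokens": 0,
        }
    # Assumption: any 4xx or 5xx status counts as an error.
    errors = sum(1 for c in calls if c.status_code >= 400)
    return {
        "calls": len(calls),
        "median_latency_ms": median(c.latency_ms for c in calls),
        "error_rate": errors / len(calls),
        "total_tokens": sum(
            c.prompt_tokens + c.completion_tokens for c in calls
        ),
    }
```

Scoping the summary to one deployment then amounts to passing in only the calls that belong to it.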

The loop continues

Your custom model is live. Use Observe to watch its production performance, run evals to catch regressions, and train the next version when you’re ready.

Observe

Monitor production traffic.

Eval

Catch quality regressions.

Train

Build the next version.
