Deployments
What you choose when creating a deployment
The self-serve deployment flow starts with a few core decisions:
- name - the human-readable deployment name
- model - the model or model reference you want to serve
- speed target - a higher-throughput versus lighter-capacity tradeoff
- instance count - the initial serving footprint
Exact fields the dashboard submits
The current dashboard create flow in inference/apps/web/src/pages/dashboard/deployments/NewDeployment.page.tsx submits fields like:
| Field | Meaning |
|---|---|
| name | Human-readable deployment name |
| teamId | Owning team |
| modelId | Selected model |
| desiredInstances | Initial instance count |
| desiredTokensPerSecond | Throughput target derived from the speed slider |
| publicModelIdentifier | Public deployment ID, if overridden or auto-generated |
| lockedConfigId / lockedConfigVersion | Optional locked engine config |
| requirements.cards | Optional GPU requirements |
| flagOverrides | Optional runtime flag overrides |
| environmentVariableOverrides | Optional environment variable overrides |
| configFileOverrides | Optional config file overrides |
| isServerlessDeployment | Optional public serverless toggle |
| serverlessCostPerMillionIn / serverlessCostPerMillionOut | Optional public serverless pricing |
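As a rough illustration, the fields above could be modeled as a TypeScript type. The field names come from the table. Every type annotation, the choice of which fields are optional, and the `missingCoreFields` helper are assumptions made for this sketch, not the dashboard's actual definitions.

```typescript
// Hypothetical shape for the create payload. Field names follow the table;
// all types here are assumptions, not copied from NewDeployment.page.tsx.
interface CreateDeploymentFields {
  name: string;
  teamId: string;
  modelId: string;
  desiredInstances: number;
  desiredTokensPerSecond: number;
  publicModelIdentifier?: string;
  lockedConfigId?: string;
  lockedConfigVersion?: number;
  requirements?: { cards?: string[] };
  flagOverrides?: Record<string, string>;
  environmentVariableOverrides?: Record<string, string>;
  configFileOverrides?: Record<string, string>;
  isServerlessDeployment?: boolean;
  serverlessCostPerMillionIn?: number;
  serverlessCostPerMillionOut?: number;
}

// Hypothetical helper: lists the core (non-optional) fields that are
// missing or set to an empty string in a partially filled form.
function missingCoreFields(fields: Partial<CreateDeploymentFields>): string[] {
  const core: (keyof CreateDeploymentFields)[] = [
    "name",
    "teamId",
    "modelId",
    "desiredInstances",
    "desiredTokensPerSecond",
  ];
  return core.filter((key) => fields[key] === undefined || fields[key] === "");
}
```

A check like this can run before submission so that the optional sections stay optional while the core decisions are always present.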
The speed slider maps to a desiredTokensPerSecond range of 50 to 200.
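One way to keep a slider value inside the 50 to 200 range is a clamped linear mapping. This is only a sketch: the function name, the normalized 0 to 1 slider position, and the linear interpolation are all assumptions, and the dashboard may derive the value differently.

```typescript
// Bounds taken from the documented desiredTokensPerSecond range.
const MIN_TOKENS_PER_SECOND = 50;
const MAX_TOKENS_PER_SECOND = 200;

// Hypothetical helper: maps a slider position in [0, 1] onto the range.
// Out-of-range positions are clamped; the linear mapping is an assumption.
function sliderToTokensPerSecond(position: number): number {
  const clamped = Math.min(1, Math.max(0, position));
  const span = MAX_TOKENS_PER_SECOND - MIN_TOKENS_PER_SECOND;
  return Math.round(MIN_TOKENS_PER_SECOND + clamped * span);
}
```

With this mapping, the slider's lowest position yields 50, its midpoint yields 125, and its highest position yields 200.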
Source-backed sample values
This sample mirrors the exact fields and value shapes used by the dashboard create flow.
publicModelIdentifier is handled in DeploymentMetadata.tsx and defaults to teamSlug/name-randomId.
Advanced metadata, locked config, GPU requirements, and override sections currently come from the superadmin-only advanced configuration UI.
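For orientation, a create payload and the teamSlug/name-randomId default might look like the sketch below. All values are illustrative placeholders and are not taken from the dashboard source. The `defaultPublicModelIdentifier` helper is hypothetical and shows only the documented pattern, not the logic in DeploymentMetadata.tsx.

```typescript
// Hypothetical helper: assembles the documented teamSlug/name-randomId
// pattern. The random ID is passed in so the result is reproducible; how
// the dashboard generates it is not shown here.
function defaultPublicModelIdentifier(
  teamSlug: string,
  name: string,
  randomId: string,
): string {
  return `${teamSlug}/${name}-${randomId}`;
}

// Illustrative payload using only the core fields plus the identifier.
// Every value below is a placeholder chosen for this example.
const samplePayload = {
  name: "support-bot",
  teamId: "team_123",
  modelId: "model_456",
  desiredInstances: 1,
  desiredTokensPerSecond: 100,
  publicModelIdentifier: defaultPublicModelIdentifier(
    "acme",
    "support-bot",
    "a1b2c3",
  ),
};
```

The advanced sections (locked config, GPU requirements, and overrides) are omitted because they come from the superadmin-only UI.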
Suggested workflow
- create a clear deployment name
- choose the model you want to serve
- set the initial speed and instance count
- create the deployment
- open the deployment overview and verify the public model identifier and status
After creation
Once the deployment exists, you can use the detail tabs to inspect:
- Overview for deployment info and API usage
- Instances for runtime capacity and instance status
- Inferences for recent traffic served by the deployment
- Settings for administrative actions like deletion