In the dashboard, this workflow lives under Deployments.

What you choose when creating a deployment

The self-serve deployment flow starts with a few core decisions:
  • name - the human-readable deployment name
  • model - the model or model reference you want to serve
  • speed target - a tradeoff between higher throughput and a lighter capacity footprint
  • instance count - the initial serving footprint
Depending on your account, you may also see more advanced deployment settings.

Exact fields the dashboard submits

The current dashboard create flow in inference/apps/web/src/pages/dashboard/deployments/NewDeployment.page.tsx submits fields like:
  • name - Human-readable deployment name
  • teamId - Owning team
  • modelId - Selected model
  • desiredInstances - Initial instance count
  • desiredTokensPerSecond - Throughput target derived from the speed slider
  • publicModelIdentifier - Public deployment ID, if overridden or auto-generated
  • lockedConfigId / lockedConfigVersion - Optional locked engine config
  • requirements.cards - Optional GPU requirements
  • flagOverrides - Optional runtime flag overrides
  • environmentVariableOverrides - Optional environment variable overrides
  • configFileOverrides - Optional config file overrides
  • isServerlessDeployment - Optional public serverless toggle
  • serverlessCostPerMillionIn / serverlessCostPerMillionOut - Optional public serverless pricing
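The field list above can be captured as a type sketch. The field names come from the create flow; the concrete TypeScript types are assumptions for illustration, not the dashboard's actual definitions.

```typescript
// Type sketch of the create-deployment payload. Field names mirror the
// list above; the value types (string, number, Record) are assumptions.
interface CreateDeploymentRequest {
  name: string;
  teamId: string;
  modelId: string;
  desiredInstances: number;
  desiredTokensPerSecond: number;
  publicModelIdentifier?: string;
  lockedConfigId?: string;
  lockedConfigVersion?: number;
  requirements?: { cards?: string[] };
  flagOverrides?: Record<string, string>;
  environmentVariableOverrides?: Record<string, string>;
  configFileOverrides?: Record<string, string>;
  isServerlessDeployment?: boolean;
  serverlessCostPerMillionIn?: number;
  serverlessCostPerMillionOut?: number;
}

// Only the first five fields are required in this sketch; everything
// else maps to the optional sections described above.
const example: CreateDeploymentRequest = {
  name: "my-production-deployment",
  teamId: "team_123",
  modelId: "google/gemma-3-27b-it",
  desiredInstances: 1,
  desiredTokensPerSecond: 125,
};
```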
The current speed slider maps to an approximate desiredTokensPerSecond range of 50 to 200.
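As an illustration, the slider-to-throughput mapping could look like the sketch below. Only the 50 to 200 range comes from the create flow; the linear interpolation and the `sliderToTokensPerSecond` helper name are assumptions.

```typescript
// Hypothetical helper: linearly interpolate a 0-100 slider position
// into the 50-200 desiredTokensPerSecond range mentioned above.
// The linear mapping is an assumption; only the range is documented.
function sliderToTokensPerSecond(sliderPercent: number): number {
  const min = 50;
  const max = 200;
  // Clamp out-of-range slider values before interpolating.
  const clamped = Math.min(100, Math.max(0, sliderPercent));
  return Math.round(min + (clamped / 100) * (max - min));
}

sliderToTokensPerSecond(50); // → 125, the range midpoint
```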

Source-backed sample values

This sample mirrors the exact fields and value shapes used by the dashboard create flow.
{
  "name": "my-production-deployment",
  "teamId": "team_123",
  "modelId": "google/gemma-3-27b-it",
  "desiredInstances": 1,
  "desiredTokensPerSecond": 125,
  "publicModelIdentifier": "your-team/my-production-deployment-a1b2c3"
}
The public identifier format comes from DeploymentMetadata.tsx and defaults to teamSlug/name-randomId. Advanced metadata, locked config, GPU requirements, and override sections currently come from the superadmin-only advanced configuration UI.
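The default identifier shape can be sketched as follows. The teamSlug/name-randomId format is stated above; the slugification and random-suffix details are assumptions for illustration, not the code in DeploymentMetadata.tsx.

```typescript
// Sketch of the default public identifier, teamSlug/name-randomId.
// The slugify and random-suffix logic here are illustrative assumptions.
function defaultPublicModelIdentifier(teamSlug: string, name: string): string {
  // Lowercase the name and collapse non-alphanumeric runs into hyphens.
  const slugName = name.toLowerCase().replace(/[^a-z0-9]+/g, "-");
  // Short base-36 suffix, e.g. "a1b2c3" (generation scheme assumed).
  const randomId = Math.random().toString(36).slice(2, 8);
  return `${teamSlug}/${slugName}-${randomId}`;
}

defaultPublicModelIdentifier("your-team", "my-production-deployment");
// e.g. "your-team/my-production-deployment-a1b2c3"
```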

Suggested workflow

  1. create a clear deployment name
  2. choose the model you want to serve
  3. set the initial speed and instance count
  4. create the deployment
  5. open the deployment overview and verify the public model identifier and status

After creation

Once the deployment exists, you can use the detail tabs to inspect:
  • Overview for deployment info and API usage
  • Instances for runtime capacity and instance status
  • Inferences for recent traffic served by the deployment
  • Settings for administrative actions like deletion

A practical starting point

Start smaller, validate the workload, and then scale instance count once you have real traffic and latency data.

Need help?

If you want help planning deployment topology, scaling, or rollout strategy, meet with our team.