

When training completes, the model is automatically registered and ready to deploy; there is no manual promotion step. Deployments are scoped to your team, not to a specific project. When you create a deployment, you give it a name, and the model path becomes your team slug followed by that name (e.g. acme-corp/my-model).

Starting a deployment

There are a few ways to get started:
  • From the Deployments page — click Create
  • From a completed training run — click deploy directly from the training detail page
  • From the Models page — click deploy in the row for your model
All three paths lead to the same flow.

The flow

1. Name the deployment

Give it a descriptive name. This becomes the second part of your model path (e.g. acme-corp/my-model).

2. Deploy

Select your instance configuration and click deploy. If you need more compute than a single GPU, you can reach out to the team directly from this page.

3. Wait for warm-up

The deployment typically takes a few minutes, and up to 20–30 minutes, to come online. This time is spent allocating compute and spinning up the GPU.

After deployment

Once the endpoint is live, you can call it using the same OpenAI-compatible API you’re already using. Just swap in your model path.
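
As a minimal sketch of that swap, the snippet below sends a standard OpenAI-style chat completion request with the model path built from a team slug and deployment name. The base URL (`https://api.inference.net/v1`), the environment variable name, and the slug/name values are illustrative assumptions; substitute the values shown in your dashboard.

```python
import json
import os
import urllib.request

team_slug = "acme-corp"        # illustrative team slug
deployment_name = "my-model"   # the name you gave the deployment
model_path = f"{team_slug}/{deployment_name}"

# Standard OpenAI-style chat completion payload; only the model field changes.
payload = {
    "model": model_path,
    "messages": [{"role": "user", "content": "Hello!"}],
}

def call_deployment(api_key: str,
                    base_url: str = "https://api.inference.net/v1") -> str:
    # base_url is an assumption — confirm the endpoint in your dashboard.
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Only fires when a key is configured, so the file is safe to import.
if os.environ.get("INFERENCE_API_KEY"):
    print(call_deployment(os.environ["INFERENCE_API_KEY"]))
```

Because the request body and response shape are OpenAI-compatible, an existing OpenAI SDK client also works here by pointing its base URL at the deployment endpoint and passing the model path as the model name.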