Deploy gives you a dedicated GPU serving your fine-tuned model. The API is OpenAI-compatible, so switching from an off-the-shelf model to your custom model is a one-line code change. This is the last step in the loop and the beginning of the next one.

📍 TODO:MEDIA

Screenshot of the deployments dashboard showing a running deployment with status and endpoint info.
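Because the API is OpenAI-compatible, the switch really is one line: the model parameter in the request body. A minimal sketch, assuming placeholder model ids (the exact ids for your deployment come from the dashboard):

```python
import json

def chat_payload(model: str, user_message: str) -> dict:
    """Build an OpenAI-compatible chat completions request body."""
    return {
        # The one-line change: swap an off-the-shelf model id
        # for your deployment's model id.
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# Before: chat_payload("some-base-model", "Hello")      (hypothetical id)
# After:  chat_payload("my-finetuned-model", "Hello")   (hypothetical id)
body = json.dumps(chat_payload("my-finetuned-model", "Hello"))
```

Everything else in the request, and in the response you get back, keeps the same shape.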

Key concepts

  • Dedicated GPU: Your model runs on its own GPU, with no shared infrastructure and no noisy neighbors. Compute is determined by the recipe used during training.
  • OpenAI-compatible API: Same base URL, same API key; just swap the model parameter. Structured outputs, function calling, and all standard API features work the same way.
  • Self-serve vs Managed: Self-serve gives you a single GPU, deployed in a few clicks, great for validation and early production. Managed provides multi-GPU serving at production scale.
  • The improvement loop: Deploy → observe production performance → run evals to catch regressions → train the next version. The loop continues.
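Putting the "same base URL, same API key" idea concretely: a hedged sketch of building a chat completions request with only the standard library. The base URL, API key, and model id below are placeholders, not real endpoints.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder; use your Inference API base URL
API_KEY = "YOUR_API_KEY"                 # placeholder; same key as the rest of the API

def chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build a chat completions request; sending it is left to the caller."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("my-finetuned-model", [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req)  # would send the request to a live deployment
```

Any OpenAI-compatible client library works the same way; only the base URL and model id point at your deployment.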

Two paths

  • Self-serve: Single dedicated GPU. Deploy in a few clicks. Great for validation and early production.
  • Managed: Multi-GPU provisioning sized to your traffic. Talk to the team for production-scale serving.
Self-serve is the default experience. Managed deployments are for when you need more than a single GPU.

What you can deploy today

  • Models trained on the Catalyst platform
  • Served via an OpenAI-compatible API (chat completions endpoint)
  • Same base URL and API key as the rest of the Inference API
Deploying off-the-shelf open source models and bringing your own already-trained models are coming soon. See Open Source Models for details.
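Since deployments are served through the standard chat completions endpoint, features like function calling keep the standard request shape. A hedged sketch with an illustrative tool definition and placeholder model id:

```python
import json

# The "tools" field follows the standard OpenAI function-calling schema.
# The tool itself (get_weather) and the model id are illustrative placeholders.
payload = {
    "model": "my-finetuned-model",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
body = json.dumps(payload)
```

Structured outputs and other standard request options work the same way: build the request as you would for any OpenAI-compatible endpoint, pointing the model parameter at your deployment.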

Next steps

Deploy a trained model

Name it, click deploy, start serving.

Call your deployment

One line of code to switch over.

Manage and monitor

Monitor your deployment’s health and usage.

Scale to production

When you need more than a single GPU.