Deploy gives you a dedicated GPU serving your fine-tuned model. The API is OpenAI-compatible, so switching from an off-the-shelf model to your custom model is a one-line code change. This is the last step in the loop and the beginning of the next one.
Deployments dashboard
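The "one-line code change" can be sketched as a request payload in which only the `model` field changes; the model names below are illustrative, not real identifiers from your account:

```python
# Sketch of the one-line switch, assuming an OpenAI-style request payload.
# "gpt-4o-mini" and "acme-corp/my-model" are illustrative names only.
request = {
    "model": "gpt-4o-mini",  # the off-the-shelf model you started with
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
}

# The one-line change: point the same request at your deployment.
request["model"] = "acme-corp/my-model"  # <team-slug>/<deployment-name>

print(request["model"])
```

Everything else about the request, including the messages format, stays exactly as it was.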

Key concepts

| Concept | Description |
| --- | --- |
| Dedicated GPU | Your model runs on its own GPU. No shared infrastructure, no noisy neighbors. Compute is determined by the recipe used during training. |
| OpenAI-compatible API | Same base URL, same API key; just swap the `model` parameter. Structured outputs, function calling, and all standard API features work the same way. |
| Team-scoped | Deployments belong to a team, not a project. The model path is your team slug followed by the deployment name you choose (e.g. `acme-corp/my-model`). |
| The improvement loop | Deploy → observe production performance → run evals to catch regressions → train the next version. The loop continues. |
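The team-scoped model path from the table can be made concrete with a tiny helper; `model_path` is a hypothetical illustration, not part of the API:

```python
def model_path(team_slug: str, deployment_name: str) -> str:
    # Deployments are team-scoped, so the model identifier is always
    # "<team-slug>/<deployment-name>" -- the team slug is fixed, the
    # deployment name is whatever you chose at deploy time.
    return f"{team_slug}/{deployment_name}"

print(model_path("acme-corp", "my-model"))
```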

What you can deploy

  • Models trained on the Inference platform
  • Served via an OpenAI-compatible API (chat completions endpoint)
  • Same base URL and API key as the rest of the Inference API
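Putting the points above together, calling a deployment is an ordinary chat completions request. The sketch below only assembles the request rather than sending it, and the base URL and API key are placeholders for your own Inference API credentials:

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> dict:
    # Same endpoint shape as the rest of the Inference API:
    # POST <base_url>/chat/completions with a bearer token.
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_chat_request(
    "https://api.example.com/v1",  # placeholder -- use your Inference API base URL
    "YOUR_API_KEY",                # same key you use elsewhere on the platform
    "acme-corp/my-model",          # <team-slug>/<deployment-name>
    [{"role": "user", "content": "Hello!"}],
)
print(req["url"])
```

Because the shape matches the OpenAI chat completions API, any OpenAI-compatible client library can send this request once it is pointed at your base URL.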

Next steps

Deploy a trained model

Name it, click deploy, start serving.

Call your deployment

One line of code to switch over.

Manage and monitor

Lifecycle operations, scaling, and the improvement loop.