
## Key concepts
| Concept | Description |
|---|---|
| Dedicated GPU | Your model runs on its own GPU. No shared infrastructure, no noisy neighbors. Compute is determined by the recipe used during training. |
| OpenAI-compatible API | Same base URL, same API key, just swap the model parameter. Structured outputs, function calling, and all standard API features work the same way. |
| Team-scoped | Deployments belong to a team, not a project. The model path is your team slug followed by the deployment name you choose (e.g. `acme-corp/my-model`). |
| The improvement loop | Deploy → observe production performance → run evals to catch regressions → train the next version. The loop continues. |
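Because the API is OpenAI-compatible, switching an existing integration over comes down to changing the `model` parameter. A minimal sketch of the request payload, reusing the example path `acme-corp/my-model` from the table above; `build_chat_request` is a hypothetical helper, not part of any SDK:

```python
import json

# Hypothetical values: substitute your own team slug and deployment name.
MODEL = "acme-corp/my-model"  # <team-slug>/<deployment-name>

def build_chat_request(messages, model=MODEL):
    """Build a standard chat completions payload; only `model` changes
    when you point an existing integration at your deployment."""
    return {
        "model": model,
        "messages": messages,
    }

payload = build_chat_request(
    [{"role": "user", "content": "Hello from my dedicated deployment"}]
)
print(json.dumps(payload, indent=2))
```

Structured outputs and function calling ride along in the same payload shape, exactly as they would against any other chat completions endpoint.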
## What you can deploy
- Models trained on the Inference platform
- Served via an OpenAI-compatible API (chat completions endpoint)
- Same base URL and API key as the rest of the Inference API
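Since the deployment exposes a standard chat completions endpoint, any HTTP client works. A sketch using only the Python standard library, with a placeholder base URL and API key (the request is constructed but deliberately not sent):

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder; use your Inference API base URL
API_KEY = "sk-..."                       # placeholder; use your real API key

# Standard chat completions body; the model path is <team-slug>/<deployment-name>.
body = json.dumps({
    "model": "acme-corp/my-model",
    "messages": [{"role": "user", "content": "Hi"}],
}).encode()

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=body,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would actually send it; omitted here.
print(req.full_url)
```

The only deployment-specific pieces are the model path and, if you are not already on the Inference API, the base URL and key.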
## Next steps
- **Deploy a trained model**: name it, click deploy, start serving.
- **Call your deployment**: one line of code to switch over.
- **Manage and monitor**: lifecycle operations, scaling, and the improvement loop.