Deploy gives you a dedicated GPU serving your fine-tuned model. The API is OpenAI-compatible, so switching from an off-the-shelf model to your custom model is a one-line code change. This is the last step in the loop and the beginning of the next one.
Key concepts
| Concept | Description |
|---|---|
| Dedicated GPU | Your model runs on its own GPU. No shared infrastructure, no noisy neighbors. Compute is determined by the recipe used during training. |
| OpenAI-compatible API | Same base URL, same API key, just swap the model parameter. Structured outputs, function calling, and all standard API features work the same way (see the examples below). |
| Team-scoped | Deployments belong to a team, not a project. The model path is your team slug followed by the deployment name you choose (e.g. acme-corp/my-model). |
| The improvement loop | Deploy → observe production performance → run evals to catch regressions → train the next version. The loop continues. |
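
For illustration, here is what the one-line swap might look like with the official OpenAI Python SDK. The base URL and the INFERENCE_API_KEY environment variable are assumptions (use the values from your dashboard and the API reference); the model path follows the team-slug convention described above.

```python
import os

from openai import OpenAI

# Assumed base URL and env var name -- substitute the values from
# your dashboard. Everything else is the standard OpenAI client.
client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ["INFERENCE_API_KEY"],
)

response = client.chat.completions.create(
    # The one-line change: point the model parameter at your
    # deployment (team slug / deployment name) instead of an
    # off-the-shelf model.
    model="acme-corp/my-model",
    messages=[{"role": "user", "content": "Hello from my fine-tuned model!"}],
)
print(response.choices[0].message.content)
```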
What you can deploy
- Models trained on the Inference platform
- Served via an OpenAI-compatible API (chat completions endpoint)
- Same base URL and API key as the rest of the Inference API
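
Because the endpoint speaks the standard chat completions protocol, features like structured outputs use the same request shape as they do against off-the-shelf models. A minimal sketch, assuming the same base URL and API key as above and a hypothetical sentiment schema:

```python
import json
import os

from openai import OpenAI

# Same assumed base URL and env var as in the previous example.
client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ["INFERENCE_API_KEY"],
)

response = client.chat.completions.create(
    model="acme-corp/my-model",
    messages=[{"role": "user", "content": "Classify the sentiment of: 'Great product!'"}],
    # Standard OpenAI-style structured output: constrain the reply to
    # JSON matching this (hypothetical) schema.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "sentiment",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "sentiment": {
                        "type": "string",
                        "enum": ["positive", "negative", "neutral"],
                    }
                },
                "required": ["sentiment"],
                "additionalProperties": False,
            },
        },
    },
)
print(json.loads(response.choices[0].message.content))
```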
Next steps
- Deploy a trained model: name it, click deploy, start serving.
- Call your deployment: one line of code to switch over.
- Manage and monitor: lifecycle operations, scaling, and the improvement loop.