> ## Documentation Index
> Fetch the complete documentation index at: https://docs.inference.net/llms.txt
> Use this file to discover all available pages before exploring further.

# Deploy

> Dedicated GPU infrastructure for serving trained models via an OpenAI-compatible API.

Deploy gives you a dedicated GPU serving your fine-tuned model. The API is OpenAI-compatible, so switching from an off-the-shelf model to your custom model is a one-line code change. This is the last step in the loop and the beginning of the next one.

<Frame>
  <iframe style={{ width: "100%", aspectRatio: "16 / 9", border: 0, display: "block" }} src="https://www.youtube.com/embed/7AAf7Y7Qe24?list=PLJzp7SN2tfJsRAU9VGSfSo60CyDJzqhLP&rel=0" title="Deploy a Production-Ready Fine-Tuned LLM | Catalyst" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowFullScreen />
</Frame>

## Key concepts

| Concept                   | Description                                                                                                                                           |
| ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Dedicated GPU**         | Your model runs on its own GPU. No shared infrastructure, no noisy neighbors. Compute is determined by the recipe used during training.               |
| **OpenAI-compatible API** | Same base URL, same API key, just swap the model parameter. Structured outputs, function calling, and all standard API features work the same way.    |
| **Team-scoped**           | Deployments belong to a team, not a project. The model path is your team slug followed by the deployment name you choose (e.g. `acme-corp/my-model`). |
| **The improvement loop**  | Deploy → observe production performance → run evals to catch regressions → train the next version. The loop continues.                                |

## What you can deploy

* Models trained on the Inference platform
* Served via an OpenAI-compatible API (chat completions endpoint)
* Same base URL and API key as the rest of the Inference API

## Next steps

<CardGroup cols={2}>
  <Card title="Deploy a trained model" icon="server" href="/platform/deploy/deploy-a-model">
    Name it, click deploy, start serving.
  </Card>

  <Card title="Call your deployment" icon="code" href="/platform/deploy/call-your-deployment">
    One line of code to switch over.
  </Card>

  <Card title="Manage and monitor" icon="sliders" href="/platform/deploy/manage-and-monitor">
    Lifecycle operations, scaling, and the improvement loop.
  </Card>
</CardGroup>
