When off-the-shelf models aren’t good enough for a specific task, fine-tune one that is. A task-specific model is typically smaller, faster, and cheaper to run than the general-purpose model it replaces, while being more accurate for your workload. This guide walks through the full loop: preparing data, launching training, and deploying the result.

Prerequisites

Before you start training, you need three things:
  1. A training dataset - the data the model learns from. Build it from captured traffic or upload a JSONL file.
  2. An eval dataset - the data used to measure learning progress. It must have zero overlap with the training dataset; any overlap inflates eval scores and hides overfitting.
  3. A validated rubric - run it against your eval dataset first to confirm it captures the quality criteria you care about. See Set Up Your First Eval.
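If you upload a JSONL file, each line is one complete training example. The sketch below shows a plausible chat-format layout; the field names and classifier task are illustrative assumptions, so check the dataset upload docs for the exact schema:

```typescript
import { writeFileSync } from "node:fs";

// Each entry is one training example: a full conversation whose final
// assistant turn is the output the model should learn to produce.
// Field names here are assumptions, not the platform's confirmed schema.
const examples = [
  {
    messages: [
      { role: "system", content: "You are a support ticket classifier." },
      { role: "user", content: "My invoice total looks wrong." },
      { role: "assistant", content: "billing" },
    ],
  },
  {
    messages: [
      { role: "system", content: "You are a support ticket classifier." },
      { role: "user", content: "The app crashes on startup." },
      { role: "assistant", content: "bug-report" },
    ],
  },
];

// JSONL means one JSON object per line, separated by newlines.
const jsonl = examples.map((e) => JSON.stringify(e)).join("\n");
writeFileSync("train.jsonl", jsonl);
```

The same structure works whether the examples come from captured traffic or are written by hand; what matters is that every line parses as a standalone JSON object.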

Step by step

1. Create a new training job

In the dashboard, navigate to the Training tab and click New Training Job. You’ll select three things:
  1. Training dataset — the data the model learns from
  2. Eval dataset — a held-out set used to measure learning progress (must have zero overlap with training data)
  3. Rubric — defines the quality criteria the LLM judge scores against during training
Once your datasets and rubric are selected, you’ll choose a recipe — a pre-configured training setup with a base model, optimized parameters, and compute config. Pick based on task difficulty and capability needs, not model names. Most tasks work with small or medium recipes. See Launch a Training Run for the full flow.
2. Monitor training progress

During training, the model is periodically evaluated on your eval dataset, and an LLM judge scores the outputs against your rubric. If scores improve, training continues. If they degrade, training stops early to prevent overfitting. See Monitor a Training Run.

📍 TODO:MEDIA

Screenshot of the training details page showing progress and eval scores.
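The early-stopping behavior described in this step can be sketched as follows. This is a simplified illustration of the idea, not the platform's actual implementation:

```typescript
// Simplified early-stopping check: stop when the judge's eval score has not
// beaten the previous best for `patience` consecutive evaluations.
// (Illustrative only — the platform's real stopping rule may differ.)
function shouldStopEarly(scores: number[], patience = 3): boolean {
  if (scores.length <= patience) return false;
  // Best score seen before the most recent `patience` evaluations.
  const best = Math.max(...scores.slice(0, scores.length - patience));
  // Stop only if every recent score failed to improve on that best.
  return scores.slice(-patience).every((s) => s <= best);
}

// Degrading tail: scores fell after peaking at 0.72, so training would stop.
shouldStopEarly([0.6, 0.7, 0.72, 0.71, 0.7, 0.69]); // → true

// Still improving: the latest scores keep beating the earlier best.
shouldStopEarly([0.6, 0.7, 0.72, 0.75]); // → false
```

The `patience` window is what keeps a single noisy eval from killing an otherwise healthy run.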

3. Deploy the trained model

When training completes, the model is automatically registered and ready to deploy. Navigate to Deployments, name your deployment, and click Deploy. GPU spin-up takes anywhere from a few minutes to about 30 minutes, depending on model size. See Deploy a Trained Model.
4. Call your model

Same base URL, same headers — just swap the model parameter to your trained model’s identifier.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

const response = await client.chat.completions.create({
  // Swap in your trained model's identifier here.
  model: "your-org/your-trained-model",
  messages: [{ role: "user", content: "Hello" }],
});

console.log(response.choices[0].message.content);
See Call Your Deployment for more details.

After deployment

Use Observe to monitor your deployed model’s production performance. Run evals periodically to catch regressions. When you’re ready to improve further, build a new training dataset from the latest traffic and train the next version. The loop continues.

Next steps

Choose a recipe

Understand the recipe tiers and how to pick the right one.

Launch a training run

End-to-end flow including cost and duration estimates.

Call your deployment

Full setup for calling your deployed model in production.

Monitor with Observe

Track your deployed model’s cost, latency, and quality over time.