Prerequisites
Training requires three things before you start:
- A training dataset - the data the model learns from. Build it from captured traffic or upload your own.
- An eval dataset - measures learning progress. Must have zero overlap with training data.
- A validated rubric - run it against your eval dataset first to confirm it measures what you care about. See Set Up Your First Eval.
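To confirm the zero-overlap requirement between your training and eval datasets, a quick check before launching a run can save a wasted job. The sketch below is a hypothetical illustration, assuming JSONL files where each record has a `prompt` field; adjust the field name to match your actual dataset schema.

```python
import hashlib
import json

def load_prompt_hashes(path):
    """Read a JSONL dataset and return a set of hashed prompt texts."""
    hashes = set()
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            # Hash each prompt so large datasets stay cheap to compare.
            digest = hashlib.sha256(record["prompt"].encode("utf-8")).hexdigest()
            hashes.add(digest)
    return hashes

def check_no_overlap(train_path, eval_path):
    """Raise if any eval example also appears in the training data."""
    overlap = load_prompt_hashes(train_path) & load_prompt_hashes(eval_path)
    if overlap:
        raise ValueError(f"{len(overlap)} eval examples also appear in training data")
    return True
```

This only catches exact duplicates; near-duplicates (rephrasings of the same input) can still leak signal and are worth reviewing by hand or with fuzzier matching.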
📍 TODO:MEDIA
Visual showing the training loop: select recipe → launch run → monitor progress → converge or early-stop → deploy.
Key concepts
| Concept | Description |
|---|---|
| Recipe | A pre-configured training setup with a vetted base model, optimized parameters, and compute config. You pick by task difficulty, the platform handles the ML complexity. |
| Training dataset | The data the model learns from. Diversity and quality matter most. Build from captured traffic or upload your own. |
| Eval dataset | A separate dataset that measures learning progress. Must have zero overlap with training data, so scores reflect generalization rather than memorization. |
| Rubric | The quality criteria that guide training. Mid-training evals use it to decide when to stop. If the rubric is wrong, the model optimizes for the wrong thing. |
| Mid-training evals | Periodic quality checks during training. If scores improve, training continues. If they degrade, training stops early to prevent overfitting. |
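The mid-training eval behavior described in the table can be sketched as a simple early-stopping rule. This is an illustration only; the platform runs these checks for you, and the `patience` parameter here is an assumption, not a platform setting.

```python
def should_stop_early(eval_scores, patience=2):
    """Stop training once the eval score has failed to improve
    for `patience` consecutive checks (a sign of overfitting).

    eval_scores: rubric scores from each mid-training eval, oldest first.
    """
    if len(eval_scores) <= patience:
        return False  # not enough checks yet to judge a trend
    best_so_far = max(eval_scores[:-patience])
    recent = eval_scores[-patience:]
    # Stop only if none of the recent checks beat the earlier best.
    return all(score <= best_so_far for score in recent)
```

The key idea is that a single flat or dipping score does not end the run; training stops only after scores degrade or stall across consecutive checks.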
Why fine-tune
- Reduce latency - a smaller, task-specific model responds faster than a general-purpose one
- Reduce cost - smaller models cost less to serve at scale
- Improve accuracy - a model trained on your data and scored against your rubric is optimized for exactly what you need
- Maintain ownership - you own the model artifact and control where it runs
What makes good training data
The quality of your trained model depends directly on the quality of your data. A few principles:
- Diversity matters most. Training data should cover the range of inputs the model will see in production — different phrasings, edge cases, varying complexity. A narrow dataset produces a narrow model.
- Real traffic beats synthetic data. Production inputs reflect what users actually send. Build datasets from live traffic when possible.
- Scope to a single task. A dataset built from mixed traffic teaches the model many things poorly. Use task tags to filter for one objective at a time.
- More is generally better, but quality trumps volume. A thousand clean, diverse examples outperform ten thousand repetitive ones.
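The scoping and quality principles above can be combined into a small preprocessing pass. This sketch assumes each traffic record is a dict with hypothetical `task` and `prompt` keys; your captured-traffic schema may differ.

```python
def build_task_dataset(records, task_tag):
    """Filter mixed traffic down to a single task and drop
    exact-duplicate prompts, keeping the dataset scoped and diverse."""
    seen = set()
    filtered = []
    for record in records:
        if record.get("task") != task_tag:
            continue  # scope to one objective at a time
        key = record["prompt"].strip().lower()
        if key in seen:
            continue  # an exact duplicate adds volume, not diversity
        seen.add(key)
        filtered.append(record)
    return filtered
```

Deduplication here is deliberately strict (case-insensitive exact match); deciding how aggressively to collapse near-duplicates is a judgment call that trades volume against repetitiveness.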
Next steps
Choose a recipe
Pick a pre-configured training setup for your task.
Launch a training run
End-to-end flow from datasets to queued job.
Monitor a training run
Track progress, graphs, and logs during training.
Deploy a trained model
Ship your model to a dedicated GPU.