Prerequisites
Before you start training, you need three things:
- A training dataset - the data the model learns from. Build it from captured traffic or upload a JSONL file.
- An eval dataset - the data used to measure learning progress. It must have zero overlap with the training dataset; evaluating on data the model trained on inflates scores and masks overfitting.
- A validated rubric - run it against your eval dataset first to confirm it captures the quality criteria you care about. See Set Up Your First Eval.
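A simple way to satisfy the zero-overlap requirement is to split your captured examples before uploading. A minimal sketch, assuming a chat-style `messages` JSONL schema (the exact record format and file names here are illustrative; check your platform's expected dataset format):

```python
import json

# Hypothetical captured-traffic examples. The chat-style "messages"
# schema is an assumption, not the platform's confirmed format.
examples = [
    {"messages": [{"role": "user", "content": "Reset my password"},
                  {"role": "assistant", "content": "Go to Settings > Security and click Reset."}]},
    {"messages": [{"role": "user", "content": "Cancel my order"},
                  {"role": "assistant", "content": "Order cancelled; a refund is on its way."}]},
    {"messages": [{"role": "user", "content": "Update my email"},
                  {"role": "assistant", "content": "Done. Check your inbox to confirm."}]},
    {"messages": [{"role": "user", "content": "Where is my invoice?"},
                  {"role": "assistant", "content": "Invoices live under Billing > History."}]},
]

# Hold out a slice for evals; the two files must share no examples.
split = int(len(examples) * 0.75)
with open("train.jsonl", "w") as f:
    for ex in examples[:split]:
        f.write(json.dumps(ex) + "\n")
with open("eval.jsonl", "w") as f:
    for ex in examples[split:]:
        f.write(json.dumps(ex) + "\n")
```

Splitting once, up front, guarantees the eval set stays held out across every training run that reuses these files.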
Step by step
Create a new training job
In the dashboard, navigate to the Training tab and click New Training Job. You’ll select three things:
- Training dataset — the data the model learns from
- Eval dataset — a held-out set used to measure learning progress (must have zero overlap with training data)
- Rubric — defines the quality criteria the LLM judge scores against during training
Monitor training progress
During training, the model periodically runs your eval dataset and gets scored by an LLM judge using your rubric. If scores improve, training continues. If they degrade, training stops early to prevent overfitting. See Monitor a Training Run.
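The stop-on-degradation rule described above can be sketched as a small patience check. This is an illustrative sketch, not the platform's actual implementation; the function name and `patience` parameter are assumptions:

```python
def should_stop_early(eval_scores, patience=2):
    """Illustrative early-stopping rule (not the platform's actual
    logic): stop once the judge's eval score has failed to beat its
    previous best for `patience` consecutive checkpoints."""
    if len(eval_scores) <= patience:
        return False
    best_before = max(eval_scores[:-patience])
    recent = eval_scores[-patience:]
    return all(score <= best_before for score in recent)

# Scores still improving: keep training.
print(should_stop_early([0.55, 0.61, 0.66, 0.70]))  # False
# Scores degrading after a peak: stop to prevent overfitting.
print(should_stop_early([0.55, 0.70, 0.66, 0.64]))  # True
```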
📍 TODO:MEDIA
Screenshot of the training details page showing progress and eval scores.
Deploy the trained model
When training completes, the model is automatically registered and ready to deploy. Navigate to Deployments, name the deployment, and click Deploy. GPU spin-up takes anywhere from a few minutes to 30 minutes, depending on model size. See Deploy a Trained Model.
Call your model
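Because only the model identifier changes, switching to your trained model is a one-line edit in your request code. A minimal sketch, assuming an OpenAI-style chat completions payload; the URL, API key, and model names below are placeholders, not real identifiers:

```python
import json

# Placeholders: swap in your real endpoint, key, and model identifiers.
BASE_URL = "https://api.example.com/v1/chat/completions"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY",
           "Content-Type": "application/json"}

def build_request(model, user_message):
    """Build a request body in an assumed OpenAI-style chat shape.
    The URL and headers never change; only the model field does."""
    body = {"model": model,
            "messages": [{"role": "user", "content": user_message}]}
    return BASE_URL, HEADERS, json.dumps(body)

# Before: the base model. After: your trained model's identifier.
base_req = build_request("base-model", "Hello")
trained_req = build_request("my-org/trained-model-v1", "Hello")

# Same URL, same headers; only the body's model field differs.
print(base_req[0] == trained_req[0] and base_req[1] == trained_req[1])  # True
```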
Same base URL, same headers; just swap the model parameter to your trained model's identifier. See Call Your Deployment for more details.
After deployment
Use Observe to monitor your deployed model's production performance. Run evals periodically to catch regressions. When you're ready to improve further, build a new training dataset from the latest traffic and train the next version. The loop continues.
Next steps
Choose a recipe
Understand the recipe tiers and how to pick the right one.
Launch a training run
End-to-end flow including cost and duration estimates.
Call your deployment
Full setup for calling your deployed model in production.
Monitor with Observe
Track your deployed model’s cost, latency, and quality over time.