What you’ll have when you finish
- one training job launched against paired training and eval datasets
- a clear training objective tied to the eval baseline
- a checklist for deciding whether the result is worth promoting
Before you start
- complete /guides/build-a-real-traffic-eval-baseline
- create both a training dataset and an eval dataset with /guides/create-datasets-from-observed-traffic
Step 1: decide whether this is a training problem
Training is worth it when:
- the baseline model consistently fails on the same task pattern
- the eval dataset is representative and stable
- the improvement target matters enough to justify a new model artifact
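The first criterion above can be made concrete with a small gate: tag each baseline eval failure with a task-pattern label, and treat training as worthwhile only when one pattern accounts for a meaningful share of all cases. This is a minimal sketch; the function name, labels, and the 30% threshold are illustrative assumptions, not product behavior.

```python
from collections import Counter

def is_training_problem(failure_patterns, total_cases, min_pattern_share=0.3):
    """Return True when one task pattern dominates baseline failures.

    failure_patterns: one hypothetical task-pattern label per failed eval case.
    A single pattern covering >= min_pattern_share of all cases suggests a
    consistent, trainable failure mode rather than noise.
    """
    if not failure_patterns:
        return False
    pattern, count = Counter(failure_patterns).most_common(1)[0]
    return count / total_cases >= min_pattern_share

# Example: 8 of 20 eval cases fail on the same (made-up) pattern.
failures = ["extract_dates"] * 8 + ["other"] * 2
is_training_problem(failures, total_cases=20)  # True: 8/20 = 0.4 >= 0.3
```

A scattered set of unrelated failures (no dominant pattern) is usually better addressed with prompt or data fixes than a new model artifact.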
Step 2: choose the model improvement path
Use fine-tuning when the base model is close but not good enough. Use distillation when the teacher model already performs well and your bigger problem is cost, latency, or serving footprint.
Step 3: launch the training job with paired datasets
The current self-serve flow expects:
- one training dataset
- one eval dataset
- one eval definition and version
- one base model
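Because the flow expects exactly one of each input, it helps to validate the launch payload before submitting rather than letting the job fail after queueing. The sketch below assumes a payload-building step; the field names and ID formats are hypothetical, not the actual launch API.

```python
def build_training_job_payload(training_dataset_id, eval_dataset_id,
                               eval_id, eval_version, base_model):
    """Assemble a paired-dataset launch payload (illustrative field names).

    Validates that every required input is present up front, mirroring the
    one-of-each requirement of the self-serve flow.
    """
    fields = {
        "training_dataset_id": training_dataset_id,
        "eval_dataset_id": eval_dataset_id,
        "eval_id": eval_id,
        "eval_version": eval_version,
        "base_model": base_model,
    }
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return fields

# Hypothetical IDs for illustration only.
payload = build_training_job_payload(
    training_dataset_id="ds_train_123",
    eval_dataset_id="ds_eval_456",
    eval_id="eval_abc",
    eval_version="v3",
    base_model="base-small",
)
```

Keeping the eval definition and version pinned in the payload is what ties the training run back to the baseline you built earlier.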
Step 4: monitor the run, not just the final status
Watch the training job detail page for:
- job state: queued, running, completed, or failed
- current step vs total steps
- current loss
- checkpoint evals and score distribution
- final model reference and weights
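Monitoring the run rather than just the final status can be sketched as a polling loop that surfaces step count and loss on every check, so a stalled run is visible before the terminal state flips. This assumes some `get_status` callable that returns the fields above; the shape of that dict is an assumption for illustration.

```python
import time

def wait_for_training_job(get_status, poll_seconds=30, timeout_seconds=3600):
    """Poll a training job until it reaches a terminal state.

    get_status: hypothetical callable returning a dict like
      {"state": "running", "current_step": 120, "total_steps": 1000,
       "current_loss": 1.83}
    Prints progress each poll so a flat loss or frozen step counter is
    noticeable long before the job finishes or fails.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status()
        print(f"{status['state']}: step {status.get('current_step')}/"
              f"{status.get('total_steps')}, loss {status.get('current_loss')}")
        if status["state"] in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("training job did not reach a terminal state in time")
```

A failed job returned by this loop still carries its last step and loss, which is usually enough to tell a data problem from an infrastructure one.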
Step 5: decide if the result is good enough to promote
A completed training job is not automatically a production-ready model. Before promotion, confirm:
- the training job finished successfully
- the model beats or matches the baseline on the eval you trust
- the latency and cost tradeoffs still make sense
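The promotion checklist above can be encoded as explicit gates so that a "no" decision is auditable: every failed gate is reported, not just the first. The function and its thresholds are a sketch under assumed inputs, not a prescribed policy.

```python
def should_promote(job_state, model_score, baseline_score,
                   latency_ms, max_latency_ms, cost_ratio, max_cost_ratio=1.0):
    """Apply the promotion checklist as explicit gates.

    Returns (decision, reasons): reasons is empty when every gate passes,
    otherwise it lists each failed check. cost_ratio is the new model's
    serving cost relative to the baseline (assumed convention).
    """
    reasons = []
    if job_state != "completed":
        reasons.append(f"job state is {job_state!r}, not 'completed'")
    if model_score < baseline_score:
        reasons.append(f"score {model_score} is below baseline {baseline_score}")
    if latency_ms > max_latency_ms:
        reasons.append(f"latency {latency_ms}ms exceeds budget {max_latency_ms}ms")
    if cost_ratio > max_cost_ratio:
        reasons.append(f"cost ratio {cost_ratio} exceeds {max_cost_ratio}")
    return (not reasons, reasons)

ok, why = should_promote("completed", model_score=0.86, baseline_score=0.81,
                         latency_ms=240, max_latency_ms=300, cost_ratio=0.9)
# ok is True; why is an empty list
```

Scoring against "the eval you trust" matters here: compare on the same eval definition and version you pinned at launch, or the baseline comparison is meaningless.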
Verify it worked
You should now have:
- one completed training job, or one active job with a clear monitoring path
- one model reference to evaluate for promotion
What to do next
Promote a Trained Model to Deployment
Move the trained result into a dedicated serving path and validate it.
Meet with Us
Talk to our team if you want help with dataset strategy, distillation, or rollout planning.