Use this guide when prompt changes are no longer enough and you need to improve the model itself.

What you’ll have when you finish

  • one training job launched against paired training and eval datasets
  • a clear training objective tied to the eval baseline
  • a checklist for deciding whether the result is worth promoting

Before you start

Step 1: decide whether this is a training problem

Training is worth it when:
  • the baseline model consistently fails on the same task pattern
  • the eval dataset is representative and stable
  • the improvement target matters enough to justify a new model artifact
If the issue is still mostly prompt quality or dataset quality, fix those first.
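The three criteria above can be encoded as a simple gate. This is an illustrative sketch, not a platform API; the field names are made up, and each flag is a judgment call you make by inspecting your baseline and eval data.

```python
from dataclasses import dataclass

@dataclass
class TrainingCaseCheck:
    """Hypothetical Step 1 checklist; all field names are illustrative."""
    consistent_failure_pattern: bool   # baseline fails the same task pattern repeatedly
    eval_is_representative: bool       # eval dataset covers the target task and is stable
    target_justifies_new_model: bool   # the improvement is worth a new model artifact

    def is_training_problem(self) -> bool:
        # Train only when all three criteria hold; otherwise fix prompts or data first.
        return (self.consistent_failure_pattern
                and self.eval_is_representative
                and self.target_justifies_new_model)
```

If any flag is false, the cheaper fix (prompt or dataset work) comes first.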

Step 2: choose the model improvement path

Use fine-tuning when the base model is close but not good enough. Use distillation when the teacher model already performs well and your bigger problem is cost, latency, or serving footprint.
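The same decision rule, written out as a sketch. The function and its inputs are hypothetical; the inputs are judgment calls, not measured quantities.

```python
def choose_improvement_path(baseline_close_but_short: bool,
                            teacher_performs_well: bool,
                            cost_or_latency_bound: bool) -> str:
    """Illustrative Step 2 decision rule; all names are made up for this sketch."""
    if teacher_performs_well and cost_or_latency_bound:
        # A strong teacher plus a serving-footprint problem points to distillation.
        return "distillation"
    if baseline_close_but_short:
        # The base model is close; close the remaining quality gap directly.
        return "fine-tuning"
    # Neither condition holds: this is probably not a training problem yet.
    return "revisit prompts or data"
```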

Step 3: launch the training job with paired datasets

The current self-serve flow expects:
  • one training dataset
  • one eval dataset
  • one eval definition and version
  • one base model
Training jobs should be anchored to the same eval rubric you used to decide that the baseline was not good enough.
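A job specification matching the self-serve flow might look like the sketch below. Every field name, identifier, and the `validate_job_spec` helper are hypothetical; substitute your platform's actual job API and IDs.

```python
# Hypothetical job spec for the self-serve flow: one training dataset,
# one eval dataset, one eval definition and version, one base model.
job_spec = {
    "base_model": "base-model-v1",          # the model you are improving
    "training_dataset": "train-ds-001",     # paired with the eval dataset below
    "eval_dataset": "eval-ds-001",
    # Pin the same rubric and version used to judge the baseline,
    # so before/after scores are comparable.
    "eval_definition": {"id": "task-rubric", "version": 3},
}

def validate_job_spec(spec: dict) -> list[str]:
    """Return any required fields missing from the spec (illustrative helper)."""
    required = ["base_model", "training_dataset", "eval_dataset", "eval_definition"]
    return [field for field in required if field not in spec]
```

Validating the spec before submission catches the most common launch failure: a job launched without its paired eval, which leaves you with no baseline comparison later.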

Step 4: monitor the run, not just the final status

Watch the training job detail page for:
  • job status: queued, running, completed, or failed
  • current step vs total steps
  • current loss
  • checkpoint evals and score distribution
  • final model reference and weights
Checkpoint evals matter because they tell you whether the model is improving before the job finishes.
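A polling loop over those fields might look like this sketch. The `job.status()` handle and its return shape are assumptions for illustration, not a real client library.

```python
import time

def monitor(job, poll_seconds: float = 30.0) -> None:
    """Poll a hypothetical job handle until it reaches a terminal state.

    `job.status()` is assumed to return a dict with the fields
    listed above (state, steps, loss, checkpoint evals).
    """
    while True:
        s = job.status()
        print(f"{s['state']}: step {s['current_step']}/{s['total_steps']}, "
              f"loss={s['current_loss']:.4f}")
        # Surface checkpoint evals as they land: a flat or worsening score
        # means you can stop the run instead of waiting for completion.
        for ckpt in s.get("checkpoint_evals", []):
            print(f"  checkpoint at step {ckpt['step']}: score={ckpt['score']:.3f}")
        if s["state"] in ("completed", "failed"):
            break
        time.sleep(poll_seconds)
```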

Step 5: decide if the result is good enough to promote

A completed training job is not automatically a production-ready model. Before promotion, confirm:
  • the training job finished successfully
  • the model beats or matches the baseline on the eval you trust
  • the latency and cost tradeoffs still make sense
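The checklist above reduces to a promotion gate. This is a minimal sketch with made-up names; the latency and cost flags stand in for whatever budget checks your deployment actually runs.

```python
def ready_to_promote(job_state: str,
                     candidate_score: float,
                     baseline_score: float,
                     latency_ok: bool,
                     cost_ok: bool,
                     min_delta: float = 0.0) -> bool:
    """Hypothetical promotion gate mirroring the Step 5 checklist."""
    finished = job_state == "completed"
    # "Beats or matches" the baseline on the eval you trust; set min_delta > 0
    # if matching alone is not worth promoting a new artifact.
    beats_baseline = candidate_score >= baseline_score + min_delta
    return finished and beats_baseline and latency_ok and cost_ok
```

A completed job that fails this gate is still useful: it tells you whether the next iteration should change the data, the objective, or the path (fine-tuning vs distillation).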

Verify it worked

You should now have:
  • one completed training job or one active job with a clear monitoring path
  • one model reference to evaluate for promotion

What to do next

Promote a Trained Model to Deployment

Move the trained result into a dedicated serving path and validate it.

Meet with Us

Talk to our team if you want help with dataset strategy, distillation, or rollout planning.