Training graphs
Four graphs update as training progresses:

| Graph | What it measures | What to look for |
|---|---|---|
| Loss | How far the model’s predictions are from expected output | Decreasing = learning. Flattening = model has learned what it can from the data. |
| Learning rate | How much weights update at each training step | Warm-up then decay schedule — configured by the recipe automatically. |
| Gradient norm | Gradient magnitude during backpropagation | Steady or decreasing = stable. Persistent spikes may indicate a data quality issue. |
| Eval score | Average score on the eval dataset at each checkpoint | Trending up = model is improving at your task. This is the most direct signal that training is working. |
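The warm-up-then-decay pattern in the Learning rate row can be sketched as below. This is an illustrative example, not the platform's actual schedule: linear warm-up followed by cosine decay is one common choice, and the step counts and peak rate here are made-up values.

```python
import math

def lr_at_step(step, total_steps=1000, warmup_steps=100, peak_lr=2e-5):
    """Linear warm-up to peak_lr, then cosine decay toward zero.

    All parameter values are illustrative, not platform defaults.
    """
    if step < warmup_steps:
        # Warm-up: ramp linearly from 0 up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Decay: cosine curve from peak_lr down toward 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

Plotting `lr_at_step` across all steps reproduces the shape you see in the Learning rate graph: a short ramp up, then a long smooth decline.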
📍 TODO:MEDIA
Screenshot of the training details page showing the four graphs during an active training run.
Evaluations
The platform runs evaluations at three points during a training job:

- Before training — establishes a baseline score for the model before any weight updates
- During training — at each checkpoint, the model runs your eval dataset and an LLM judge scores the outputs using your rubric
- After training — a final evaluation on the completed model
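In outline, these three evaluation points form a loop like the one below. This is a schematic sketch, not the platform's implementation: `run_eval` is a stand-in for "generate outputs on the eval dataset and have an LLM judge score them against the rubric", `train_step` is a stand-in for the weight updates between checkpoints, and the degradation check mirrors the early-stopping behavior described in the next section.

```python
def run_training_job(model, run_eval, train_step, num_checkpoints=10, patience=3):
    """Schematic of the baseline -> per-checkpoint -> final evaluation flow."""
    scores = [run_eval(model)]  # before training: baseline score

    bad_streak = 0
    for _ in range(num_checkpoints):
        train_step(model)               # train up to the next checkpoint
        scores.append(run_eval(model))  # during training: checkpoint eval

        # Early stopping: halt if the score has failed to beat the best so
        # far for `patience` consecutive checkpoints.
        bad_streak = bad_streak + 1 if scores[-1] < max(scores[:-1]) else 0
        if bad_streak >= patience:
            break

    scores.append(run_eval(model))      # after training: final evaluation
    return scores
```

The returned list corresponds to the points plotted on the Eval score graph: one baseline value, one value per completed checkpoint, and one final value.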
Early stopping
If eval scores degrade over consecutive checkpoints, training halts automatically. This prevents overfitting — where the model memorizes training data instead of learning to generalize — and avoids wasting compute on a run that’s already peaked.

Checkpoints
Training saves checkpoints at regular intervals. If a run fails after a checkpoint, it can be resumed from the last saved state rather than starting over.

Logs
The Logs tab shows output from all GPUs during training. Use it to debug issues or see what’s happening under the hood. You can filter logs by type — warn, error, and others — to focus on what matters.
📍 TODO:MEDIA
Screenshot of the Logs tab showing GPU output with type filters.