Evals turn datasets and rubrics into repeatable model comparisons. They are the quality loop that keeps platform workflows honest: before training, during training, and before deployment.

What evals give you

  • Reusable eval definitions tied to a rubric
  • Judge-model scoring for real outputs
  • Repeatable comparisons across candidate models or checkpoints
  • A clean feedback loop into dataset revision and training
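As a concrete illustration of the first bullet, here is a minimal sketch of what a reusable eval definition might look like as a data structure. The field names (`rubric`, `score_range`, `dataset`) are assumptions for illustration, not a documented platform schema:

```python
from dataclasses import dataclass, field

# Hypothetical representation of a reusable eval definition.
# Field names are illustrative, not a real platform API.

@dataclass
class EvalDefinition:
    name: str
    rubric: str                          # instructions the judge model scores against
    score_range: tuple[int, int] = (1, 5)
    dataset: list[dict] = field(default_factory=list)  # prompt/response records

    def validate_score(self, score: int) -> bool:
        """Check that a judge score falls inside the declared range."""
        lo, hi = self.score_range
        return lo <= score <= hi

helpfulness = EvalDefinition(
    name="helpfulness-v1",
    rubric="Rate how directly the response answers the user's question.",
)
print(helpfulness.validate_score(4))  # True
```

Keeping the rubric and score range on the definition itself is what makes the eval reusable: every run against any model scores with the same instructions.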

Where the inputs come from

Most teams source eval data in one of three ways:
  • live traffic captured through the proxy
  • historical uploads imported as JSONL
  • saved eval datasets built from filtered requests
That is why capturing traffic usually comes first in the lifecycle.
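For the historical-upload path, a JSONL dataset stores one JSON object per line. A minimal loader might look like the sketch below; the record keys (`prompt`, `response`) are assumed field names, not a documented schema:

```python
import json
import tempfile

def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per non-blank line."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:                      # tolerate trailing blank lines
                records.append(json.loads(line))
    return records

# Round-trip two illustrative records through a temporary file.
rows = [
    {"prompt": "What is your refund policy?", "response": "30 days."},
    {"prompt": "Do you ship overseas?", "response": "Yes, to 40 countries."},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write("\n".join(json.dumps(r) for r in rows))
    path = f.name

print(len(load_jsonl(path)))  # 2
```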
The eval lifecycle

  1. Build or import a representative eval dataset.
  2. Define the rubric and scoring range.
  3. Run the eval against your baseline model.
  4. Compare the baseline to a candidate model or trained checkpoint.
  5. Use low-scoring examples to improve the next dataset or training run.
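Steps 3 through 5 above can be sketched in a few lines. Everything here is illustrative: `judge` is a stand-in for a real judge-model call against the rubric, and the two lambdas stand in for baseline and candidate models:

```python
# Hedged sketch of the run/compare/feedback loop.
# A real `judge` would call a judge model with the rubric and parse its score.

def judge(prompt: str, response: str) -> int:
    # Toy stand-in: reward responses that actually mention the topic.
    return 5 if prompt.lower() in response.lower() else 2

def run_eval(examples, respond):
    """Score a model (a prompt -> response callable) over the dataset."""
    scores = [judge(ex["prompt"], respond(ex["prompt"])) for ex in examples]
    return sum(scores) / len(scores), scores

examples = [{"prompt": "refund policy"}, {"prompt": "shipping time"}]
baseline = lambda p: "I do not know."
candidate = lambda p: f"Our {p} is explained here..."

base_avg, _ = run_eval(examples, baseline)
cand_avg, cand_scores = run_eval(examples, candidate)

# Step 5: low-scoring examples feed the next dataset or training run.
low_scoring = [ex for ex, s in zip(examples, cand_scores) if s <= 2]
print(cand_avg > base_avg)  # True: the candidate beats this weak baseline
```

The important property is that the same dataset and judge are reused across both runs, so the comparison isolates the model change.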

When to run evals

  • Before training to establish a baseline
  • During training to compare new checkpoints
  • Before deployment to confirm the model is ready for rollout
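The pre-deployment check often reduces to a simple gate on the scores. The thresholds below (an absolute bar of 4.0 on a 1-5 scale and a 0.1 non-regression margin) are made-up values for illustration, not platform defaults:

```python
# Illustrative rollout gate over judge-model averages on a 1-5 scale.
# Threshold and margin values are assumptions, not platform defaults.

def ready_for_rollout(candidate_avg: float, baseline_avg: float,
                      min_score: float = 4.0, margin: float = 0.1) -> bool:
    """Pass only if the candidate clears an absolute quality bar and does
    not regress against the baseline by more than `margin`."""
    return candidate_avg >= min_score and candidate_avg >= baseline_avg - margin

print(ready_for_rollout(4.3, 4.2))  # True
print(ready_for_rollout(3.8, 4.2))  # False: below the absolute bar
```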

Next steps

Datasets

Start with representative traffic or historical uploads.

Fine-tuning

Use eval failures to drive fine-tuning or distillation.

E2E Fine-tuning Guide

Go from eval failures to a completed training run.

Talk to an engineer

Meet with us if you want help designing the rubric or evaluation strategy.