[Screenshot: dataset creation flow showing filter selection and dataset naming]
The flow
Filter your traffic
Filter by model, task, provider, status code, or any tracked dimension until you have a representative slice of traffic.
Choose a dataset type
Decide whether this will be an eval dataset or a training dataset. Remember the zero-overlap rule: training and eval data must never share examples.
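One way to check the zero-overlap rule before saving a dataset is to fingerprint each example and confirm that the two sets share nothing. The sketch below is illustrative only: the record shape (a dict with an `input` field) and the helper names `fingerprint` and `find_overlap` are assumptions, not part of any product API.

```python
import hashlib
import json


def fingerprint(example: dict) -> str:
    """Return a stable hash of an example's input (hypothetical record shape)."""
    # Serialize with sorted keys so equal inputs always hash identically.
    canonical = json.dumps(example["input"], sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_overlap(training: list[dict], evaluation: list[dict]) -> set[str]:
    """Return the fingerprints that appear in both datasets."""
    train_hashes = {fingerprint(ex) for ex in training}
    eval_hashes = {fingerprint(ex) for ex in evaluation}
    return train_hashes & eval_hashes


# Made-up examples: one input appears in both sets.
training = [{"input": "Summarize this ticket"}, {"input": "Translate to French"}]
evaluation = [{"input": "Translate to French"}, {"input": "Classify sentiment"}]

overlap = find_overlap(training, evaluation)
print(f"{len(overlap)} shared example(s)")
```

Exact-match hashing only catches identical inputs. Near-duplicates, such as the same prompt with different whitespace, would need normalization before hashing.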
Getting clean samples
The quality of your dataset depends on how well you filter. A few tips:
- Filter by task to get samples for a specific objective rather than a mix of everything
- Exclude errors unless you specifically want failure cases (e.g. for training a model to handle edge cases)
- Check the date range - a dataset pulled from a single day might not capture the full variety of inputs your app sees
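The tips above can be sketched as a small filtering pass over logged requests. This is a minimal illustration under assumed names: the fields `task`, `status_code`, and `day`, and the helpers `clean_samples` and `days_covered`, are hypothetical and not taken from the product.

```python
from datetime import date

# Hypothetical logged requests; field names are assumptions for illustration.
records = [
    {"task": "summarize", "status_code": 200, "day": date(2024, 5, 1)},
    {"task": "summarize", "status_code": 500, "day": date(2024, 5, 2)},
    {"task": "translate", "status_code": 200, "day": date(2024, 5, 2)},
    {"task": "summarize", "status_code": 200, "day": date(2024, 5, 9)},
]


def clean_samples(records, task, include_errors=False):
    """Keep one task's records, dropping non-2xx responses unless asked."""
    selected = []
    for r in records:
        if r["task"] != task:
            continue
        is_error = not (200 <= r["status_code"] < 300)
        if is_error and not include_errors:
            continue
        selected.append(r)
    return selected


def days_covered(samples):
    """Count the distinct calendar days represented in the samples."""
    return len({r["day"] for r in samples})


samples = clean_samples(records, task="summarize")
print(len(samples), "samples across", days_covered(samples), "day(s)")
```

A low day count relative to the sample count is a hint that the slice may be too narrow. Passing `include_errors=True` keeps failure cases when those are the point of the dataset.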
Eval vs training: different goals, different data
Eval datasets should be small, stable, and challenging. Pick examples that represent the hard cases: the ones where you’re not sure the model will get it right. These become your benchmark, so don’t change them often.
Training datasets should be large, diverse, and representative. The more variety, the better the model generalizes. Iterate on these as you learn what the model struggles with.
Next steps
Upload your own data
Already have curated data? Upload it directly.
Dataset formats
Supported schemas and validation rules.