The most useful datasets come from real production traffic. Catalyst lets you filter your captured inferences and save the results as a dataset, ready for evals or training. You can create a dataset from traffic in two places: the Create Dataset button in the Datasets tab, or Save as Dataset in the Inference Viewer. Both follow the same flow.
The flow
Filter your traffic
Filter by model, task, provider, status code, or any tracked dimension until you have a representative slice of traffic.
Choose a dataset type
Decide whether this will be an eval dataset or a training dataset. Remember the zero-overlap rule: training and eval data must never share examples.
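One way to enforce the zero-overlap rule before saving is to split a filtered slice into eval and training sets, then check that no input appears in both. The sketch below is illustrative only and does not describe how Catalyst itself works. It assumes each example is a dict with an `input` string and treats two examples as duplicates when their inputs match after lowercasing and collapsing whitespace. The helper names `example_key`, `split_eval_train`, and `find_overlap` are hypothetical.

```python
import hashlib
import random


def example_key(example: dict) -> str:
    """Hypothetical dedup key: hash of the input, lowercased with whitespace collapsed."""
    normalized = " ".join(example["input"].split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def split_eval_train(examples: list[dict], eval_size: int, seed: int = 0):
    """Split examples into disjoint eval and training lists."""
    # Deduplicate first so one input can never land on both sides of the split.
    unique: dict[str, dict] = {}
    for ex in examples:
        unique.setdefault(example_key(ex), ex)

    # Sort before shuffling so the split is reproducible for a given seed.
    keys = sorted(unique)
    random.Random(seed).shuffle(keys)

    eval_set = [unique[k] for k in keys[:eval_size]]
    train_set = [unique[k] for k in keys[eval_size:]]
    return eval_set, train_set


def find_overlap(eval_set: list[dict], train_set: list[dict]) -> list[dict]:
    """Return training examples whose key also appears in the eval set."""
    eval_keys = {example_key(ex) for ex in eval_set}
    return [ex for ex in train_set if example_key(ex) in eval_keys]
```

Deduplicating before the split matters: if the same prompt was captured twice, a plain random split could place one copy in each set. A stricter notion of overlap, such as near-duplicate detection, would need a different `example_key`.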
Getting clean samples
The quality of your dataset depends on how well you filter. A few tips:
- Filter by task to get samples for a specific objective rather than a mix of everything
- Exclude errors unless you specifically want failure cases (e.g. for training a model to handle edge cases)
- Check the date range: a dataset pulled from a single day might not capture the full variety of inputs your app sees
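The tips above can be sketched as a simple filter over exported records. This is a hypothetical example, not Catalyst's API: the `Inference` record shape, its field names, and the choice to treat any 2xx status code as a success are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Inference:
    # Hypothetical record shape; real captured inferences may have different fields.
    task: str
    status_code: int
    created_at: datetime
    prompt: str


def clean_samples(records: list[Inference], task: str, include_errors: bool = False) -> list[Inference]:
    """Keep one task's records, dropping non-2xx responses unless include_errors is set."""
    return [
        r
        for r in records
        if r.task == task and (include_errors or 200 <= r.status_code < 300)
    ]


def days_covered(records: list[Inference]) -> int:
    """Count distinct calendar days in the slice; 1 suggests a too-narrow date range."""
    return len({r.created_at.date() for r in records})
```

Counting distinct days is only a rough proxy for variety. If `days_covered` returns 1, consider widening the date range before saving the dataset.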
Eval vs training: different goals, different data
Eval datasets should be small, stable, and challenging. Pick examples that represent the hard cases: the ones where you’re not sure the model will get it right. These become your benchmark, so don’t change them often.
Training datasets should be large, diverse, and representative. The more variety, the better the model generalizes. Iterate on these as you learn what the model struggles with.
Next steps
Upload your own data
Already have curated data? Upload it directly.
Dataset formats
Supported schemas and validation rules.