# Build a Dataset from Traffic

> Turn production traffic into datasets for evaluation and training.

The most useful datasets come from real production traffic. Catalyst lets you filter your captured inferences and save the results as a dataset, ready for evals or training.

You can create a dataset from traffic in two places: the **Create Dataset** button in the Datasets tab, or **Save as Dataset** in the [Inference Viewer](/platform/gateway/inference-viewer). Both follow the same flow.

## The flow

1. Filter by model, task, provider, status code, or any tracked dimension until you have a representative slice of traffic.
2. Decide whether this will be an **eval dataset** or a **training dataset**. Remember the [zero-overlap rule](/platform/datasets/overview#the-zero-overlap-rule): training and eval data must never share examples.
3. Name the dataset and save. It's immediately available for evals or training.

## Getting clean samples

The quality of your dataset depends on how well you filter. A few tips:

* **Filter by [task](/platform/gateway/tasks)** to get samples for a specific objective rather than a mix of everything
* **Exclude errors** unless you specifically want failure cases (e.g. for training a model to handle edge cases)
* **Check the date range**: a dataset pulled from a single day might not capture the full variety of inputs your app sees

## Eval vs training: different goals, different data

**Eval datasets** should be small, stable, and challenging. Pick examples that represent the hard cases, the ones where you're not sure the model will get it right. These become your benchmark, so don't change them often.

**Training datasets** should be large, diverse, and representative. The more variety, the better the model generalizes. Iterate on these as you learn what the model struggles with.

## Next steps

* Already have curated data? Upload it directly.
* Supported schemas and validation rules.
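
The zero-overlap rule mentioned in the flow can also be spot-checked locally before you commit to a split. The sketch below is not a Catalyst API: it assumes you already hold both sets of examples as plain Python dictionaries, however you obtained them. The `input` and `task` field names and the `fingerprint` and `find_overlap` helpers are hypothetical and only there for illustration.

```python
import hashlib
import json


def fingerprint(example: dict) -> str:
    """Stable hash of an example; key order does not affect the result."""
    canonical = json.dumps(example, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_overlap(eval_examples: list[dict], training_examples: list[dict]) -> list[dict]:
    """Return the eval examples that also appear in the training set."""
    training_hashes = {fingerprint(ex) for ex in training_examples}
    return [ex for ex in eval_examples if fingerprint(ex) in training_hashes]


# Hypothetical records; the field names are illustrative only.
eval_examples = [
    {"input": "Summarize this ticket: printer offline", "task": "summarize"},
    {"input": "Summarize this ticket: login loop", "task": "summarize"},
]
training_examples = [
    {"task": "summarize", "input": "Summarize this ticket: login loop"},
    {"input": "Summarize this ticket: refund request", "task": "summarize"},
]

overlap = find_overlap(eval_examples, training_examples)
print(f"{len(overlap)} shared example(s)")
```

This only catches exact duplicates after key ordering is normalized. Near-duplicates (for example, the same prompt with different whitespace) would need a looser comparison, which is a design choice left to your own data.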