Types of datasets
| Type | Purpose | How it evolves |
|---|---|---|
| Eval dataset | Measures model quality against a rubric | Stays stable: a fixed set of challenging examples that acts as your benchmark |
| Training dataset | Data the model learns from during fine-tuning | Changes often as you iterate on data quality and coverage |
The zero-overlap rule
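As a minimal sketch, assuming the rule means that no example should appear in both your training dataset and your eval dataset, a check along these lines could run before you save either file. The `input` and `output` field names are hypothetical placeholders. See Dataset Formats for the actual schema.

```python
import json


def example_key(record: dict) -> str:
    """Canonical string for a record, so key order does not affect matching."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False)


def find_overlap(train_lines, eval_lines):
    """Return the eval records that also appear in the training data."""
    train_keys = {
        example_key(json.loads(line)) for line in train_lines if line.strip()
    }
    overlap = []
    for line in eval_lines:
        if not line.strip():
            continue  # skip blank lines in the JSONL input
        record = json.loads(line)
        if example_key(record) in train_keys:
            overlap.append(record)
    return overlap


# Hypothetical JSONL lines; the field names are illustrative only.
train = [
    '{"input": "reset my password", "output": "account"}',
    '{"input": "where is my order", "output": "shipping"}',
]
evals = [
    '{"output": "shipping", "input": "where is my order"}',
    '{"input": "cancel my plan", "output": "billing"}',
]

duplicates = find_overlap(train, evals)
print(f"{len(duplicates)} eval example(s) also appear in training data")
```

This only catches exact duplicates after key reordering. Near-duplicates, such as the same request with different whitespace or casing, would need normalization suited to your data.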
Key concepts
| Concept | Description |
|---|---|
| Build from traffic | Filter your captured production inferences and save them as a dataset. The best datasets come from real usage. |
| Upload | Bring your own JSONL files when you have curated data or are migrating from another platform. |
| Dataset format | The schema your data needs to follow. See Dataset Formats for supported fields and validation rules. |
| Task tags | Use task tags when building from traffic to filter by objective. This gives you clean, focused samples instead of mixed traffic. |
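
To illustrate how task tags and JSONL fit together, here is a rough local sketch of filtering captured inferences by tag and serializing the result for upload. The record shape (`input`, `output`, `task_tags`) is an assumption made for this example and is not the platform's schema. Check Dataset Formats for the supported fields.

```python
import json


def filter_by_task_tag(inferences, task_tag):
    """Keep only the inferences that carry the given task tag."""
    return [r for r in inferences if task_tag in r.get("task_tags", [])]


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Hypothetical captured traffic mixing two objectives.
captured = [
    {"input": "Summarize this ticket", "output": "...", "task_tags": ["summarize"]},
    {"input": "Classify this email", "output": "...", "task_tags": ["classify"]},
    {"input": "Summarize the call notes", "output": "...", "task_tags": ["summarize"]},
]

summarize_only = filter_by_task_tag(captured, "summarize")
jsonl_text = to_jsonl(summarize_only)
print(jsonl_text, end="")
```

Filtering before serializing keeps the saved dataset scoped to one objective, which is the point of using task tags.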
Tips for good datasets
- Diverse training data leads to models that generalize well. If your training data covers only a narrow slice of inputs, the trained model is likely to struggle with edge cases.
- Stable eval data gives you a consistent benchmark. Don’t change your eval dataset frequently; it’s the measuring stick.
- Start with production traffic when possible. Real user inputs reflect the actual distribution of requests your model will see, which synthetic data is hard-pressed to reproduce.
- Use task tags to filter by objective before saving a dataset. A dataset scoped to a single task is almost always more useful than one built from mixed traffic.
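
One way to keep the eval dataset honest as a measuring stick is to record a fingerprint of it alongside each eval run, so you can tell whether two scores were measured against the same examples. The sketch below is an illustration that uses only the Python standard library and is not a platform feature. The record fields are hypothetical.

```python
import hashlib
import json


def dataset_fingerprint(records):
    """SHA-256 over canonicalized records, independent of record and key order."""
    canonical = sorted(
        json.dumps(r, sort_keys=True, ensure_ascii=False) for r in records
    )
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # separator so adjacent records cannot run together
    return digest.hexdigest()


# Hypothetical eval examples.
eval_v1 = [
    {"input": "where is my order", "output": "shipping"},
    {"input": "cancel my plan", "output": "billing"},
]
eval_reordered = list(reversed(eval_v1))
eval_edited = eval_v1 + [{"input": "reset my password", "output": "account"}]

print(dataset_fingerprint(eval_v1) == dataset_fingerprint(eval_reordered))
print(dataset_fingerprint(eval_v1) == dataset_fingerprint(eval_edited))
```

If the fingerprint changes between two runs, treat their scores as measured on different benchmarks.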
Next steps
- Build from traffic: Turn filtered production traffic into a dataset.
- Upload a dataset: Bring your own JSONL files.
- Set up your first eval: Use your dataset to compare models.
- Train a custom model: Use your dataset to fine-tune a task-specific model.