Datasets can be created from captured traffic or uploaded as JSONL files. This page covers the supported formats.
The system auto-detects the format from the first valid line. All rows in a file must use the same format.
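The detection rule above can be sketched as follows. This is a minimal illustration, not the service's actual logic: it assumes a "valid line" is a non-empty line that parses as a JSON object, and that the presence of a `request` or `messages` key decides the format. The function name and the `hugging-face` label are hypothetical.

```python
import json
from typing import Iterable, Optional


def detect_format(lines: Iterable[str]) -> Optional[str]:
    """Return a format label based on the first valid line, or None."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: unparseable lines are not "valid"
        if not isinstance(row, dict):
            continue  # assumption: only JSON objects count as rows
        if "request" in row:
            return "source-backed"
        if "messages" in row:
            return "hugging-face"  # hypothetical label
    return None
```

Because only the first valid line is inspected, a file that mixes formats would be classified by its first row, which is why all rows must share one format.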
In the source-backed format, each line has a top-level request object and an optional response object, each containing the raw provider body.
| Field | Required | Description |
|---|---|---|
| request | Yes | Raw provider request body (must include model) |
| response | No | Raw provider response body, or null if you only have requests |
{"request":{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}],"temperature":0.7,"max_tokens":100},"response":{"id":"chatcmpl-123","object":"chat.completion","created":1700000000,"model":"gpt-4","choices":[{"index":0,"message":{"role":"assistant","content":"Hi there!"},"finish_reason":"stop"}]}}
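Before uploading, a row in this format can be checked locally against the field rules in the table. The sketch below is an assumption about what a reasonable pre-check looks like; `validate_source_row` is a hypothetical helper, and the server may apply additional rules.

```python
import json


def validate_source_row(line: str) -> dict:
    """Parse one JSONL line and check the documented field rules."""
    row = json.loads(line)
    if not isinstance(row, dict):
        raise ValueError("row must be a JSON object")
    request = row.get("request")
    if not isinstance(request, dict):
        raise ValueError("request is required and must be an object")
    if "model" not in request:
        raise ValueError("request must include model")
    # A missing response and an explicit null are both allowed.
    response = row.get("response")
    if response is not None and not isinstance(response, dict):
        raise ValueError("response must be an object or null")
    return row
```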
In the Hugging Face format, each line has a top-level messages array of role/content objects.
| Field | Required | Description |
|---|---|---|
| messages | Yes | Array of { role, content } objects (at least one required) |
| id | No | Optional row identifier (stored in metadata) |
Valid roles: system, user, assistant.
{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"},{"role":"assistant","content":"4"}]}
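A messages row can be pre-checked the same way. This sketch encodes only the rules stated above (a non-empty array, role/content objects, and the three valid roles); `validate_messages_row` is a hypothetical helper, and whether the server also constrains the type of `content` is not documented here.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}


def validate_messages_row(line: str) -> dict:
    """Parse one JSONL line and check the documented messages rules."""
    row = json.loads(line)
    if not isinstance(row, dict):
        raise ValueError("row must be a JSON object")
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty array")
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"messages[{i}] must be an object")
        role = message.get("role")
        if not isinstance(role, str) or role not in VALID_ROLES:
            raise ValueError(f"messages[{i}] has an invalid role")
        if "content" not in message:
            raise ValueError(f"messages[{i}] is missing content")
    return row
```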
When importing Hugging Face-format rows, the system:
- Synthesizes request/response payloads so evals and detail views work
- Sets request_model to unknown-imported-model
- Sets token usage and costs to zero
- Stores the original row id in metadata as importOriginalRowId
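The import steps listed above might look roughly like the sketch below. Only `request_model`, `unknown-imported-model`, and `importOriginalRowId` come from this page; the record layout, the `usage` and `cost` field names, and the rule that a trailing assistant message becomes the response are all assumptions made for illustration.

```python
from typing import Any, Dict

IMPORTED_MODEL = "unknown-imported-model"


def synthesize_imported_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Build a hypothetical stored record from a Hugging Face-format row."""
    messages = row["messages"]
    # Assumption: a trailing assistant message is treated as the response.
    if messages and messages[-1].get("role") == "assistant":
        prompt, completion = messages[:-1], messages[-1]
    else:
        prompt, completion = messages, None
    record: Dict[str, Any] = {
        "request": {"model": IMPORTED_MODEL, "messages": prompt},
        "response": None
        if completion is None
        else {"choices": [{"index": 0, "message": completion}]},
        "request_model": IMPORTED_MODEL,
        # Hypothetical field names; the page only says usage and costs are zero.
        "usage": {"prompt_tokens": 0, "completion_tokens": 0},
        "cost": 0.0,
        "metadata": {},
    }
    if "id" in row:
        record["metadata"]["importOriginalRowId"] = row["id"]
    return record
```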
Upload limits
| Limit | Value |
|---|---|
| Maximum file size | 10 GB |
| Maximum line count | 1,000,000 |
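A local pre-flight check against these limits can save a failed upload. The sketch below makes two assumptions that this page does not settle: that "10 GB" means 10 GiB, and that every line counts toward the limit, including blank ones. `check_upload_limits` is a hypothetical helper.

```python
import os

MAX_FILE_BYTES = 10 * 1024**3  # assumption: 10 GB interpreted as 10 GiB
MAX_LINES = 1_000_000


def check_upload_limits(path: str) -> int:
    """Return the line count, or raise ValueError if a limit is exceeded."""
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        raise ValueError(f"file is {size} bytes, over the size limit")
    line_count = 0
    # Stream the file in binary mode so large files are never loaded whole.
    with open(path, "rb") as handle:
        for _ in handle:
            line_count += 1
            if line_count > MAX_LINES:
                raise ValueError("file has more lines than the line limit")
    return line_count
```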
Datasets can be downloaded in two formats:
| Format | Description | Best for |
|---|---|---|
| Hugging Face | { id, messages } per row (default) | Training, fine-tuning, sharing |
| Source-backed | { request, response } per row | Re-uploading, round-trip testing |
In the UI, click Download and choose the format. In the CLI:
# Default (Hugging Face)
inf dataset download my-dataset
# Source-backed format
inf dataset download my-dataset --format source-backed
Hugging Face exports skip rows with empty message arrays, while source-backed exports include every row with a valid request payload. As a result, row counts may differ between the two formats.
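To see how the two export rules can produce different counts, consider the sketch below. The stored-row shape (a dict carrying both `messages` and `request`) is hypothetical, and "valid request payload" is assumed to mean a JSON object that includes a model.

```python
from typing import Any, Dict, Iterable, Tuple


def count_exported_rows(rows: Iterable[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (hugging_face_count, source_backed_count) for stored rows."""
    hugging_face = 0
    source_backed = 0
    for row in rows:
        # Empty or missing message arrays are skipped in Hugging Face exports.
        if row.get("messages"):
            hugging_face += 1
        request = row.get("request")
        # Assumption: "valid" means an object that includes a model.
        if isinstance(request, dict) and "model" in request:
            source_backed += 1
    return hugging_face, source_backed
```

A dataset with one request-only row and one full conversation would then export one row as Hugging Face and two as source-backed.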