Dataset Formats and Schemas

Datasets can be created from captured traffic or uploaded as JSONL files. This page covers the supported formats.

Supported formats

The system auto-detects the format from the first valid line. All rows in a file must use the same format.

You cannot mix source-backed and Hugging Face rows in the same file. Mixed-format files fail validation.
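
As a rough pre-upload check, the sketch below classifies each row by its top-level keys and rejects mixed files. The function names are hypothetical, and the key-based rule is an assumption drawn from the format descriptions on this page, not the platform's actual detection logic (which reads only the first valid line).

```python
import json

def detect_row_format(line: str) -> str:
    """Classify one JSONL line by its top-level keys (assumed rule)."""
    row = json.loads(line)
    if not isinstance(row, dict):
        raise ValueError("row must be a JSON object")
    if "request" in row:
        return "source-backed"
    if "messages" in row:
        return "hugging-face"
    raise ValueError("row has neither 'request' nor 'messages'")

def check_single_format(lines) -> str:
    """Return the one format used by all non-blank lines, or raise."""
    formats = {detect_row_format(line) for line in lines if line.strip()}
    if not formats:
        raise ValueError("no rows found")
    if len(formats) > 1:
        raise ValueError("mixed-format file: " + ", ".join(sorted(formats)))
    return formats.pop()
```

Running `check_single_format(open("rows.jsonl"))` before uploading would surface a mixed file locally instead of in the upload status.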

Source-backed format

Each line is a JSON object with a top-level `request` and an optional `response`, each holding the raw provider body.

Field	Required	Description
`request`	Yes	Raw provider request body that includes a usable `model` value
`response`	No	Raw provider response body, or `null` if you only have requests

Validation notes:

The `request` must include a usable `model` value.
`response` may be omitted or set to `null` if you only have request-side data.

{"request":{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}],"temperature":0.7,"max_tokens":100},"response":{"id":"chatcmpl-123","object":"chat.completion","created":1700000000,"model":"gpt-4","choices":[{"index":0,"message":{"role":"assistant","content":"Hi there!"},"finish_reason":"stop"}]}}
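
The rules for source-backed rows can be checked locally with a sketch like the one below. The function name is hypothetical, and treating a "usable" model value as a non-empty string is an assumption; the platform may apply a stricter or looser test.

```python
import json

def validate_source_backed(line: str) -> list[str]:
    """Return a list of problems found in one source-backed JSONL line."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(row, dict):
        return ["row must be a JSON object"]

    errors = []
    request = row.get("request")
    if not isinstance(request, dict):
        errors.append("request must be an object")
    else:
        model = request.get("model")
        # Assumption: "usable" means a non-empty string.
        if not isinstance(model, str) or not model.strip():
            errors.append("request.model must be a non-empty string")

    # response may be missing or null; otherwise it should be an object.
    response = row.get("response")
    if response is not None and not isinstance(response, dict):
        errors.append("response must be an object or null")
    return errors
```

An empty list means the row passed these checks; it does not guarantee the upload will accept it.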

Hugging Face format

Each line is a JSON object with a top-level `messages` array of role/content objects.

Field	Required	Description
`messages`	Yes	Array of `{ role, content }` objects (at least one required)
`id`	No	Optional row identifier (stored in metadata)
`tools`	No	Optional top-level tool definitions

Valid roles: `system`, `user`, `assistant`, `tool`. Additional message rules:

`content` may be a string or an array of content parts for multimodal rows.
Assistant messages may include `tool_calls`.
Tool messages must include `tool_call_id`.
Top-level `tools` are preserved on import.

{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"},{"role":"assistant","content":"4"}]}
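
The role and tool-message rules above can be sketched as a small local validator. The function name and error strings are hypothetical, and allowing a missing `content` is an assumption (for example, assistant turns that carry only `tool_calls`); the platform's real checks may differ.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}

def validate_hf_row(row: dict) -> list[str]:
    """Return a list of problems found in one Hugging Face-format row."""
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["messages must be a non-empty array"]

    errors = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            errors.append(f"messages[{i}] must be an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in VALID_ROLES:
            errors.append(f"messages[{i}].role is invalid: {role!r}")
        # content, when present, is a string or an array of content parts.
        content = msg.get("content")
        if content is not None and not isinstance(content, (str, list)):
            errors.append(f"messages[{i}].content must be a string or array")
        # Tool messages must reference the call they answer.
        if role == "tool" and not msg.get("tool_call_id"):
            errors.append(f"messages[{i}] is a tool message without tool_call_id")
    return errors
```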

When importing Hugging Face-format rows, the system:

Treats the last assistant turn as the imported response and earlier turns as request context
Synthesizes request/response payloads so evals and detail views work
Sets `request_model` to `unknown-imported-model`
Sets token usage and costs to zero
Stores the original row `id` in metadata as `importOriginalRowId`
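
To make the import steps concrete, here is a minimal sketch of the split into request context and response. Only `request_model`, `unknown-imported-model`, and `importOriginalRowId` come from this page; the function name, the shape of the returned record, and the usage/cost field names are hypothetical. The sketch also assumes that turns after the last assistant turn are dropped, which this page does not specify.

```python
def split_hf_row(row: dict) -> dict:
    """Split a Hugging Face-format row into request context and response."""
    messages = row["messages"]
    # Index of the last assistant turn, or None if there is none.
    last = max(
        (i for i, m in enumerate(messages) if m.get("role") == "assistant"),
        default=None,
    )
    if last is None:
        context, response = list(messages), None
    else:
        # Assumption: anything after the last assistant turn is ignored.
        context, response = messages[:last], messages[last]

    return {
        "request_model": "unknown-imported-model",
        "request": {"messages": context, "tools": row.get("tools")},
        "response": response,
        # Hypothetical field names; imported rows carry zero usage and cost.
        "usage": {"input_tokens": 0, "output_tokens": 0},
        "cost": 0,
        "metadata": {"importOriginalRowId": row.get("id")},
    }
```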

Validation behavior

Files must be valid JSONL.
Invalid rows are reported with line numbers in the upload status details.
Uploads can complete with partial failures if at least one row imports successfully.
If every row fails validation, the upload status is failed.
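
The partial-failure behavior can be sketched as follows. Only the `failed` status is named on this page; the function name, the `completed` label, the result shape, and the choice to skip blank lines are assumptions for illustration, and the only check performed here is that each line parses as a JSON object.

```python
import json

def check_jsonl(text: str) -> dict:
    """Parse JSONL text, collecting per-line errors and an overall status."""
    imported = 0
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append({"line": lineno, "error": exc.msg})
            continue
        if not isinstance(row, dict):
            errors.append({"line": lineno, "error": "row must be a JSON object"})
            continue
        imported += 1

    # The upload fails only when no row imports successfully.
    status = "failed" if imported == 0 else "completed"
    return {"status": status, "imported": imported, "errors": errors}
```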

Upload limits

Limit	Value
Maximum file size	10 GB
Maximum line count	1,000,000
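
A file can be checked against these limits before uploading. The sketch below uses a hypothetical helper name and assumes that 10 GB means 10 × 1000³ bytes; if the platform counts in binary units, the true limit is slightly higher.

```python
import os

MAX_BYTES = 10 * 1000**3  # assumption: decimal gigabytes
MAX_LINES = 1_000_000

def within_upload_limits(path: str,
                         max_bytes: int = MAX_BYTES,
                         max_lines: int = MAX_LINES) -> bool:
    """Return True if the file fits both the size and line-count limits."""
    if os.path.getsize(path) > max_bytes:
        return False
    # Stream the file so large uploads are never loaded into memory.
    with open(path, "rb") as fh:
        for count, _ in enumerate(fh, start=1):
            if count > max_lines:
                return False
    return True
```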

Download formats

Datasets can be downloaded in two formats:

Format	Description	Best for
Hugging Face	`{ id, messages }` per row (default)	Training, fine-tuning, sharing
Source-backed	`{ request, response }` per row	Re-uploading, round-trip testing

In the UI, click Download and choose the format. In the CLI:

# Default (Hugging Face)
inf dataset download my-dataset

# Source-backed format
inf dataset download my-dataset --format source-backed

Hugging Face exports skip rows with empty message arrays. Source-backed exports include all rows with a valid request payload. Row counts may differ between formats.
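
One way to see why the counts can differ is to sketch the conversion from a source-backed row to a Hugging Face row. This is an illustration only: the function name is hypothetical, and it assumes an OpenAI-style chat payload (`request.messages` plus `response.choices[].message`) like the example on this page. The platform's actual export logic is not documented here.

```python
from typing import Optional

def to_hf_row(row_id: str, source_row: dict) -> Optional[dict]:
    """Build an { id, messages } row, or None if there are no messages."""
    request = source_row.get("request") or {}
    messages = list(request.get("messages") or [])

    # Append assistant messages from an OpenAI-style response, if present.
    response = source_row.get("response") or {}
    for choice in response.get("choices") or []:
        message = choice.get("message")
        if message:
            messages.append(message)

    if not messages:
        return None  # such a row would be skipped in a Hugging Face export
    return {"id": row_id, "messages": messages}
```

A row whose request has a valid `model` but no messages would still appear in a source-backed export, while this sketch returns `None` for it.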
