Datasets can be created from captured traffic or uploaded as JSONL files. This page covers the supported formats.

Supported formats

The system auto-detects the format from the first valid line. All rows in a file must use the same format.
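The detection rule can be pictured with a short Python sketch. This is not the system's actual implementation: the function name `detect_format`, the returned labels, and the reading of "valid line" as "a non-empty line that parses as a JSON object" are all assumptions made for illustration.

```python
import json


def detect_format(lines):
    """Guess the dataset format from the first valid line (hypothetical helper)."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # assumption: blank lines are not "valid" lines
        try:
            row = json.loads(raw)
        except json.JSONDecodeError:
            continue  # assumption: unparseable lines are skipped
        if not isinstance(row, dict):
            continue
        if isinstance(row.get("request"), dict):
            return "source-backed"
        if isinstance(row.get("messages"), list):
            return "hugging-face"
    return None  # no line matched either format
```

Because every row must use the same format, a check like this only needs to look at one line; running it locally before uploading can catch a mixed-up file early.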

Source-backed format

Each line is a JSON object with a top-level request object and an optional response object, each holding the raw provider body.
Field | Required | Description
request | Yes | Raw provider request body (must include model)
response | No | Raw provider response body, or null if you only have requests
{"request":{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}],"temperature":0.7,"max_tokens":100},"response":{"id":"chatcmpl-123","object":"chat.completion","created":1700000000,"model":"gpt-4","choices":[{"index":0,"message":{"role":"assistant","content":"Hi there!"},"finish_reason":"stop"}]}}

Hugging Face format

Each line has a top-level messages array with role/content objects.
Field | Required | Description
messages | Yes | Array of { role, content } objects (at least one required)
id | No | Optional row identifier (stored in metadata)
Valid roles: system, user, assistant.
{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"},{"role":"assistant","content":"4"}]}
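The field rules above can be checked locally before uploading. The following is a minimal sketch, assuming the only constraints are the ones listed on this page (a non-empty messages array, and a role of system, user, or assistant on each message); `validate_hf_row` is a hypothetical helper, not part of any tool described here, and the real importer may apply additional checks.

```python
VALID_ROLES = {"system", "user", "assistant"}


def validate_hf_row(row):
    """Return a list of problems; an empty list means the row looks valid."""
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["messages must be a non-empty array"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"messages[{i}] is not an object")
            continue
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"messages[{i}] has invalid role {msg.get('role')!r}")
        if "content" not in msg:
            problems.append(f"messages[{i}] is missing content")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a row at once.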
When importing Hugging Face-format rows, the system:
  • Synthesizes request/response payloads so evals and detail views work
  • Sets request_model to unknown-imported-model
  • Sets token usage and costs to zero
  • Stores the original row id in metadata as importOriginalRowId
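To make the import steps concrete, here is a rough Python sketch of what such a conversion could look like. Only `request_model`, `unknown-imported-model`, and `importOriginalRowId` come from this page; the function name, the `usage` and `cost` keys, the shape of the synthesized payloads, and the choice to treat a trailing assistant message as the response are assumptions for illustration.

```python
def synthesize_row(hf_row):
    """Turn a Hugging Face-format row into a request/response record (sketch)."""
    messages = hf_row["messages"]
    # Assumption: a trailing assistant message is treated as the response,
    # and everything before it as the request.
    if messages and messages[-1]["role"] == "assistant":
        prompt, completion = messages[:-1], messages[-1]
    else:
        prompt, completion = messages, None

    metadata = {}
    if "id" in hf_row:
        metadata["importOriginalRowId"] = hf_row["id"]

    response = None
    if completion is not None:
        response = {"choices": [{"index": 0, "message": completion}]}

    return {
        "request": {"model": "unknown-imported-model", "messages": prompt},
        "response": response,
        "request_model": "unknown-imported-model",
        # Hypothetical field names; the page only says usage and costs are zero.
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
        "cost": 0,
        "metadata": metadata,
    }
```

For the example row shown earlier, this sketch would place the system and user messages in the request and the assistant's "4" in the response.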

Upload limits

Limit | Value
Maximum file size | 10 GB
Maximum line count | 1,000,000
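A pre-upload check against these limits might look like the sketch below. It assumes that 10 GB means 10 × 1000³ bytes (the stricter reading) and that every line, including blank ones, counts toward the line limit; the page does not specify either point. The helper names `measure` and `check_limits` are hypothetical.

```python
import os

MAX_BYTES = 10 * 1000**3  # assumption: decimal gigabytes
MAX_LINES = 1_000_000


def measure(path):
    """Return (size in bytes, number of lines) for a file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        lines = sum(1 for _ in f)  # streams the file; does not load it into memory
    return size, lines


def check_limits(size_bytes, line_count):
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    if line_count > MAX_LINES:
        problems.append(f"file has {line_count} lines, limit is {MAX_LINES}")
    return problems
```

Usage would be along the lines of `check_limits(*measure("my-dataset.jsonl"))`, where the file name is a placeholder.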

Download formats

Datasets can be downloaded in two formats:
Format | Description | Best for
Hugging Face | { id, messages } per row (default) | Training, fine-tuning, sharing
Source-backed | { request, response } per row | Re-uploading, round-trip testing
In the UI, click Download and choose the format. In the CLI:
# Default (Hugging Face)
inf dataset download my-dataset

# Source-backed format
inf dataset download my-dataset --format source-backed
Hugging Face exports skip rows with empty message arrays. Source-backed exports include all rows with a valid request payload. Row counts may differ between formats.
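The difference in row counts follows directly from the two filters. The sketch below illustrates this with a hypothetical in-memory row shape in which each stored row carries both a `request` payload and a derived `messages` list; the real storage model is not described on this page, and "valid request payload" is assumed here to mean an object that includes model.

```python
def export_counts(rows):
    """Count how many rows each export format would include (sketch)."""
    # Hugging Face export: rows with an empty messages array are skipped.
    hugging_face = sum(1 for r in rows if r.get("messages"))
    # Source-backed export: assumption that "valid" means a request object with a model.
    source_backed = sum(
        1 for r in rows
        if isinstance(r.get("request"), dict) and "model" in r["request"]
    )
    return {"hugging-face": hugging_face, "source-backed": source_backed}
```

A row whose request has no messages would count toward the source-backed total but not the Hugging Face one, which is why the two downloads of the same dataset can differ in length.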