Datasets can be created from captured traffic or uploaded as JSONL files. This page covers the supported formats.

Supported formats

The system auto-detects the format from the first valid line. All rows in a file must use the same format.
You cannot mix source-backed and Hugging Face rows in the same file. Mixed-format files fail validation.
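As a pre-upload check, the single-format rule can be approximated locally. The sketch below is an assumption about how detection might work (a row with a top-level request key is treated as source-backed, a row with a top-level messages key as Hugging Face); the function names and format labels are hypothetical and not part of the product.

```python
import json


def detect_format(row) -> str:
    """Classify one parsed row. Assumed heuristic, not the system's actual logic."""
    if isinstance(row, dict) and "request" in row:
        return "source-backed"
    if isinstance(row, dict) and "messages" in row:
        return "hugging-face"
    raise ValueError("unrecognized row format")


def check_single_format(lines):
    """Return the format of the first valid line; raise if a later line differs."""
    detected = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            fmt = detect_format(json.loads(line))
        except ValueError:
            # Unparseable or unrecognized rows are skipped in this sketch.
            continue
        if detected is None:
            detected = fmt
        elif fmt != detected:
            raise ValueError(f"line {number}: expected {detected}, found {fmt}")
    return detected
```

Running this over a file before uploading catches mixed-format files early, although the server-side validation remains the authority.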

Source-backed format

Each line has a top-level request object and an optional response object, each containing a raw provider body.

Field     Required  Description
request   Yes       Raw provider request body that includes a usable model value
response  No        Raw provider response body, or null if you only have requests
Validation notes:
  • The request must include a usable model value.
  • response may be omitted or set to null if you only have request-side data.
{"request":{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}],"temperature":0.7,"max_tokens":100},"response":{"id":"chatcmpl-123","object":"chat.completion","created":1700000000,"model":"gpt-4","choices":[{"index":0,"message":{"role":"assistant","content":"Hi there!"},"finish_reason":"stop"}]}}
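The validation notes above can be checked locally with a few lines of Python. This is a minimal sketch: the page does not define what counts as a "usable" model value, so the code assumes it means a non-empty string, and validate_source_backed is a hypothetical helper name.

```python
import json


def validate_source_backed(line: str) -> dict:
    """Parse one JSONL line and check the source-backed shape described above."""
    row = json.loads(line)
    if not isinstance(row, dict):
        raise ValueError("row must be a JSON object")
    request = row.get("request")
    if not isinstance(request, dict):
        raise ValueError("request must be an object")
    model = request.get("model")
    # Assumption: "usable" means a non-empty string.
    if not isinstance(model, str) or not model.strip():
        raise ValueError("request.model must be a non-empty string")
    response = row.get("response")
    # response may be omitted or null when only request-side data exists.
    if response is not None and not isinstance(response, dict):
        raise ValueError("response must be an object, null, or omitted")
    return row
```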

Hugging Face format

Each line has a top-level messages array with role/content objects.
Field     Required  Description
messages  Yes       Array of { role, content } objects (at least one required)
id        No        Optional row identifier (stored in metadata)
tools     No        Optional top-level tool definitions

Valid roles: system, user, assistant, tool.
Additional supported fields:
  • content may be a string or an array of content parts for multimodal rows.
  • Assistant messages may include tool_calls.
  • Tool messages must include tool_call_id.
  • Top-level tools are preserved on import.
{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"},{"role":"assistant","content":"4"}]}
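The field rules above translate into a small row checker. The sketch below covers only the rules stated on this page; it assumes content may be absent or null (for example on an assistant message that carries only tool_calls), which the page does not confirm, and validate_hf_row is a hypothetical name.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_hf_row(row) -> None:
    """Raise ValueError if a parsed row breaks the Hugging Face-format rules."""
    messages = row.get("messages") if isinstance(row, dict) else None
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty array")
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"messages[{index}] must be an object")
        role = message.get("role")
        if not isinstance(role, str) or role not in VALID_ROLES:
            raise ValueError(f"messages[{index}]: invalid role {role!r}")
        content = message.get("content")
        # content is a string, or an array of parts for multimodal rows.
        # Assumption: a missing or null content is tolerated.
        if content is not None and not isinstance(content, (str, list)):
            raise ValueError(f"messages[{index}]: content must be a string or array")
        if role == "tool" and not message.get("tool_call_id"):
            raise ValueError(f"messages[{index}]: tool message needs tool_call_id")
```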
When importing Hugging Face-format rows, the system:
  • Treats the last assistant turn as the imported response and earlier turns as request context
  • Synthesizes request/response payloads so evals and detail views work
  • Sets request_model to unknown-imported-model
  • Sets token usage and costs to zero
  • Stores the original row id in metadata as importOriginalRowId
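To make the import steps concrete, here is a rough illustration of the transformation. Only request_model, unknown-imported-model, and importOriginalRowId come from this page; the overall record layout, the usage and cost field names, and the choice to drop any turns after the last assistant turn are assumptions made for the sketch.

```python
def import_hf_row(row: dict) -> dict:
    """Sketch of how a Hugging Face-format row might become a stored record."""
    messages = row["messages"]
    # Index of the last assistant turn, or None if the row has none.
    last = max(
        (i for i, m in enumerate(messages) if m.get("role") == "assistant"),
        default=None,
    )
    if last is None:
        context, response = list(messages), None
    else:
        # Earlier turns become request context; the last assistant turn
        # becomes the imported response.
        context, response = messages[:last], messages[last]

    request = {"model": "unknown-imported-model", "messages": context}
    if "tools" in row:
        request["tools"] = row["tools"]  # top-level tools are preserved

    metadata = {}
    if "id" in row:
        metadata["importOriginalRowId"] = row["id"]

    return {
        "request_model": "unknown-imported-model",
        "request": request,
        "response": response,
        "usage": {"prompt_tokens": 0, "completion_tokens": 0},  # hypothetical field names
        "cost": 0,  # hypothetical field name
        "metadata": metadata,
    }
```

Because usage and cost are zeroed, imported rows should not be used for token or spend analysis.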

Validation behavior

  • Files must be valid JSONL.
  • Invalid rows are reported with line numbers in the upload status details.
  • Uploads can complete with partial failures if at least one row imports successfully.
  • If every row fails validation, the upload status is failed.
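The status rules above can be pictured as a simple tally over the file. In this sketch the status label failed comes from this page, while completed and completed_with_errors, the result layout, and the summarize_upload name are hypothetical. Treating an empty file as failed is also an assumption.

```python
import json


def summarize_upload(lines, validate):
    """Validate each JSONL line and derive an overall status.

    `validate` is any callable that raises ValueError for a bad row.
    """
    errors = []
    imported = 0
    for number, line in enumerate(lines, start=1):
        try:
            validate(json.loads(line))  # JSONDecodeError is a ValueError
            imported += 1
        except ValueError as exc:
            # Line numbers are 1-based, matching how invalid rows are reported.
            errors.append({"line": number, "error": str(exc)})
    if imported == 0:
        status = "failed"
    elif errors:
        status = "completed_with_errors"  # hypothetical label for partial failure
    else:
        status = "completed"  # hypothetical label
    return {"status": status, "imported": imported, "errors": errors}
```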

Upload limits

Limit               Value
Maximum file size   10 GB
Maximum line count  1,000,000
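Checking a file against these limits before uploading avoids a long transfer that is then rejected. The sketch below assumes 10 GB means 10 × 1000³ bytes (the page does not say whether the limit is decimal or binary) and that every line, including blank ones, counts toward the line limit; check_upload_limits is a hypothetical helper.

```python
import os

MAX_BYTES = 10 * 1000**3  # assumption: decimal gigabytes
MAX_LINES = 1_000_000


def check_upload_limits(path, max_bytes=MAX_BYTES, max_lines=MAX_LINES):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, limit is {max_bytes}")
    # Stream the file so large uploads are not loaded into memory.
    with open(path, "rb") as handle:
        line_count = sum(1 for _ in handle)
    if line_count > max_lines:
        problems.append(f"file has {line_count} lines, limit is {max_lines}")
    return problems
```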

Download formats

Datasets can be downloaded in two formats:
Format         Description                         Best for
Hugging Face   { id, messages } per row (default)  Training, fine-tuning, sharing
Source-backed  { request, response } per row       Re-uploading, round-trip testing

In the UI, click Download and choose the format. In the CLI:
# Default (Hugging Face)
inf dataset download my-dataset

# Source-backed format
inf dataset download my-dataset --format source-backed

Hugging Face exports skip rows with empty message arrays, while source-backed exports include every row with a valid request payload. As a result, the same dataset may yield different row counts in the two formats.