# Datasets

> Upload JSONL inference data, materialize eval/training datasets, and download them.

Use `inf dataset` to upload JSONL inference data and manage datasets created from captured traffic, existing uploads, or JSONL files on disk. Materialized datasets feed into [`inf eval run`](/cli/evals) for evals and into training jobs.

**Alias:** `inf datasets`

## `inf dataset upload`

Import a JSONL file into the active project as an upload entry. An upload is the raw material you can then materialize into an eval or training dataset. The CLI validates the file locally, uploads it in parts, waits for processing to finish, and prints the detected format plus the processed line count.

<Metadata text="cli/dataset/upload" />

```bash theme={"system"}
inf dataset upload <file>
```

### Arguments

| Argument | Required | Description                      |
| -------- | -------- | -------------------------------- |
| `file`   | Yes      | Path to the JSONL file to upload |

### Options

| Flag                | Required | Description                                                      | Default                    |
| ------------------- | -------- | ---------------------------------------------------------------- | -------------------------- |
| `-n, --name <name>` | No       | Upload name shown in Catalyst                                    | Filename without extension |
| `--no-wait`         | No       | Return after the transfer finishes; do not wait for processing   | Off                        |

Uploaded data appears in **Datasets → Uploads** in the dashboard. Once processing completes, create an eval or training dataset from that upload — either with [`inf dataset create --upload-id`](#inf-dataset-create) below or in the dashboard.

### Examples

<Metadata text="cli/dataset/upload-examples" />

```bash theme={"system"}
# Use the filename as the upload name and wait for processing
inf dataset upload ./data/support-summaries.jsonl

# Set a custom upload name
inf dataset upload ./data/support-summaries.jsonl --name support-summaries-v2

# Return after the transfer completes, without waiting for processing
inf dataset upload ./data/support-summaries.jsonl --no-wait
```
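
Before uploading a large file, a quick local sanity check can catch obvious problems early. The sketch below is not the CLI's own validation, whose rules are not described here. It only assumes that each record is a single JSON object on its own line, and `check_jsonl` is a hypothetical helper name.

```bash
# Rough pre-flight check for a JSONL file (hypothetical helper, not part of inf).
# Flags blank lines and lines that do not start with "{" and end with "}".
# It does not parse JSON, so it can miss malformed objects.
check_jsonl() {
  awk '
    /^[ \t\r]*$/ { printf "line %d: blank\n", NR; bad++; next }
    !/^[ \t\r]*[{].*[}][ \t\r]*$/ { printf "line %d: not a JSON object\n", NR; bad++ }
    END { printf "%d lines checked, %d problems\n", NR, bad + 0; exit (bad > 0) }
  ' "$1"
}
```

A typical use is to gate the upload on the check: `check_jsonl ./data/support-summaries.jsonl && inf dataset upload ./data/support-summaries.jsonl`.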

## `inf dataset create`

Materialize an eval or training dataset from captured traffic, an existing upload, or a JSONL file on disk. The file-backed path uploads, waits for processing, and materializes in one command.

<Metadata text="cli/dataset/create" />

```bash theme={"system"}
inf dataset create -n <name> -t <type> [source-flags…]
```

### Options

| Flag                   | Required | Description                                                                                                                        | Default           |
| ---------------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------- | ----------------- |
| `-n, --name <name>`    | Yes      | Dataset name                                                                                                                       | —                 |
| `-t, --type <type>`    | Yes      | `eval` or `training`                                                                                                               | —                 |
| `-f, --file <path>`    | No       | JSONL file on disk — uploads, waits for processing, then materializes from that upload                                             | —                 |
| `--upload-id <id>`     | No       | Materialize from an existing upload                                                                                                | —                 |
| `--task <taskId>`      | No       | Filter captured traffic by task ID                                                                                                 | —                 |
| `--model <modelId>`    | No       | Filter captured traffic by model ID                                                                                                | —                 |
| `--since <date>`       | No       | Start of the time window for traffic filters (ISO 8601 or `YYYY-MM-DD HH:MM:SS`)                                                   | 30 days ago       |
| `--until <date>`       | No       | End of the time window for traffic filters                                                                                         | 1 minute from now |
| `--limit <n>`          | No       | Cap on the number of inferences included                                                                                           | —                 |
| `--status <status>`    | No       | Status filter: `success`, `2xx`, or a specific code such as `200`. Non-success traffic is excluded unless you override this filter | `success`         |
| `--description <text>` | No       | Free-text dataset description                                                                                                      | —                 |

`--file` and `--upload-id` are mutually exclusive — `--file` creates a new upload automatically. Date values accept ISO 8601 (`2026-04-01T00:00:00Z`) or ClickHouse format (`2026-04-01 00:00:00`).

Dataset materialization runs asynchronously. The command prints the dataset ID and suggests running `inf dataset get <id>` to check progress.
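
Because materialization is asynchronous, a script usually needs to wait before it uses the dataset. A minimal sketch of a generic retry loop follows. `poll_until` is a hypothetical helper, and the readiness check you pass to it is left to you, since the output format of `inf dataset get` is not specified here.

```bash
# poll_until <max_attempts> <delay_seconds> <command...>
# Runs <command...> until it exits 0, sleeping between attempts.
# Returns 1 if the command still fails on the final attempt.
poll_until() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "gave up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `poll_until 30 10 my_ready_check "$DATASET_ID"` would try up to 30 times, 10 seconds apart, where `my_ready_check` is your own function wrapping `inf dataset get`.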

### Examples

<Metadata text="cli/dataset/create-examples" />

```bash theme={"system"}
# One command: upload a JSONL file and materialize an eval dataset
inf dataset create -n demo-eval -t eval --file ./samples.jsonl

# Materialize from an existing upload
inf dataset create -n training-v1 -t training --upload-id up_abc123

# Filter captured traffic by task + time window
inf dataset create -n support-eval -t eval \
  --task support-tickets \
  --since 2026-04-01 \
  --until 2026-04-14

# Cap to 1,000 rows (only successful traffic is included by default)
inf dataset create -n small-eval -t eval --task support-tickets --limit 1000
```
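
For scheduled jobs, the time window is usually computed rather than typed. The sketch below formats Unix timestamps in the two documented date styles and prints a `create` command covering the last seven days. It assumes GNU `date` (the `-d @<epoch>` form is not available in BSD/macOS `date`), works in UTC, and uses `weekly-eval` as a placeholder name. Whether the CLI reads the space-separated form as UTC is not stated here, so the ISO 8601 form with `Z` is likely the safer choice.

```bash
# Format a Unix timestamp (seconds) as ISO 8601 in UTC, e.g. 2026-04-01T00:00:00Z.
iso_utc() { date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ'; }

# Same instant in the space-separated style, e.g. 2026-04-01 00:00:00.
clickhouse_utc() { date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'; }

now="$(date -u +%s)"
since="$(iso_utc "$((now - 7 * 86400))")"   # seven days ago
until_ts="$(iso_utc "$now")"                # now

# Dry run: print the command. Drop the leading echo to execute it.
echo inf dataset create -n weekly-eval -t eval \
  --task support-tickets --since "$since" --until "$until_ts"
```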

## `inf dataset list`

Display datasets in the active project.

<Metadata text="cli/dataset/list" />

```bash theme={"system"}
inf dataset list
```

**Alias:** `inf dataset ls`

### Options

| Flag              | Required | Description               | Default |
| ----------------- | -------- | ------------------------- | ------- |
| `-l, --limit <n>` | No       | Maximum number of results | `20`    |

The table shows the dataset ID (8-char prefix), name, type, inference count, export status, and creation date. Use `--json` to get full UUIDs for scripting.

### Examples

<Metadata text="cli/dataset/list-examples" />

```bash theme={"system"}
# Default table view
inf dataset list

# More results
inf dataset list --limit 100

# Pipe full UUIDs into another command
inf dataset list --json | jq -r '.[].id'
```
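
Building on the last example, a loop can act on every listed dataset. The sketch below reads IDs from standard input and downloads each one without prompting. `download_all` and the `INF_CMD` override are hypothetical names introduced for illustration; the override exists only so the loop can be dry-run with `echo`.

```bash
# download_all <dir>: read dataset IDs (one per line) on stdin and download
# each to <dir>/<id>.jsonl. Set INF_CMD=echo to print the commands instead.
download_all() {
  local dir="$1" id
  while IFS= read -r id; do
    [ -n "$id" ] || continue   # skip blank lines
    # stdin is redirected so the command cannot consume the ID list
    ${INF_CMD:-inf} dataset download "$id" \
      --format huggingface --output "$dir/$id.jsonl" </dev/null || return 1
  done
}

# Usage (requires jq):
#   mkdir -p ./exports
#   inf dataset list --limit 100 --json | jq -r '.[].id' | download_all ./exports
```

Passing `--format` and `--output` explicitly keeps the loop independent of the interactive defaults.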

## `inf dataset get`

View detailed information about a specific dataset — ID, name, type, inference count, export status, source project, and creation date.

<Metadata text="cli/dataset/get" />

```bash theme={"system"}
inf dataset get <id>
```

### Arguments

| Argument | Required | Description                                       |
| -------- | -------- | ------------------------------------------------- |
| `id`     | Yes      | Dataset ID, UUID prefix (4+ chars), or exact name |

## `inf dataset download`

Download a dataset as a JSONL file. If the server-side export isn't ready yet, the CLI requests it and polls until it's ready before downloading.

<Metadata text="cli/dataset/download" />

```bash theme={"system"}
inf dataset download [id]
```

### Arguments

| Argument | Required | Description                                                                                                              |
| -------- | -------- | ------------------------------------------------------------------------------------------------------------------------ |
| `id`     | No       | Dataset ID, UUID prefix (4+ chars), or exact name. If omitted in an interactive terminal, the CLI prompts you to choose. |

### Options

| Flag                    | Required | Description                                       | Default                                                                                         |
| ----------------------- | -------- | ------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| `-o, --output <path>`   | No       | Output file path                                  | `<dataset-name>.jsonl` for Hugging Face, `<dataset-name>.source-backed.jsonl` for source-backed |
| `-f, --format <format>` | No       | Download format: `huggingface` or `source-backed` | Prompted in a TTY; otherwise `huggingface`                                                      |

The CLI resolves dataset IDs by exact ID, UUID prefix, or exact name.

### Examples

<Metadata text="cli/dataset/download-examples" />

```bash theme={"system"}
# Download to the default filename
inf dataset download ds_abc123

# Download to a specific file
inf dataset download ds_abc123 --output ./data/my-dataset.jsonl

# Download source-backed JSONL (request/response objects)
inf dataset download customer-support-eval --format source-backed

# Pick interactively when no id is provided
inf dataset download
```
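
When a script omits `--output`, it may need to know where the file landed. The helper below mirrors the default-filename rule from the options table. `default_output` is a hypothetical name, and the sketch assumes the dataset name is used verbatim; any sanitizing of spaces or slashes by the CLI is not described here.

```bash
# default_output <dataset-name> [format]
# Prints the default download filename for the given format.
default_output() {
  local name="$1" format="${2:-huggingface}"
  case "$format" in
    huggingface)   printf '%s.jsonl\n' "$name" ;;
    source-backed) printf '%s.source-backed.jsonl\n' "$name" ;;
    *) echo "unknown format: $format" >&2; return 1 ;;
  esac
}
```

After `inf dataset download customer-support-eval --format source-backed`, running `wc -l "$(default_output customer-support-eval source-backed)"` would count the downloaded records.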
