> ## Documentation Index
> Fetch the complete documentation index at: https://docs.inference.net/llms.txt
> Use this file to discover all available pages before exploring further.

# Upload a Dataset

> Import JSONL inference data, then turn it into eval or training datasets.

If you already have curated data from annotation pipelines, synthetic generation, or another platform, you can upload it as JSONL instead of building a dataset from captured traffic.

Uploads and datasets are separate objects in Catalyst:

* An **upload** is the imported JSONL file plus its validation and processing status.
* A **dataset** is the stable collection you use for evals, training, and download.

You upload the file first, then create an **eval** or **training** dataset from that uploaded data once processing finishes.

## How to upload

<Tabs>
  <Tab title="Dashboard">
    1. Go to **Datasets** in the dashboard
    2. Click **Upload Data**
    3. Select your `.jsonl` file
    4. Give the upload a name and start the import

    The upload appears in **Datasets > Uploads**, where you can track processing and review any validation errors.
  </Tab>

  <Tab title="CLI">
    <Metadata text="platform/datasets/upload-cli" />

    ```bash theme={"system"}
    inf dataset upload path/to/data.jsonl --name support-summaries
    ```

    If you omit `--name`, the CLI uses the filename without the extension. By default the CLI waits for processing and prints the detected format plus the processed line count. Use `--no-wait` to return after the transfer completes.
  </Tab>
</Tabs>

<Note>
  The upload command does not ask whether the data is for evals or training. You choose **eval** vs **training** when you create a dataset from the completed upload.
</Note>

## After upload

1. Wait for the upload to finish processing in **Datasets > Uploads**.
2. Open the dataset creation flow and select the upload as your source.
3. Choose whether the resulting dataset is **eval** or **training**.

Successful uploads become a reusable source in the same dataset creation flow you use for traffic-backed datasets.

## Supported formats

Two JSONL formats are supported. See [Dataset Formats](/platform/datasets/formats) for full schemas, required fields, and validation rules.

| Format            | Structure                        | Best for                                      |
| ----------------- | -------------------------------- | --------------------------------------------- |
| **Source-backed** | `{ request, response }` per line | Round-tripping data captured from providers   |
| **Hugging Face**  | `{ messages }` per line          | Standard training/eval format, easy to create |

The system auto-detects the format from the first valid line. Every row in the file must use the same format.

## Validation behavior

* Invalid rows are reported with line numbers in the upload status details.
* Uploads can complete with some failed rows if at least one row imports successfully.
* Mixed-format files are treated as a fatal error and fail the upload.
* Source-backed rows must include a usable model value in the request.

## Upload limits

| Limit              | Value     |
| ------------------ | --------- |
| Maximum file size  | 10 GB     |
| Maximum line count | 1,000,000 |

## Next steps

<CardGroup cols={3}>
  <Card title="Build from traffic instead" icon="satellite-dish" href="/platform/datasets/build-from-traffic">
    Pull datasets directly from your captured production traffic.
  </Card>

  <Card title="CLI Command Reference" icon="terminal" href="/cli/datasets">
    Upload from the terminal with `inf dataset upload`.
  </Card>

  <Card title="Dataset formats reference" icon="file-code" href="/platform/datasets/formats">
    Full schema details and validation rules.
  </Card>
</CardGroup>

```
```
