# Evals

> Create rubrics, launch eval runs, and inspect results from the terminal.

Run and inspect model evaluations from the command line. Manage rubrics (the judge prompts that eval runs score responses against), list and inspect run groups, launch new runs, and browse eval-ready datasets.

**Alias:** `inf evals`

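The full eval loop, from rubric to results, fits in four terminal commands. The IDs in the `# →` comments are sample output; substitute the IDs your own commands print: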

<Metadata text="cli/eval/paste-able-loop" />

```bash theme={"system"}
# 1. Create a rubric from a markdown file
inf eval rubric create -n support-tickets-v1 -f ./rubric.md
# → Rubric rub_abc12 / version rv_xyz45 created.

# 2. Materialize an eval dataset (traffic-backed, upload-backed, or from a file)
inf dataset create -n demo-eval -t eval --file ./samples.jsonl
# → Dataset ds_def78 created.

# 3. Launch an eval run group
inf eval run \
  --rubric-id rub_abc12 \
  --dataset-id ds_def78 \
  --models openai:gpt-5.2,anthropic:claude-sonnet-4-6 \
  --judge-model anthropic:claude-sonnet-4-6
# → Run group rg_20260415_152340 created.

# 4. Track progress
inf eval get rg_20260415_152340
```

Route IDs look like `<provider>:<model-alias>` (e.g. `openai:gpt-5.2`). Use [`inf models list`](/cli/models#inf-models-list) to discover every route ID available to your team — see [Route IDs](/cli/models#route-ids) for the full format.
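
A script that accepts model names from users can catch a missing provider prefix before calling the CLI. The sketch below is a minimal shape check, not part of `inf`: the helper name `is_route_id` is hypothetical, and it only tests for a non-empty provider, a colon, and a non-empty alias. It cannot tell whether a route actually exists for your team, which is what `inf models list` is for.

```bash
# Hypothetical helper (not provided by the CLI): succeeds when the argument
# has the <provider>:<model-alias> shape described above.
is_route_id() {
  case "$1" in
    ?*:?*) return 0 ;;  # non-empty provider, a colon, non-empty alias
    *)     return 1 ;;
  esac
}

for id in openai:gpt-5.2 gpt-5.2; do
  if is_route_id "$id"; then
    echo "$id: looks like a route ID"
  else
    echo "$id: not in <provider>:<model-alias> form"
  fi
done
```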

## `inf eval rubric create`

Create a rubric — the judge prompt an eval run scores responses against. Rubrics live in the active project, carry versioned prompt content, and are passed to `inf eval run` by ID. The template must contain the placeholder `{{ eval_model_response }}` where the model's response will be injected for scoring.

<Metadata text="cli/eval/rubric-create" />

```bash theme={"system"}
inf eval rubric create -n <name> -f <path-to-markdown>
```

### Options

| Flag                | Required | Description                                                  | Default        |
| ------------------- | -------- | ------------------------------------------------------------ | -------------- |
| `-n, --name <name>` | Yes      | Rubric name                                                  | —              |
| `-f, --file <path>` | Yes      | Path to a markdown file containing the judge prompt template | —              |
| `--max-score <n>`   | No       | Maximum score for the rubric (2–100)                         | `10`           |
| `--project-id <id>` | No       | Project to create the rubric in                              | Active project |

Prints the rubric ID and the first version ID. Pass them to `inf eval run` as `--rubric-id` and, if you want to pin the version, `--rubric-version-id`.

### Examples

<Metadata text="cli/eval/rubric-create-examples" />

```bash theme={"system"}
# Create a rubric with the default 0–10 scoring scale
inf eval rubric create -n support-tickets-v1 -f ./rubric.md

# Create a rubric with a 0–100 scale
inf eval rubric create -n quality-v2 -f ./quality.md --max-score 100
```
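
Because the template must contain `{{ eval_model_response }}`, it can help to check the file locally before creating the rubric. The following is a sketch under assumptions: the helper `has_response_placeholder` and the file name `rubric.example.md` are hypothetical, the prompt wording is illustrative, and the check assumes the placeholder has to appear exactly as spelled above (including the inner spaces).

```bash
# Hypothetical pre-flight check: succeeds only if the template file contains
# the placeholder exactly as the docs spell it (fixed-string match, no regex).
has_response_placeholder() {
  grep -qF '{{ eval_model_response }}' "$1"
}

# Write a small example template. The prompt text is illustrative only.
cat > rubric.example.md <<'EOF'
You are grading a customer-support reply.

Reply to grade:
{{ eval_model_response }}

Score the reply for accuracy and tone.
EOF

if has_response_placeholder rubric.example.md; then
  echo "rubric.example.md: placeholder found"
  # Next step, once the check passes:
  # inf eval rubric create -n support-tickets-v1 -f ./rubric.example.md
else
  echo "rubric.example.md: missing {{ eval_model_response }}" >&2
fi
```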

## `inf eval rubric get`

Get details of a rubric — ID, name, latest version number, version count, score range, and a preview of the template.

<Metadata text="cli/eval/rubric-get" />

```bash theme={"system"}
inf eval rubric get <id>
```

### Arguments

| Argument | Required | Description                                          |
| -------- | -------- | ---------------------------------------------------- |
| `id`     | Yes      | Full UUID, 4+ character prefix, or exact rubric name |

Ambiguous prefixes print the candidate list and abort.

## `inf eval rubric delete`

Archive (soft-delete) a rubric. Rubrics cannot be hard-deleted — archiving hides them from `inf eval rubrics` but preserves their eval history. Restore from the dashboard if needed.

<Metadata text="cli/eval/rubric-delete" />

```bash theme={"system"}
inf eval rubric delete <id>
```

**Alias:** `inf eval rubric archive <id>` — both names do the same thing; use whichever reads clearer in your script.

### Arguments

| Argument | Required | Description                                          |
| -------- | -------- | ---------------------------------------------------- |
| `id`     | Yes      | Full UUID, 4+ character prefix, or exact rubric name |

### Options

| Flag        | Required                    | Description                  | Default |
| ----------- | --------------------------- | ---------------------------- | ------- |
| `-y, --yes` | Yes in non-TTY environments | Skip the confirmation prompt | Off     |

In an interactive terminal, the CLI asks for confirmation unless `-y` is passed. In non-TTY environments (CI, scripts) the command refuses to run without `-y`.

### Examples

<Metadata text="cli/eval/rubric-delete-examples" />

```bash theme={"system"}
# Archive interactively (prompts for confirmation)
inf eval rubric delete support-tickets-v1

# Archive non-interactively
inf eval rubric archive rub_abc12 --yes
```
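
A script that runs both locally and in CI can choose the flag based on the environment. This is a sketch, not CLI behavior: the helper `confirm_flag` is hypothetical, and it assumes that "non-TTY" can be approximated by testing whether stdin is a terminal, since the docs do not say which stream the CLI inspects. It prints the command rather than running it.

```bash
# Hypothetical helper: prints the confirmation flag the current environment needs.
# Assumption: testing stdin with `-t 0` approximates the CLI's own TTY check.
confirm_flag() {
  if [ -t 0 ]; then
    echo ""        # interactive terminal: let the CLI prompt for confirmation
  else
    echo "--yes"   # non-TTY: the command refuses to run without it
  fi
}

flag=$(confirm_flag)
echo "would run: inf eval rubric delete rub_abc12${flag:+ $flag}"
```

Adding `--yes` automatically removes the safety prompt, so a pattern like this suits pipelines where the archive step has already been reviewed.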

## `inf eval rubrics`

List rubrics in the active project.

<Metadata text="cli/eval/rubrics" />

```bash theme={"system"}
inf eval rubrics
```

**Alias:** `inf eval defs`

### Options

| Flag                 | Required | Description              | Default |
| -------------------- | -------- | ------------------------ | ------- |
| `--include-archived` | No       | Include archived rubrics | Off     |

Shows the rubric ID (8-char prefix), name, latest version, total version count, and creation date. Use `--json` for full UUIDs.

## `inf eval run`

Launch a new eval run group against one or more models, scored by a judge model.

<Metadata text="cli/eval/run" />

```bash theme={"system"}
inf eval run \
  --rubric-id <id> \
  --dataset-id <id> \
  --models <route-id-csv> \
  --judge-model <route-id>
```

### Options

| Flag                       | Required | Description                                                                | Default        |
| -------------------------- | -------- | -------------------------------------------------------------------------- | -------------- |
| `--rubric-id <id>`         | Yes      | Rubric ID                                                                  | —              |
| `--dataset-id <id>`        | Yes      | Eval-type dataset ID (create one with `inf dataset create -t eval`)        | —              |
| `--models <ids>`           | Yes      | Comma-separated model route IDs — run `inf models list` to discover them   | —              |
| `--judge-model <id>`       | Yes      | Route ID of the judge model — run `inf models list --judge-only` to filter | —              |
| `--rubric-version-id <id>` | No       | Pin to a specific rubric version                                           | Latest version |
| `--sample-size <n>`        | No       | Samples drawn from the dataset per model (1–100)                           | `100`          |
| `-n, --name <name>`        | No       | Display name for the run group                                             | Auto-generated |

Prints the run group ID and an `inf eval get <id>` follow-up command to track progress.

### Examples

<Metadata text="cli/eval/run-examples" />

```bash theme={"system"}
# Launch a run against two models, with one of them also acting as judge
inf eval run \
  --rubric-id rub_abc12 \
  --dataset-id ds_def78 \
  --models openai:gpt-5.2,anthropic:claude-sonnet-4-6 \
  --judge-model anthropic:claude-sonnet-4-6

# Pin to a specific rubric version
inf eval run \
  --rubric-id rub_abc12 \
  --rubric-version-id rv_xyz45 \
  --dataset-id ds_def78 \
  --models openai:gpt-5.2 \
  --judge-model anthropic:claude-sonnet-4-6
```
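
When the model list and sample size come from variables, a wrapper script can assemble and check the arguments before launching. The sketch below uses two hypothetical helpers, `join_route_ids` and `valid_sample_size`, which are not part of the CLI; the 1 to 100 bound is taken from the options table. It only prints the flags it would pass.

```bash
# Hypothetical helper: join route IDs with commas, as --models expects.
# Runs in a subshell so the IFS change does not leak into the caller.
join_route_ids() (
  IFS=,
  echo "$*"
)

# Hypothetical helper: succeeds when the value is a whole number from 1 to 100.
valid_sample_size() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # empty or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 100 ]
}

sample_size=25

if valid_sample_size "$sample_size"; then
  models=$(join_route_ids openai:gpt-5.2 anthropic:claude-sonnet-4-6)
  echo "--models $models --sample-size $sample_size"
else
  echo "sample size must be between 1 and 100" >&2
fi
```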

## `inf eval list`

List eval run groups for a given rubric.

<Metadata text="cli/eval/list" />

```bash theme={"system"}
inf eval list --rubric-id <id>
```

**Alias:** `inf eval ls`

### Options

| Flag                       | Required | Description                         | Default |
| -------------------------- | -------- | ----------------------------------- | ------- |
| `--rubric-id <id>`         | Yes      | Rubric ID to list runs for          | —       |
| `--rubric-version-id <id>` | No       | Filter by a specific rubric version | —       |

Shows the run group ID (8-char prefix), rubric version, model count, derived status (`pending`, `running`, `failed`, or `completed`), and creation date.
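
The four derived statuses suggest a simple waiting pattern for scripts: keep checking until the status is `completed` or `failed`. The sketch below is an assumption-heavy illustration. Both function names are hypothetical, it assumes `failed` is final, and it leaves open how the status string is obtained, because this page does not define a machine-readable way to read it. You supply that as the command passed to `wait_until_finished`.

```bash
# Hypothetical helper: treats `completed` and `failed` as final statuses
# (an assumption), and everything else as "keep waiting".
is_finished() {
  case "$1" in
    completed|failed) return 0 ;;
    *)                return 1 ;;
  esac
}

# Hypothetical polling loop.
# usage: wait_until_finished <max-attempts> <seconds-between> <command...>
# <command...> is any command of yours that prints the current status.
wait_until_finished() {
  attempts=$1
  interval=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    status=$("$@") || return 1   # give up if the status command itself fails
    if is_finished "$status"; then
      echo "$status"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  return 1                       # still pending or running after all attempts
}

# Example call, with a status command you would write yourself:
# wait_until_finished 60 10 my_status_command
```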

## `inf eval get`

View detailed information about a specific eval run group.

<Metadata text="cli/eval/get" />

```bash theme={"system"}
inf eval get <id>
```

### Arguments

| Argument | Required | Description           |
| -------- | -------- | --------------------- |
| `id`     | Yes      | The eval run group ID |

### Output

The detail view covers the run group itself, followed by a sub-table of individual runs:

| Field                            | Description                                                                                                                                                                                                             |
| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `id`                             | Run group ID                                                                                                                                                                                                            |
| `rubricId` / `rubricVersionId`   | Rubric and pinned version                                                                                                                                                                                               |
| `evalDatasetId`                  | Dataset the run group scored                                                                                                                                                                                            |
| `judgeProvider` / `judgeModelId` | Judge model scoring the responses                                                                                                                                                                                       |
| `models`                         | How many models were evaluated in this run group                                                                                                                                                                        |
| `created`                        | Run group creation timestamp                                                                                                                                                                                            |
| Runs sub-table                   | One row per model: run ID, provider, model, status, average score, failed sample count, `completed/total` samples. When avg score is `—`, the adjacent `N failed` hint shows how many samples the judge couldn't score. |

## `inf eval datasets`

List datasets available for evaluations (type = `eval`).

<Metadata text="cli/eval/datasets" />

```bash theme={"system"}
inf eval datasets
```

### Options

| Flag                 | Required | Description               | Default |
| -------------------- | -------- | ------------------------- | ------- |
| `-l, --limit <n>`    | No       | Maximum number of results | `50`    |
| `--include-archived` | No       | Include archived datasets | Off     |

Eval datasets are materialized via [`inf dataset create -t eval …`](/cli/datasets#inf-dataset-create) or the dashboard. The output shows the dataset ID (8-char prefix), name, inference count, and creation date.
