# Run a Model Comparison

> Run your eval dataset through multiple models and score the outputs against your rubric.

Pick a rubric, pick a dataset, pick models, and run. Catalyst handles execution and scoring.

## Step by step

<Steps>
  <Step title="Select a rubric">
    Choose the [rubric](/platform/eval/write-a-rubric) that defines your quality criteria.
  </Step>

  <Step title="Select an eval dataset">
    Choose the dataset containing your evaluation samples. This can come from [captured traffic](/platform/datasets/build-from-traffic) or a [JSONL upload](/platform/datasets/upload-a-dataset).
  </Step>

  <Step title="Select models">
    Pick one or more models to evaluate. Options include OpenAI and Anthropic models, open-source models, and your own custom-trained models.
  </Step>

  <Step title="Run the eval">
    Each sample from the dataset runs through each selected model. Each output gets scored by the [LLM judge](/platform/eval/llm-as-a-judge) using your rubric.
  </Step>
</Steps>

<img
  src="https://mintcdn.com/kuzco/lo2UF46ckKcvUyUA/images/eval/run-eval-overview.png?fit=max&auto=format&n=lo2UF46ckKcvUyUA&q=85&s=a2dc2879c07e773e78d46d48aa72fa54"
  alt="Eval setup flow showing rubric, dataset, and model selection in the dashboard."
  style={{
width: "100%",
borderRadius: "0.75rem",
border: "1px solid var(--inference-stroke-soft, #d6cdc4)",
margin: "1.5rem 0",
}}
  width="1414"
  height="1268"
  data-path="images/eval/run-eval-overview.png"
/>

## How the math works

The eval is a cross-product of samples and models:

* 10 samples × 3 models = 30 inference outputs
* Each output is scored once = 30 judge calls
* Results: per-sample scores for every model
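
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the cross-product, not Catalyst's implementation: `run_comparison`, `fake_generate`, `fake_judge`, and the sample and model names are all hypothetical stand-ins.

```python
from itertools import product

def run_comparison(samples, models, generate, judge):
    """Run every sample through every model and score each output once."""
    results = {}                          # results[sample][model] -> score
    calls = {"inference": 0, "judge": 0}  # counters for the two kinds of calls
    for sample, model in product(samples, models):
        output = generate(model, sample)  # one inference output per (sample, model) pair
        calls["inference"] += 1
        score = judge(sample, output)     # one judge call per output
        calls["judge"] += 1
        results.setdefault(sample, {})[model] = score
    return results, calls

# Hypothetical inputs: 10 samples and 3 models.
samples = [f"sample-{i}" for i in range(1, 11)]
models = ["model-a", "model-b", "model-c"]

def fake_generate(model, sample):
    # Placeholder for a real model call.
    return f"{model} output for {sample}"

def fake_judge(sample, output):
    # Placeholder for a rubric-based LLM judge; always returns the same score.
    return 1.0

results, calls = run_comparison(samples, models, fake_generate, fake_judge)
print(calls)                                    # {'inference': 30, 'judge': 30}
print(len(results), len(results["sample-1"]))   # 10 3
```

Both counts grow with the product of samples and models, so adding one model to a 10-sample eval adds 10 inference outputs and 10 judge calls.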

## Next steps

Once the eval completes, go to [Read the Results](/platform/eval/read-the-results) to interpret the comparison view.
