# Run Your First Eval

An eval measures which model is better for your task, and by how much. You define a rubric that describes what "good" looks like, run your data through candidate models, and let an LLM judge score the outputs. This is how you know whether a smaller, cheaper model can replace the one you're using today.

This guide uses the **Customer Support Chatbot** demo project, which comes pre-loaded with a dataset and rubric so you can run an eval immediately — no data required. Once you've seen how it works, you can apply the same process to your own data.

## Start the demo project

If you haven't already, create the demo project:

1. From the dashboard, navigate to the **Learn** page (or the **Create a Project** page).
2. Find **Customer Support Chatbot** and click **Start with demo project**.

<Frame>
  <img src="https://mintcdn.com/kuzco/eBkhV7xueuLgf0f-/images/customer-support/cs-demo-button.png?fit=max&auto=format&n=eBkhV7xueuLgf0f-&q=85&s=5866e7403beb014f92086335528aa1a8" alt="Customer Support Chatbot demo project card with Start with demo project button" width="2986" height="1506" data-path="images/customer-support/cs-demo-button.png" />
</Frame>

This creates a new project in your account pre-loaded with everything you need:

| Artifact         | Name                     | Purpose                                                                                        |
| ---------------- | ------------------------ | ---------------------------------------------------------------------------------------------- |
| Eval dataset     | `customer-support-eval`  | Sample customer support conversations to evaluate against                                      |
| Training dataset | `customer-support-train` | Used later for [training a model](/get-started/train-and-deploy)                               |
| Rubric           | Customer support rubric  | Defines what a good customer support response looks like — tone, format, and accuracy criteria |

## Run an eval

<Steps>
  <Step title="Navigate to Evals">
    Open your **Customer Support Chatbot** project and go to the **Evals** tab. Click **New Eval**.
  </Step>

  <Step title="Select the rubric and dataset">
    The demo project's rubric and the `customer-support-eval` dataset are already available in your project. Select them.

    <Frame>
      <img src="https://mintcdn.com/kuzco/eBkhV7xueuLgf0f-/images/customer-support/cs-eval-form.png?fit=max&auto=format&n=eBkhV7xueuLgf0f-&q=85&s=b3748a09db13415e3ddd68318f7e456e" alt="Eval setup form with rubric and dataset selected" width="1452" height="1492" data-path="images/customer-support/cs-eval-form.png" />
    </Frame>
  </Step>

  <Step title="Pick models to compare">
    Choose two or more models to evaluate. You can pick any combination of models from the model catalog: OpenAI, Anthropic, open-source, or anything else available. For a quick comparison, pick one large model and one smaller model to see how they stack up.
  </Step>

  <Step title="Run the eval">
    Click **Run**. Each sample from the dataset is sent to each model, and an LLM judge scores every response against the rubric.
  </Step>

  <Step title="Compare the results">
    When the eval completes, the comparison view shows side-by-side scores across all models and samples. Look at overall scores to see which model wins, and drill into individual samples to understand where models differ.

    <Frame>
      <img src="https://mintcdn.com/kuzco/eBkhV7xueuLgf0f-/images/customer-support/cs-eval-comp.png?fit=max&auto=format&n=eBkhV7xueuLgf0f-&q=85&s=9925d672356c13b037a3ed750492968b" alt="Eval results comparison view showing scores across models" width="2814" height="1698" data-path="images/customer-support/cs-eval-comp.png" />
    </Frame>
  </Step>
</Steps>
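
Conceptually, the steps above amount to a nested loop: every sample goes to every model, and a judge scores each response against the rubric. The Python below is a minimal sketch of that idea, not the platform's implementation or API. The names `run_eval`, `overall`, `polite_model`, `terse_model`, and `toy_judge` are hypothetical, and the judge is a toy keyword check standing in for an LLM judge.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical types for illustration: a model maps a prompt to a response,
# and a judge maps (prompt, response, rubric) to a numeric score.
Model = Callable[[str], str]
Judge = Callable[[str, str, str], float]


def run_eval(
    samples: List[str],
    models: Dict[str, Model],
    judge: Judge,
    rubric: str,
) -> Dict[str, List[float]]:
    """Send every sample to every model and score each response."""
    scores: Dict[str, List[float]] = {name: [] for name in models}
    for prompt in samples:
        for name, model in models.items():
            response = model(prompt)
            scores[name].append(judge(prompt, response, rubric))
    return scores


def overall(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the per-sample scores into one overall score per model."""
    return {name: mean(values) for name, values in scores.items()}


# Toy stand-ins so the sketch runs without any network calls.
def polite_model(prompt: str) -> str:
    return f"Sorry about that! Let's get this fixed: {prompt}"


def terse_model(prompt: str) -> str:
    return "Restart it."


def toy_judge(prompt: str, response: str, rubric: str) -> float:
    # Stand-in for an LLM judge: 1.0 if the response apologizes, else 0.0.
    return 1.0 if "sorry" in response.lower() else 0.0


samples = ["My order arrived damaged.", "I can't log in."]
scores = run_eval(
    samples,
    {"large": polite_model, "small": terse_model},
    toy_judge,
    rubric="Responses should acknowledge the customer's problem politely.",
)
summary = overall(scores)
print(summary)
```

In a real eval, the judge is itself an LLM that reads the rubric, so the quality of the scores depends on how clearly the rubric describes a good response.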

## What you just learned

* **Rubrics** define your quality bar in plain English — the LLM judge uses them to score outputs
* **Evals** run your data through multiple models and score the results, giving you a data-driven comparison
* You can re-run evals anytime — after changing the rubric, adding models, or later after [training a custom model](/get-started/train-and-deploy) to see how it compares
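
To make "data-driven comparison" concrete, the sketch below compares two models by their mean score gap and flags the samples where the candidate falls noticeably behind the baseline. Everything here is assumed for illustration: the scores are made up, the 1 to 5 scale is an assumption, and `compare` is a hypothetical helper rather than part of the platform.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Made-up judge scores (assumed 1-5 scale), one entry per sample.
scores: Dict[str, List[int]] = {
    "large-model": [5, 4, 5, 4],
    "small-model": [5, 4, 2, 4],
}


def compare(
    scores: Dict[str, List[int]],
    baseline: str,
    candidate: str,
    min_gap: int = 2,
) -> Tuple[float, List[int]]:
    """Return the mean score gap and the indices of samples where the
    candidate trails the baseline by at least `min_gap` points."""
    base, cand = scores[baseline], scores[candidate]
    mean_gap = mean(base) - mean(cand)
    worse = [i for i, (b, c) in enumerate(zip(base, cand)) if b - c >= min_gap]
    return mean_gap, worse


mean_gap, worse_samples = compare(scores, "large-model", "small-model")
print(f"mean gap: {mean_gap}, samples to inspect: {worse_samples}")
```

The overall gap answers "by how much", while the flagged sample indices point at the individual responses worth reading before deciding whether the smaller model is good enough.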

## Next steps

<CardGroup cols={2}>
  <Card title="Train a custom model" icon="brain" href="/get-started/train-and-deploy">
    Use the same demo project to train and deploy a model.
  </Card>

  <Card title="Write a rubric" icon="pen" href="/platform/eval/write-a-rubric">
    Learn how to write your own rubrics for your specific use case.
  </Card>

  <Card title="Read the results" icon="chart-column" href="/platform/eval/read-the-results">
    Take a deeper look at interpreting the comparison view.
  </Card>

  <Card title="Build a dataset" icon="database" href="/platform/datasets/overview">
    Create datasets from your own data — captured traffic or uploaded files.
  </Card>
</CardGroup>
