Three ways to create a rubric
Generate from data
Point the generator at an existing dataset. It analyzes your inputs and outputs and suggests rubric dimensions relevant to your data.
Start from a template
Pick from pre-built rubrics for common quality dimensions like accuracy, helpfulness, tone, or format compliance. Customize from there.
Write your own
Describe the quality dimension in plain English, define what each score level means, and set the scoring range.
[TODO MEDIA: Screenshot of the rubric creation UI showing the three creation paths.]
Template variables
Rubrics use three template variables that inject context from your data into the prompt sent to the judge:

| Variable | What it contains | Required |
|---|---|---|
| {{ conversation_context }} | The input messages and conversation history | Recommended |
| {{ conversation_response }} | The reference/original response from the dataset | Recommended |
| {{ eval_model_response }} | The output being scored | Required |
Only {{ eval_model_response }} is required, but using all three gives the judge the full picture: the input, what was originally produced, and the output it needs to score.
[TODO MEDIA: Screenshot of the rubric editor showing template variables in use.]
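To make the substitution concrete, here is a minimal sketch of how the three template variables might be rendered into a judge prompt. The variable names come from the table above; the prompt wording and the rendering function are illustrative assumptions, not the product's actual implementation.

```python
import re

# Hypothetical judge prompt using all three template variables.
# The surrounding prompt text is an illustration only.
RUBRIC_PROMPT = """\
Score the response for accuracy on a scale of 1-10.

Input:
{{ conversation_context }}

Reference response:
{{ conversation_response }}

Response to score:
{{ eval_model_response }}
"""

def render_prompt(template: str, values: dict) -> str:
    """Replace each {{ name }} placeholder with its value."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: values[m.group(1)],
        template,
    )

prompt = render_prompt(RUBRIC_PROMPT, {
    "conversation_context": "User: Summarize the quarterly report.",
    "conversation_response": "Revenue grew 12%; costs were flat.",
    "eval_model_response": "Revenue rose 12% while costs held steady.",
})
```

The key point is that the judge sees the input, the reference, and the output being scored side by side, so it can grade relatively rather than in a vacuum.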
Scoring range
You set the max score when creating a rubric. The default range is 1-10, which gives the judge enough room to distinguish meaningful quality differences. You can adjust this to fit your use case: a smaller range like 1-3 works for simpler pass/fail dimensions, while the default 1-10 is a good fit for most evaluations.

Writing effective rubrics
Task-specific rubrics produce sharper, more useful results than broad ones.

| Vague | Specific |
|---|---|
| "Is this response accurate?" | "Does the extracted JSON contain all required fields with correct types?" |
| "Is the tone appropriate?" | "Does the response match the brand voice: professional, concise, no hedging?" |
| "Is this helpful?" | "Does the summary capture the three most important points from the source document?" |