The Metrics Explorer gives you an at-a-glance view of your LLM usage. See trends, spot anomalies, and break down spending by model, task, or provider.

📍 TODO:MEDIA

Screenshot of the Metrics Explorer dashboard showing summary stats and time series charts for a task.

Summary statistics

At the top of the dashboard, you’ll see headline numbers for the selected time range:
  • Total inferences and error count/rate
  • Total cost, broken down by input, cached input, output, and reasoning tokens
  • Total tokens, broken down by input (including cached) and output (including reasoning)
  • Average tokens per request for both input and output
  • Streaming request count
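To make the cost breakdown concrete, here is a minimal sketch of how a per-category cost could be derived from token counts. The per-token prices are made up for illustration; actual prices depend on your model and provider.

```python
# Illustrative only: per-1M-token prices are hypothetical, not real rates.
PRICES_PER_1M = {
    "input": 3.00,
    "cached_input": 1.50,
    "output": 15.00,
    "reasoning": 15.00,
}

def cost_breakdown(tokens: dict[str, int]) -> dict[str, float]:
    """Return USD cost per token category plus a total."""
    costs = {
        category: tokens.get(category, 0) * price / 1_000_000
        for category, price in PRICES_PER_1M.items()
    }
    costs["total"] = sum(costs.values())
    return costs

print(cost_breakdown({"input": 400_000, "cached_input": 100_000,
                      "output": 50_000, "reasoning": 20_000}))
```

The same token counts also feed the averages above: dividing each category's total by the request count gives average tokens per request.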

Time series charts

Below the summary, time series charts show how your usage changes over time:
  • Requests - total request volume over time
  • Error rate - success rate percentage over time
  • Cost - cost breakdown over time (input, cached input, output, reasoning)
  • Tokens - token usage breakdown over time
  • Duration - latency percentiles (p50, p75, p90, p99)
  • Time to first token - TTFT percentiles (p50, p75, p90, p99), useful for streaming
  • Payload size - average request and response sizes
  • Peak throughput - peak requests per time bucket
  • Model distribution - request count broken down by model
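As a rough illustration of what the Duration and Time to first token charts plot, here is a sketch of nearest-rank percentiles computed over one time bucket of latencies. The dashboard's exact interpolation method may differ, and the sample data is invented.

```python
# Nearest-rank percentiles over a single time bucket (illustrative data).
def percentile(sorted_vals: list[float], p: float) -> float:
    """Nearest-rank percentile of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("empty bucket")
    # ceil(n * p / 100) - 1, clamped to a valid index
    k = max(0, -(-len(sorted_vals) * p // 100) - 1)
    return sorted_vals[int(k)]

latencies_ms = sorted([120, 95, 340, 210, 180, 150, 990, 130, 160, 175])
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 75, 90, 99)}
print(summary)  # one point per percentile series, per bucket
```

Note how a single slow request (990 ms here) dominates p99 while leaving p50 untouched, which is why the charts show several percentiles rather than an average.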

Breakdown charts

The dashboard also includes distribution views:
  • Model cost comparison - which models are costing you the most
  • HTTP code breakdown - distribution of status codes
  • Failures breakdown - error types and frequency
  • Request types - streaming vs non-streaming split

Filtering

You can scope the dashboard to a specific slice of traffic using filters:
  • Task - see metrics for a single task (requires task tagging)
  • Model - filter to a specific model
  • Provider - filter by upstream provider
  • Environment - filter by environment tag (production, staging, etc.)
  • Status - filter by HTTP status code or range (2xx, 4xx, 5xx, specific codes)
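The status filter's "range vs specific code" distinction can be sketched as follows. This helper is hypothetical, written only to show the matching semantics a pattern like 2xx or 404 implies.

```python
# Hypothetical helper: a "Nxx" pattern matches every code with that
# leading digit; a plain numeric pattern matches one exact code.
def status_matches(code: int, pattern: str) -> bool:
    if pattern.endswith("xx"):
        return code // 100 == int(pattern[0])
    return code == int(pattern)

print(status_matches(404, "4xx"))  # True
print(status_matches(404, "500"))  # False
```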

Time controls

  • Time range picker - select the date range for the dashboard (default: last 24 hours)
  • Granularity selector - control how data points are bucketed (hourly, daily, etc.)
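Granularity controls how raw request timestamps are grouped into chart buckets. The sketch below shows hourly bucketing; the bucket boundaries and sample timestamps are assumptions for illustration.

```python
# Illustrative hourly bucketing of request timestamps, as the
# granularity selector does for the time series charts.
from collections import Counter
from datetime import datetime, timezone

def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

timestamps = [
    datetime(2024, 5, 1, 9, 15, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 9, 47, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 10, 2, tzinfo=timezone.utc),
]
requests_per_hour = Counter(hour_bucket(ts) for ts in timestamps)
for bucket, count in sorted(requests_per_hour.items()):
    print(bucket.isoformat(), count)
```

A coarser granularity (daily) would simply truncate further, trading detail for a smoother chart over long time ranges.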

Next steps

  • Inference Viewer - drill into individual LLM calls to inspect requests, responses, and errors.
  • Build a dataset - turn your traffic into datasets for evals and training.