Once traces are flowing into Catalyst, the next step is figuring out what they’re telling you. The dashboard gives you two places to look: the Traces tab for browsing every span across your account, and the Agents tab for a per-agent view with overview, sessions, traces, and analysis. Halo is our open-source agent-loop optimizer, hosted right inside the Agents dashboard, that reads your traces and writes up what to improve. This guide assumes you’ve already captured your first trace; if you haven’t, do that first.

Two places to look

Traces tab

Everything you’ve captured, across every service and agent. Filter by service, agent, time range, status, model, token usage, latency, errors, and custom span attributes. Open any trace to walk the tree.

Agents tab

A per-agent workspace. Pick an agent and you get four sub-tabs: Overview, Sessions, Traces, and Analysis.

Inside the Agents tab

Click into any agent and you’ll see four sub-tabs scoped to that agent.
Sub-tab | What it shows
Overview | High-level metrics for the agent. Run counts, error rate, latency, token usage, and cost over time.
Sessions | One row per agent session. Click a session to see the full conversation, every tool call, and every span in order. This is where you go to understand “what happened in this one run.”
Traces | The same trace data as the global Traces tab, pre-filtered to this agent. Filter further by status, time range, model, or any span attribute.
Analysis | The Halo workspace. Run Halo on the agent’s traces, read past reports, and configure scheduled runs.
The Agents dashboard groups runs by agent.id and agent.name from your spans. If your agent runs aren’t grouping the way you expect, see Agent identity.
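
To make the grouping rule concrete, here is a minimal Python sketch of bucketing spans by agent.id and agent.name. The span dictionaries, the sample values, and the choice to put spans missing either attribute into an "unattributed" bucket are all assumptions for illustration; this is not Catalyst's actual implementation.

```python
from collections import defaultdict

# Hypothetical, simplified spans. Real spans carry many more fields.
spans = [
    {"trace_id": "t1", "status": "OK",
     "attributes": {"agent.id": "refund-agent", "agent.name": "Refund Agent"}},
    {"trace_id": "t2", "status": "ERROR",
     "attributes": {"agent.id": "refund-agent", "agent.name": "Refund Agent"}},
    {"trace_id": "t3", "status": "OK",
     "attributes": {"agent.id": "planner", "agent.name": "Planner"}},
    {"trace_id": "t4", "status": "OK", "attributes": {}},  # no agent identity set
]


def group_by_agent(spans):
    """Bucket spans by (agent.id, agent.name); spans missing either go under None."""
    groups = defaultdict(list)
    for span in spans:
        attrs = span.get("attributes", {})
        agent_id = attrs.get("agent.id")
        agent_name = attrs.get("agent.name")
        key = (agent_id, agent_name) if agent_id and agent_name else None
        groups[key].append(span)
    return dict(groups)


groups = group_by_agent(spans)
for key, members in groups.items():
    label = "unattributed" if key is None else f"{key[1]} ({key[0]})"
    errors = sum(1 for s in members if s["status"] == "ERROR")
    print(f"{label}: {len(members)} spans, {errors} errors")
```

A quick check like this against exported spans can show whether runs are landing in an unexpected bucket because an identity attribute is missing or inconsistent.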

Run Halo on your traces

Halo (Hierarchical Agent Loop Optimization) is an open-source RLM-based engine for analyzing agent traces and finding things to improve. It reads OpenTelemetry-compatible spans, decomposes them to find systemic failure modes across many runs, and writes up concrete fixes with citations back to specific traces. You can run Halo two ways:
  • Self-hosted from the open-source repo. pip install halo-engine, point it at a JSONL trace file, and go.
  • Hosted inside Catalyst. The Agents tab’s Analysis sub-tab runs the same engine against the traces you’ve already collected, with no extra setup, no trace export, and no separate pipeline.
The hosted version is where most teams start. Open an agent, open Analysis, and either run Halo on demand or put it on a schedule.
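
For the self-hosted path, JSONL just means one JSON object per line. The sketch below writes and reads such a file with the Python standard library. The span fields shown are hypothetical placeholders; the schema Halo actually expects is defined in the open-source repo, not here.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical span records. Field names are placeholders, not Halo's schema.
spans = [
    {"trace_id": "t1", "span_id": "s1", "name": "plan", "status": "OK"},
    {"trace_id": "t1", "span_id": "s2", "name": "search_tool", "status": "ERROR"},
]


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


out_path = Path(tempfile.gettempdir()) / "traces.jsonl"
write_jsonl(spans, out_path)
loaded = read_jsonl(out_path)
print(f"wrote {len(loaded)} spans to {out_path}")
```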

Run Halo on demand

1. Open the Analysis sub-tab

From the Agents tab, pick the agent you want to investigate and click Analysis.
2. Pick a trace window

Choose the time range Halo should review. Tighter windows give Halo more focused signal. A single problem agent over the last 24 hours beats a firehose of everything from the last month.
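
As an illustration of what a tight window means, the following sketch keeps only spans from one agent that started in the last 24 hours. The span shape (agent_id, start_time as an ISO 8601 string) and the fixed "now" are assumptions made so the example is reproducible; the dashboard does this selection for you.

```python
from datetime import datetime, timedelta, timezone


def in_window(span, agent_id, end, lookback):
    """True if the span belongs to agent_id and started within [end - lookback, end)."""
    start = end - lookback
    started_at = datetime.fromisoformat(span["start_time"])
    return span["agent_id"] == agent_id and start <= started_at < end


# Fixed reference time so the example gives the same result every run.
now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)

# Hypothetical spans for illustration only.
spans = [
    {"trace_id": "a", "agent_id": "refund-agent", "start_time": "2025-01-15T09:30:00+00:00"},
    {"trace_id": "b", "agent_id": "refund-agent", "start_time": "2025-01-02T09:30:00+00:00"},
    {"trace_id": "c", "agent_id": "planner", "start_time": "2025-01-15T10:00:00+00:00"},
]

focused = [
    s["trace_id"]
    for s in spans
    if in_window(s, "refund-agent", now, timedelta(hours=24))
]
print(focused)
```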
3. Write a prompt (or use the default)

The default prompt is the same one we use internally most of the time:
Analyze the traces in this window. Understand the agent's activity
and identify anomalies, errors, inefficiencies, and opportunities
to improve reliability, latency, cost, and tool usage. Highlight
the top recurring failures, notable tool-call patterns, wasted or
redundant work, slow or high-cost paths, and concrete fixes you'd
recommend. Cite specific trace IDs for every key finding.
You can also ask Halo anything specific: “Why is the refund agent timing out on Tuesdays?”, “Which tool calls are returning empty results most often?”, “Find redundant LLM calls in the planning loop.”
4. Read the report

Halo returns a ranked list of findings with evidence pulled directly from your traces. Each finding cites the trace IDs it came from, so you can click straight from a finding into the trace tree that produced it.
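
If you copy a report out as plain text, a small script can collect the cited trace IDs and turn them into the `inf trace get` commands shown later in this guide. This is a sketch under two assumptions: that the report cites standard OpenTelemetry trace IDs (32 lowercase hex characters), and that the sample report text below, which is made up, resembles what you have.

```python
import re

# OpenTelemetry trace IDs are 16 bytes, written as 32 lowercase hex characters.
TRACE_ID = re.compile(r"\b[0-9a-f]{32}\b")


def cited_trace_ids(report_text):
    """Return the distinct trace IDs in the text, in order of first appearance."""
    ordered = []
    for trace_id in TRACE_ID.findall(report_text):
        if trace_id not in ordered:
            ordered.append(trace_id)
    return ordered


# Made-up report excerpt for illustration.
report = (
    "1. Refund lookup retries three times before failing "
    "(traces 4bf92f3577b34da6a3ce929d0e0e4736, 0af7651916cd43dd8448eb211c80319c).\n"
    "2. Planner repeats the same search call "
    "(trace 4bf92f3577b34da6a3ce929d0e0e4736).\n"
)

trace_ids = cited_trace_ids(report)
commands = [f"inf trace get {trace_id} --view tree" for trace_id in trace_ids]
for command in commands:
    print(command)
```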
5. Apply the changes and re-run

Update prompts, tools, or harness logic based on the findings. Capture a new window of traces, and run Halo again to confirm the issue is gone. This is the HALO loop: trace, analyze, fix, repeat.
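
One simple way to sanity-check a fix alongside the new report is to compare a metric across the two windows. The sketch below compares error rates before and after a change; the run records and their status field are hypothetical, and error rate is only one of several metrics you might track.

```python
def error_rate(runs):
    """Fraction of runs whose status is ERROR; 0.0 for an empty window."""
    if not runs:
        return 0.0
    return sum(1 for run in runs if run["status"] == "ERROR") / len(runs)


# Hypothetical run outcomes from the window before and after the fix.
before = [{"status": "ERROR"}, {"status": "OK"}, {"status": "ERROR"}, {"status": "OK"}]
after = [{"status": "OK"}, {"status": "OK"}, {"status": "OK"}, {"status": "ERROR"}]

before_rate = error_rate(before)
after_rate = error_rate(after)
change = after_rate - before_rate

print(f"before={before_rate:.0%} after={after_rate:.0%} change={change:+.0%}")
```

With small windows the difference can be noise, so treat a comparison like this as a prompt to look at the traces, not as proof.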

Schedule recurring runs

For agents you ship to production, the higher-leverage move is putting Halo on a schedule so it reviews recent traces automatically.
1. Open the schedule settings

From the Analysis sub-tab, open the schedules sheet and create a new schedule.
2. Pick a cadence and window

Hourly, daily, weekly, or monthly. The lookback window pre-fills to match the cadence (a daily schedule defaults to a 24-hour window) but you can override it. The runtime caps any single window at 30 days.
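
The relationship between cadence, lookback window, and the 30-day cap can be sketched as follows. Only the daily default (24 hours) and the 30-day cap come from this guide; the hourly, weekly, and monthly defaults are assumed to mirror the cadence, and clamping an oversized override (instead of rejecting it) is also an assumption about behavior the guide does not specify.

```python
from datetime import timedelta

MAX_WINDOW = timedelta(days=30)  # cap stated in this guide

# Only "daily" -> 24 hours is stated in the guide; the rest are assumptions.
DEFAULT_LOOKBACK = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(hours=24),
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=30),
}


def resolve_lookback(cadence, override=None):
    """Pick the override if given, else the cadence default, clamped to MAX_WINDOW."""
    window = override if override is not None else DEFAULT_LOOKBACK[cadence]
    if window <= timedelta(0):
        raise ValueError("lookback window must be positive")
    return min(window, MAX_WINDOW)


print(resolve_lookback("daily"))
print(resolve_lookback("weekly", override=timedelta(days=45)))
```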
3. Set the prompt

The schedule prompt seeds from the same default shown above. Customize it per schedule when you want a recurring run focused on a specific failure mode.
4. Review reports as they land

Each scheduled run produces a new report in the Analysis history rail. Read the latest one, jump back to past reports to track regressions, and ignore runs where Halo finds nothing actionable.

Inspect traces from the CLI

If you’d rather stay in the terminal, the Inference CLI reads the same trace data the dashboard does.
# Browse recent traces
inf trace list --range 1h

# Open a trace tree or timeline
inf trace get <trace-id> --view tree
inf trace get <trace-id> --view timeline

# Search spans and inspect captured IO
inf span list --trace-id <trace-id> --kind LLM
inf span get <trace-id> <span-id> --view io
See inf trace and inf span for the full reference.

Next steps

Halo on GitHub

The open-source HALO engine, methodology, and benchmarks. MIT licensed.

Set agent identity

Add stable agent IDs so Halo and the Agents dashboard group runs correctly.

Capture more of your stack

Add tracing to your other providers, frameworks, and agent runtimes.

Wrap custom work

Add spans around retrieval, routing, subprocesses, and unsupported SDKs.