
# Signals

> Define plain-language signals that automatically classify your agent's traffic (sentiment, jailbreak attempts, NSFW content, task outcomes, anything you can describe) and alert on them.

Once traces are flowing, signals turn them into structured labels. A signal is a plain-language classifier that you define once and that Catalyst runs continuously against one of your agents. Describe what you want to detect ("is this NSFW?", "did the user get frustrated?", "was the task completed?"), and an LLM judge automatically labels a sampled share of that agent's spans, traces, or sessions. The results are queryable right alongside your other agent metrics.

Signals are how you evaluate the things default metrics miss. For a user-facing agent they tell you how people are actually interacting with it (frustration, sentiment, jailbreak attempts). For a non-user-facing agent they tell you how the agent itself is behaving (did it refuse, did it complete the task, did it stay on policy).

<Tip>
  Want the full walkthrough with a worked example instead of the reference? See the guide [Measure your agent's quality with Signals](/guides/measure-agent-quality).
</Tip>

## Before you start

Signals run on captured traffic, so you need traces flowing first. This guide assumes you've already [captured your first trace](/get-started/capture-first-trace). You also need:

* **An agent with tracing installed.** Signals are always scoped to a single agent, so there is nothing to label until that agent is emitting traces.
* **A stable, consistent `agentId`.** Catalyst groups traces by agent identity, so set the `agentId` once and keep it the same across runs. If you instrumented with the CLI, this is usually already set. See [Set agent identity](/integrations/traces/agent-identity).
* **A `sessionId`, if you want session-scoped signals.** Session signals classify a whole conversation, which requires that your traces carry a `sessionId` (your conversation or chat ID) so Catalyst can assemble the conversation. See [Choose a scope](#choose-a-scope).

## Where to find signals

You can get to signals two ways in the dashboard:

* The **Signals** tab, which lists every signal and is the main table view across your agents.
* The **Agents** tab, where you pick an agent and open its **Signals** view to see and create signals scoped to just that agent.

## How signals work

A signal evaluates traffic for one of your agents. When you activate it, Catalyst samples incoming traffic for that agent, sends each sampled target's input and output to an LLM judge with the prompt you wrote, and writes the label back. You can then filter, chart, and break down your traces by that label.

<Steps>
  <Step title="You define the signal">
    Pick a scope and classifier type, write a prompt describing what to look for, and set a sample rate.
  </Step>

  <Step title="Catalyst samples matching traffic">
    For an active signal, a share of the agent's traffic equal to the sample rate is selected for classification. Sampling is deterministic: the same target always resolves the same way, either sampled or not.
  </Step>

  <Step title="A judge labels each target">
    The judge reads the input and output of each sampled span, trace, or session and returns a label that conforms to your classifier type.
  </Step>

  <Step title="Labels land on your traces">
    Results are stored against each labeled target and surfaced in the dashboard, where you can filter by label value and watch the label distribution over time.
  </Step>
</Steps>
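
This page doesn't say how Catalyst picks which targets to sample. A common way to make sampling deterministic is to hash a stable identifier and compare the result to the sample rate. The sketch below assumes that approach; `is_sampled`, `signal_id`, and `target_id` are hypothetical names for illustration, not part of Catalyst's API.

```python
import hashlib

# Hypothetical sketch: Catalyst does not document its sampling function.
def is_sampled(signal_id: str, target_id: str, sample_rate: float) -> bool:
    """Return True if this target falls inside the signal's sample.

    Hashing (signal_id, target_id) makes the decision deterministic:
    the same target always gets the same answer for a given rate.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(f"{signal_id}:{target_id}".encode()).digest()
    # Treat the first 8 bytes as an integer bucket in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Integer comparison avoids float rounding at the edges (0% and 100%).
    return bucket < int(sample_rate * 2**64)
```

One side effect of this design: because each target's bucket is fixed, raising the rate from 10% to 50% keeps every target that was already sampled and only adds new ones.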

## Choose a scope

A signal's **scope** is the unit it labels. You pick it when you create the signal, and it's fixed for the life of the signal (everything else is editable). Scope determines how much context the judge sees on each call.

<CardGroup cols={3}>
  <Card title="Span" icon="dot">
    A single model call, the narrowest scope. Useful for call-level checks, but often too granular if you care about the interaction as a whole.
  </Card>

  <Card title="Trace" icon="route">
    One turn (request). The judge sees the whole turn rather than a single call, which makes this scope a good fit for request-level outcomes.
  </Card>

  <Card title="Session" icon="comments">
    The full conversation. Usually the most useful scope for understanding a user, since the judge sees the entire back-and-forth. Requires a `sessionId` on your traces so the conversation can be assembled.
  </Card>
</CardGroup>

<Note>
  Session scope only works if your traces carry a `sessionId`. If you haven't set one, add it before creating a session-scoped signal. See [Set agent identity](/integrations/traces/agent-identity).
</Note>

## Two classifier types

When you create a signal you choose how it labels each target.

<CardGroup cols={2}>
  <Card title="Binary (yes / no)" icon="toggle-on">
    A true/false classifier. Use it for "is this X or not" questions: NSFW content, jailbreak attempts, refusals, or any flag you want to filter on. No labels to configure, just a prompt.
  </Card>

  <Card title="String (enumerated labels)" icon="tags">
    A classifier that returns one of a fixed set of labels you define. Use it when there are more than two outcomes: sentiment (positive / neutral / negative), task outcome (completed / partial / failed / abandoned), and so on. Define between 2 and 10 labels.
  </Card>
</CardGroup>
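
To make the two classifier types concrete, here is a minimal local model of a signal definition, assuming only the rules stated on this page: a fixed scope, binary signals with no labels, and string signals with 2 to 10 labels. `SignalSpec` and `accepts` are hypothetical names for illustration, not Catalyst's schema.

```python
from dataclasses import dataclass, field

@dataclass
class SignalSpec:
    """Hypothetical local model of a signal definition (not Catalyst's API)."""
    name: str
    scope: str        # "span", "trace", or "session"
    classifier: str   # "binary" or "string"
    labels: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.scope not in {"span", "trace", "session"}:
            raise ValueError(f"unknown scope: {self.scope}")
        if self.classifier == "binary":
            if self.labels:
                raise ValueError("binary signals take no labels")
        elif self.classifier == "string":
            if not 2 <= len(self.labels) <= 10:
                raise ValueError("string signals need between 2 and 10 labels")
        else:
            raise ValueError(f"unknown classifier: {self.classifier}")

    def accepts(self, judge_output: object) -> bool:
        """Check that a judge's answer conforms to the classifier type."""
        if self.classifier == "binary":
            return isinstance(judge_output, bool)
        return judge_output in self.labels
```

For example, `SignalSpec("Sentiment", "session", "string", ["positive", "neutral", "negative"])` accepts `"neutral"` but rejects `"angry"`, because a string classifier can only return one of its defined labels.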

## Create a signal

Create a signal from the **Signals** tab, or from a specific agent's **Signals** view under the **Agents** tab. Either way the signal is scoped to one agent.

<Steps>
  <Step title="Open the signal editor for your agent">
    From the Signals tab or an agent's Signals view, pick the agent whose traffic you want to label and create a new signal.
  </Step>

  <Step title="Name the signal">
    Give it a short, descriptive name like `NSFW` or `User frustration`. The name is how the label shows up everywhere else in the dashboard.
  </Step>

  <Step title="Choose a scope">
    Pick **Span**, **Trace**, or **Session** (see [Choose a scope](#choose-a-scope)). This is the one setting you can't change later, so pick the unit you actually want to reason about. Session is usually the most useful for understanding a user.
  </Step>

  <Step title="Choose a classifier type">
    Pick **Binary (yes / no)** or **String (enumerated labels)**. For a string classifier, add the 2 to 10 labels the judge is allowed to return.
  </Step>

  <Step title="Write the prompt">
    Describe what the signal should classify. This is the instruction the judge follows on every target, so be specific about what counts as each outcome. For a binary signal, describe what makes a target a "yes." For a string signal, describe when to pick each label.
  </Step>

  <Step title="Set the sample rate">
    The sample rate is the share of matching traffic that gets classified, and lower rates cost fewer credits. 100% classifies every target, which is often unnecessary at high volume. A rate like 25% or 50% runs the signal on only that share of incoming traffic. Start lower on high-volume agents and raise the rate once you trust the labels. Common presets are 10%, 25%, 50%, and 100%.
  </Step>

  <Step title="Save as a draft or activate">
    **Save draft** keeps the signal unpublished so you can keep tuning it. **Activate** publishes it and starts live classification on new traffic.
  </Step>
</Steps>
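
The sample rate scales judge calls linearly: at a given daily volume, the expected number of classifications is simply volume × rate. This page doesn't state a per-classification credit cost, so the sketch below only counts expected judge calls; the 40,000-targets-per-day figure and the function name are hypothetical.

```python
# Hypothetical illustration: per-classification credit cost isn't stated
# on this page, so this only estimates how many targets get classified.
PRESET_RATES = (0.10, 0.25, 0.50, 1.00)

def expected_judge_calls(targets_per_day: int, sample_rate: float) -> float:
    """Expected number of targets classified per day at a given sample rate."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return targets_per_day * sample_rate

if __name__ == "__main__":
    daily_targets = 40_000  # illustrative volume for a busy agent
    for rate in PRESET_RATES:
        calls = expected_judge_calls(daily_targets, rate)
        print(f"{rate:>4.0%} -> ~{calls:,.0f} judge calls/day")
```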

<Tip>
  Don't want to start from scratch? Use **Start from a template** to prefill the classifier type, prompt, and labels for a common signal, then edit from there. See [Templates](#templates) below.
</Tip>

## Test before you activate

Before you commit a signal to live traffic, test it against recent traffic to preview how it labels real targets. A test run classifies a small sample synchronously and shows you the label distribution and per-target results. **Nothing is saved**: the test doesn't change your signal or store any labels.

<Steps>
  <Step title="Open the tester">
    From the signal, choose **Test it**.
  </Step>

  <Step title="Pick a sample size and time range">
    Choose how many recent targets to classify (1 to 100) and the window to pull them from.
  </Step>

  <Step title="Read the preview">
    You'll see the label distribution across the sample plus a per-target breakdown, including which targets got flagged. If the labels don't match your intent, adjust the prompt or labels and test again.
  </Step>
</Steps>
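
The preview boils down to two views of the same results: each label's share of the sample, and the list of flagged targets. Here is a minimal sketch of that summary for a binary signal, assuming results arrive as a mapping from target ID to label; `summarize_preview` and the input shape are hypothetical, not how Catalyst returns test results.

```python
from collections import Counter

def summarize_preview(results: dict[str, object]) -> tuple[dict[object, float], list[str]]:
    """Summarize a hypothetical test run: each label's share, plus flagged targets.

    "Flagged" here means labeled True, which only applies to binary signals.
    """
    if not 1 <= len(results) <= 100:
        raise ValueError("a test run classifies between 1 and 100 targets")
    counts = Counter(results.values())
    total = len(results)
    distribution = {label: n / total for label, n in counts.items()}
    flagged = sorted(t for t, label in results.items() if label is True)
    return distribution, flagged
```

If half the sample comes back flagged when you expected a handful, that's usually a sign the prompt is too broad.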

<Note>
  Test results are not persisted. They exist only to help you tune the prompt before activating.
</Note>

## Activate and run live

Activating a signal starts live classification: as new traffic arrives for the agent, the configured share of it gets sampled and labeled automatically. You don't have to do anything else, and labels accumulate as traffic flows.

A signal is always in one of three states:

| State        | What it means                                                      |
| ------------ | ------------------------------------------------------------------ |
| **Draft**    | Saved but not running. Nothing is being classified.                |
| **Active**   | Live. New matching traffic is sampled and labeled.                 |
| **Disabled** | Paused. Live classification has stopped, but past labels are kept. |

You can disable an active signal at any time to stop classification without losing the labels you've already collected, and re-enable it later.
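
The three states and the moves between them form a small state machine. The sketch below models only the transitions this page describes (activate a draft, disable an active signal, re-enable a disabled one) and rejects everything else. That rejection is an assumption: other transitions may exist in the product. The action names are hypothetical.

```python
from enum import Enum

class State(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    DISABLED = "disabled"

# Hypothetical: only the transitions described on this page are allowed.
TRANSITIONS = {
    (State.DRAFT, "activate"): State.ACTIVE,
    (State.ACTIVE, "disable"): State.DISABLED,
    (State.DISABLED, "enable"): State.ACTIVE,
}

def apply(state: State, action: str) -> State:
    """Return the next state, or raise if the transition isn't modeled."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a {state.value} signal") from None

def is_classifying(state: State) -> bool:
    """Only active signals classify new traffic."""
    return state is State.ACTIVE
```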

## Backfill historical data

Live classification only labels traffic that arrives after you activate. To label traffic you already captured, start a **manual run** (backfill) at any time. Unlike a test, a manual run **saves** its labels: they're stored against your traces and tied to the run, exactly like live labels.

<Steps>
  <Step title="Open the manual run dialog">
    From the signal, choose **Manual run / Backfill**.
  </Step>

  <Step title="Pick a time range and sample rate">
    Choose the historical window to apply the signal to, and the share of matching traffic in that window to classify.
  </Step>

  <Step title="Start the run">
    The run executes in the background, classifying past traffic across the window. Results land in the same place as live labels as the run progresses.
  </Step>
</Steps>

<Tip>
  Backfill a representative window first to sanity-check the labels at scale before running it over a long history. A manual run classifies real traffic and counts toward usage.
</Tip>

## Read the results

There are a few places to read what a signal found:

* **The Signals tab** is the main table across your agents, with each signal's current state and headline numbers at a glance.
* **The agent's Overview** under the Agents tab has per-signal graphs: trends and volume over time, plus a range of metrics for each signal so you can see how a label is moving.
* **The signal detail view** has a table of every classified target (the spans, traces, or sessions the signal labeled). Click into any row to open the underlying trace, span, or session and read the actual conversation that produced the label.

Labeled targets render their label as a colored chip. For a binary signal, "yes" and "no" get distinct colors; for a string signal, each label gets its own color. From there you can:

* **Filter by label value** to pull up just the targets a signal flagged (for example, every session labeled "yes" by a jailbreak signal).
* **Watch the distribution over time** to see how a label trends across hours or days.
* **Jump straight to the underlying trace, span, or session** to see the full context and conversation behind any labeled target.
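
"Watch the distribution over time" means bucketing labels by time and computing each label's share per bucket. Here is a minimal sketch of hourly bucketing, assuming you have exported labels as `(timestamp, label)` pairs; `hourly_distribution` and that export shape are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

def hourly_distribution(labels: list[tuple[datetime, str]]) -> dict[datetime, dict[str, float]]:
    """Bucket hypothetical (timestamp, label) pairs by hour; return each label's share per hour."""
    buckets: dict[datetime, Counter] = defaultdict(Counter)
    for ts, label in labels:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour][label] += 1
    result: dict[datetime, dict[str, float]] = {}
    for hour in sorted(buckets):
        counts = buckets[hour]
        total = sum(counts.values())
        result[hour] = {label: n / total for label, n in counts.items()}
    return result
```

Shares rather than raw counts keep busy and quiet hours comparable; a rising share of `negative` means the same thing at 9 a.m. and at 3 a.m.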

## Alert on a signal

Once a signal is running, set up alerts so you hear about changes without watching the dashboard. Alerts are configured per signal, and you can get notified through the Slack integration or by email.

An alert fires when a metric crosses a condition over a window. You choose:

* **The metric.** What to watch, depending on the classifier type:
  * **Label volume** (any signal): how many labels came in.
  * **True rate** (binary signals): the share of labels that are "yes."
  * **Value count** (string signals): how many labels landed on one specific value.
  * **Value share** (string signals): that value's share of all labels.
* **The comparison.** Either a **percentage change** versus the prior equal-length window (for example, "this value's count is up 10% in the last 24 hours") or an **absolute threshold** (for example, "true rate is below 80%").
* **The window** the comparison runs over, from 5 minutes up to 48 hours.
* **A minimum label count** so quiet periods don't trip the alert on statistically meaningless deltas.
* **A cooldown** so a flapping condition doesn't bombard you. After an alert fires, further firings are suppressed until the cooldown elapses.
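
To make the comparison options concrete, here is a minimal sketch of evaluating one window of a binary signal's true rate. It covers both an absolute threshold and a percentage change versus the prior equal-length window, gated by a minimum label count. It models only the true-rate metric, and `AlertRule`, `should_fire`, and their fields are hypothetical names, not Catalyst's alert configuration.

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    """Hypothetical alert rule; field names are illustrative, not Catalyst's API."""
    mode: str             # "threshold" or "pct_change"
    op: str               # "above" or "below"
    value: float          # a rate in [0, 1] for threshold, a percent for pct_change
    min_label_count: int

def true_rate(labels: list[bool]) -> float:
    return sum(labels) / len(labels)

def should_fire(rule: AlertRule, current: list[bool], prior: list[bool]) -> bool:
    """Evaluate a true-rate alert for one window against the prior equal-length window."""
    if len(current) < rule.min_label_count:
        return False  # too few labels for a meaningful comparison
    now = true_rate(current)
    if rule.mode == "threshold":
        metric = now
    elif rule.mode == "pct_change":
        if not prior or true_rate(prior) == 0:
            return False  # percentage change is undefined without a nonzero baseline
        before = true_rate(prior)
        metric = (now - before) / before * 100
    else:
        raise ValueError(f"unknown mode: {rule.mode}")
    return metric > rule.value if rule.op == "above" else metric < rule.value
```

For example, `AlertRule("threshold", "below", 0.80, min_label_count=5)` corresponds to "true rate is below 80%", and it stays silent when a window holds only two labels.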

You can **backtest** an alert against recent history to see when it would have fired before you turn it on, and pause or re-enable any alert at any time.
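
A backtest replays an alert over history, and the cooldown decides which of the moments the condition held would actually have notified you. Here is a minimal sketch of just the cooldown step, assuming you already have the times the condition was met; `backtest_notifications` is a hypothetical name, and this is not Catalyst's backtest implementation.

```python
from datetime import datetime, timedelta

def backtest_notifications(condition_met_at: list[datetime], cooldown: timedelta) -> list[datetime]:
    """Return the times a hypothetical alert would notify, given when its condition held.

    After each notification, later firings are suppressed until the cooldown elapses.
    """
    notifications: list[datetime] = []
    for ts in sorted(condition_met_at):
        if not notifications or ts - notifications[-1] >= cooldown:
            notifications.append(ts)
    return notifications
```

With a one-hour cooldown, condition hits at 0, 10, 50, and 70 minutes produce two notifications (at 0 and 70 minutes), not four.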

<Tip>
  Start with a wide window, a sensible minimum label count, and a cooldown. Tighten the threshold once you've seen how the metric actually moves in the backtest.
</Tip>

## Versioning and editing

Signals are versioned. Editing a signal (changing its prompt, labels, classifier type, or sample rate) creates a new version instead of mutating the old one, and the dashboard lists each version. The current version powers live classification, and every label records the version that produced it, so you can change a signal's definition without losing the history of what earlier versions decided.

**Scope is the exception: it's fixed once a signal is created.** Everything else is editable, but to label a different unit you create a new signal.
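
The versioning rules above (edits append an immutable version, labels record which version produced them, and scope can't change) can be sketched as a small data model. This is a hypothetical illustration of those rules, not Catalyst's storage schema; `Signal`, `SignalVersion`, `Label`, and `edit` are illustrative names.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class SignalVersion:
    number: int
    prompt: str
    sample_rate: float

@dataclass
class Signal:
    """Hypothetical model: scope is set once, versions only ever accumulate."""
    scope: str
    versions: list[SignalVersion] = field(default_factory=list)

    @property
    def current(self) -> SignalVersion:
        return self.versions[-1]

    def edit(self, **changes: object) -> SignalVersion:
        if "scope" in changes:
            raise ValueError("scope is fixed; create a new signal instead")
        new = replace(self.current, number=self.current.number + 1, **changes)
        self.versions.append(new)  # earlier versions are kept, never mutated
        return new

@dataclass(frozen=True)
class Label:
    target_id: str
    value: object
    version: int  # which signal version produced this label
```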

When you no longer need a signal, archive it. Archiving stops it and removes it from your active list while preserving the labels it produced.

## Templates

To get started quickly, create a signal **from a template** and edit from there. Built-in templates include:

| Template               | Type   | What it flags                                                                                               |
| ---------------------- | ------ | ----------------------------------------------------------------------------------------------------------- |
| **NSFW**               | Binary | Spans whose content is sexually explicit, graphic, or otherwise not safe for work.                          |
| **Jailbreak attempt**  | Binary | Spans where the user tries to bypass the model's safety guardrails or system instructions.                  |
| **Laziness / refusal** | Binary | Spans where the assistant refuses, stalls, or gives a low-effort non-answer instead of completing the task. |
| **User frustration**   | Binary | Spans where the user expresses frustration, annoyance, or dissatisfaction with the assistant.               |
| **Sentiment**          | String | The overall sentiment the user expresses: positive, neutral, or negative.                                   |
| **Task outcome**       | String | Whether the task the user asked for was completed: completed, partial, failed, or abandoned.                |

Templates are just a starting point. You can write any prompt you want and build a signal from scratch, with either classifier type.

## Manage signals from the CLI and MCP

The dashboard isn't the only way in. Everything above is also available through the [Inference CLI](/cli/overview) and the [MCP server](/integrations/mcp-server), so you can create, run, and read signals from your terminal or straight from an AI coding assistant.

* **CLI.** The `inf signals` command group lists, creates, edits, activates, disables, and archives signals, tests them, kicks off manual runs, and reads labels and distributions.
* **MCP.** The Inference MCP server exposes the same operations as tools (creating signals, activating and disabling them, running backfills, configuring alerts, and querying labels and distributions), so an assistant can set up and inspect signals on your behalf.

## Feed signals into Halo

Signals pair naturally with [Halo](/get-started/analyze-traces), our agent-loop optimizer. The targets a signal flags are exactly the traffic worth digging into:

* **Improve a behavior.** Point Halo at the spans, traces, or sessions a signal flagged and ask how to fix what they have in common (the refusals, the frustrated sessions, the failed tasks).
* **Decide what to measure.** Talk to Halo about your traces to surface which signals would be most valuable to add in the first place.

## Next steps

<CardGroup cols={2}>
  <Card title="Analyze your traces" icon="microscope" href="/get-started/analyze-traces">
    Run Halo on your traces to find systemic failure modes and concrete fixes.
  </Card>

  <Card title="Set agent identity" icon="fingerprint" href="/integrations/traces/agent-identity">
    Add stable agent IDs so signals attach to the right agent and group cleanly.
  </Card>

  <Card title="Capture more of your stack" icon="puzzle-piece" href="/integrations/traces/overview">
    Add tracing to more providers, frameworks, and agent runtimes so signals have more to label.
  </Card>

  <Card title="Wrap custom work" icon="pen-nib" href="/integrations/traces/manual-spans">
    Add spans around retrieval, routing, and subprocesses so signals can classify them too.
  </Card>
</CardGroup>
