Production Agent Example

This example walks through a realistic agent loop end to end: a long-lived server that handles incoming messages, runs an LLM-driven agent that calls several custom tools, and exports a clean OpenInference trace tree to Catalyst. Every piece in this guide maps to a real pattern used by agents in production today. By the end you will have:

A boot-time setup() that runs once per process.
A request handler that wraps the whole agent run in an AGENT span with stable agent.id and per-conversation session.id.
Custom tool execution wrapped in TOOL spans with tool.name and tool_call.id.
Auto-emitted LLM child spans from the patched Anthropic SDK, nested under the agent span by OTel context propagation.
Domain-specific attributes (tenant, channel, viewer role) on the agent span for filtering in the dashboard.
A graceful shutdown that flushes batched spans on SIGTERM.

This example hand-rolls the agent loop with the raw Anthropic SDK, so you author the TOOL spans yourself (Step 3). The patched SDK captures the LLM calls automatically, but your tools run in your own code, so nothing emits a tool span unless you do. If you use a framework that runs the tools for you (OpenAI Agents, LangGraph), the framework emits the TOOL spans and you skip Step 3. There you would only add manual spans for steps the framework never sees, like a retrieval inside a tool. See OpenAI Agents for that path.

Step 1 — Bootstrap Tracing Once

Tracing should initialize once per process, not per request. For a long-lived server, that means a memoized setup() call that any code path can await.

TypeScript

// tracing.ts
import Anthropic from "@anthropic-ai/sdk";
import { setup, type CatalystTracing } from "@inference/tracing";

let tracingPromise: Promise<CatalystTracing> | null = null;

export function initTracing(): Promise<CatalystTracing> {
  if (!tracingPromise) {
    tracingPromise = setup({
      serviceName: process.env.SERVICE_NAME ?? "customer-support",
      serviceVersion: process.env.SERVICE_VERSION,
      endpoint: process.env.CATALYST_OTLP_ENDPOINT,
      token: process.env.CATALYST_OTLP_TOKEN,
      modules: { anthropic: Anthropic },
    });
  }
  return tracingPromise;
}

export async function shutdownTracing(): Promise<void> {
  if (!tracingPromise) return;
  const tracing = await tracingPromise;
  await tracing.shutdown();
}

TypeScript (server entrypoint)

// server.ts
import { initTracing, shutdownTracing } from "./tracing.ts";

await initTracing(); // patches Anthropic before any client is constructed
const server = startServer();

for (const signal of ["SIGTERM", "SIGINT"] as const) {
  process.on(signal, async () => {
    await shutdownTracing();
    server.close(() => process.exit(0));
  });
}

Two things to notice:

initTracing() runs before the first Anthropic client is constructed. The per-SDK patchers work by mutating the SDK’s prototype, so setup() has to win the race.
shutdown() runs on SIGTERM, not per request. Spans are batched and exported in the background; calling shutdown() per request would force synchronous flushes and add latency.
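
The memoization in initTracing() is what makes it safe to call from any code path. The sketch below isolates that pattern so it can be checked without the tracing package. fakeSetup and its call counter are hypothetical stand-ins for setup(), used only to show that concurrent callers share one initialization.

```typescript
// Stand-in for setup(): counts how many times initialization really runs.
// (Hypothetical; the real setup() comes from @inference/tracing.)
let setupCalls = 0;

async function fakeSetup(): Promise<{ shutdown(): Promise<void> }> {
  setupCalls += 1;
  // Simulate asynchronous initialization work.
  await new Promise((resolve) => setTimeout(resolve, 10));
  return { shutdown: async () => {} };
}

let initPromise: ReturnType<typeof fakeSetup> | null = null;

// Same shape as initTracing(): the promise is cached synchronously, so
// callers that arrive while setup is still in flight reuse it.
function initOnce(): ReturnType<typeof fakeSetup> {
  if (!initPromise) {
    initPromise = fakeSetup();
  }
  return initPromise;
}

async function demo(): Promise<number> {
  // Three concurrent callers, e.g. three requests arriving during boot.
  await Promise.all([initOnce(), initOnce(), initOnce()]);
  return setupCalls;
}
```

Caching the promise rather than the resolved value is the important detail: a second caller that arrives mid-initialization gets the in-flight promise instead of starting a new setup. One consequence of this pattern is that a rejected promise also stays cached, so a failed initialization is not retried unless you reset the variable yourself.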

Step 2 — Define The Request Boundary

Each incoming message becomes one trace, rooted at one AGENT span. The agent span carries the identifiers Catalyst uses for grouping in the Agents dashboard.

TypeScript

import { agentSpan } from "@inference/tracing";
import { initTracing } from "./tracing.ts";
import { runAgent } from "./agent.ts";

export interface IncomingMessage {
  conversationId: string;
  text: string;
  channel: "slack" | "email" | "web";
  tenantId: string;
  viewer: { id: string; role: "admin" | "member" };
}

export async function handleMessage(msg: IncomingMessage): Promise<string> {
  await initTracing(); // no-op after the first call; guarantees tracing is ready

  return await agentSpan(
    {
      agentId: "customer-support-prod",
      agentName: "Customer Support Agent",
      role: "support",
      system: "anthropic",
      sessionId: msg.conversationId,
      spanName: "customer-support.run",
    },
    async (span) => {
      // Domain attributes for dashboard filtering.
      span.raw.setAttribute("app.tenant_id", msg.tenantId);
      span.raw.setAttribute("app.channel", msg.channel);
      span.raw.setAttribute("app.viewer.id", msg.viewer.id);
      span.raw.setAttribute("app.viewer.role", msg.viewer.role);

      span.setInput(msg.text);
      const response = await runAgent(msg);
      span.setOutput(response);
      return response;
    },
  );
}

The four app.* attributes are outside the OpenInference vocabulary. They go on the raw OTel span and become filter facets in the dashboard. Keep one naming convention for all of them (a stable prefix for your app, dot-separated keys) so they are easy to query, for example with inf trace list --metadata "app.channel=slack".
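
One way to keep the naming convention consistent is to decide the attribute keys in a single function instead of at each call site. The sketch below is illustrative only: domainAttributes, applyDomainAttributes, and the AttributeSink interface are hypothetical names, and the only assumption about the span is that it has an OTel-style setAttribute(key, value) method.

```typescript
// Minimal structural type for anything with an OTel-style setAttribute().
interface AttributeSink {
  setAttribute(key: string, value: string | number | boolean): unknown;
}

interface DomainContext {
  tenantId: string;
  channel: "slack" | "email" | "web";
  viewer: { id: string; role: "admin" | "member" };
}

const APP_PREFIX = "app"; // one stable prefix for every domain attribute

// Hypothetical helper: the single place that decides attribute names.
function domainAttributes(ctx: DomainContext): Record<string, string> {
  return {
    [`${APP_PREFIX}.tenant_id`]: ctx.tenantId,
    [`${APP_PREFIX}.channel`]: ctx.channel,
    [`${APP_PREFIX}.viewer.id`]: ctx.viewer.id,
    [`${APP_PREFIX}.viewer.role`]: ctx.viewer.role,
  };
}

// Copies every domain attribute onto the given span-like object.
function applyDomainAttributes(sink: AttributeSink, ctx: DomainContext): void {
  for (const [key, value] of Object.entries(domainAttributes(ctx))) {
    sink.setAttribute(key, value);
  }
}
```

In the handler above, the four setAttribute calls would collapse to applyDomainAttributes(span.raw, msg), since IncomingMessage already carries the fields DomainContext needs.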

Step 3 — Author Tool Spans Around Each Tool Call

When the LLM emits a tool_use block, your code runs the actual tool function. Wrap that execution in a TOOL span so the trace tree shows what the tool received, what it returned, and how long it took.

TypeScript

// tools.ts
import { manualSpan, SpanKindValues } from "@inference/tracing";
import { initTracing } from "./tracing.ts";

export type ToolName = "lookup_order" | "issue_refund" | "send_email";
export type ToolArgs = Record<string, unknown>;
export type ToolResult = Record<string, unknown>;

const TOOL_IMPLS: Record<ToolName, (args: ToolArgs) => Promise<ToolResult>> = {
  lookup_order: async ({ orderId }) => ({ orderId, status: "shipped" }),
  issue_refund: async ({ orderId, amount }) => ({
    ok: true,
    orderId,
    amount,
    refundId: "RFD-" + Math.floor(Math.random() * 9999),
  }),
  send_email: async ({ to, subject }) => ({ ok: true, to, subject }),
};

export async function executeTool(
  name: ToolName,
  args: ToolArgs,
  toolCallId: string,
): Promise<ToolResult> {
  await initTracing(); // no-op after the first call; guarantees tracing is ready

  return await manualSpan(
    {
      spanName: `${name}.tool`,
      spanKind: SpanKindValues.TOOL,
      toolName: name,
      toolCallId,
      input: args,
    },
    async (span) => {
      const result = await TOOL_IMPLS[name](args);
      span.setOutput(result);
      return result;
    },
  );
}

manualSpan writes openinference.span.kind=TOOL, tool.name, tool_call.id, input.value, and input.mime_type from the options. The callback only needs to set the output. Span end, status, and exception recording are all handled — if the tool throws, the exception is recorded on the span, the span ends with ERROR, and the original exception re-throws so the agent loop can see it. Because executeTool runs inside the active context established by agentSpan upstream, the TOOL span automatically parents under the agent span. No span IDs need to be threaded through.
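
The automatic parenting described above relies on ambient context rather than explicit arguments. In Node.js, OTel context managers are typically built on AsyncLocalStorage, and the sketch below uses that primitive directly to show the idea. It is not the library's implementation: withSpan, SpanRecord, and the recorded array are hypothetical stand-ins.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface SpanRecord {
  id: number;
  name: string;
  parentId: number | null;
}

const activeSpan = new AsyncLocalStorage<SpanRecord>();
const recorded: SpanRecord[] = [];
let nextId = 1;

// Hypothetical stand-in for agentSpan/manualSpan: reads the parent from the
// ambient context, then runs the callback with itself as the active span.
async function withSpan<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const parent = activeSpan.getStore();
  const span: SpanRecord = { id: nextId++, name, parentId: parent?.id ?? null };
  recorded.push(span);
  return activeSpan.run(span, fn);
}

// Note that lookupOrder takes no span or parent ID argument.
async function lookupOrder(): Promise<string> {
  return withSpan("lookup_order.tool", async () => {
    await new Promise((resolve) => setTimeout(resolve, 5)); // async hop
    return "shipped";
  });
}

async function demo(): Promise<SpanRecord[]> {
  await withSpan("customer-support.run", async () => {
    await lookupOrder();
  });
  return recorded;
}
```

The context survives the await inside the tool, which is why the tool span still finds its parent after an asynchronous hop. Work started outside the callback, such as a job picked up later by another process, does not inherit it.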

If your tool needs behavior manualSpan does not provide — for instance, recording a span event mid-callback while keeping the span alive past the callback return — drop down to tracing.tracer.startActiveSpan and manage status / span.end() yourself. See Manual spans → Escape hatch.
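
When you drop down to the escape hatch, the bookkeeping that manualSpan otherwise performs becomes your responsibility. The sketch below shows that bookkeeping against a minimal stand-in span so it runs without any tracing package. FakeSpan and runWithSpan are hypothetical. The method names mirror the OTel span API, but the real setStatus takes an object with a status code rather than a string, so treat the signatures as simplified.

```typescript
// Minimal stand-in with the same method names as an OTel span.
class FakeSpan {
  status: "UNSET" | "ERROR" = "UNSET";
  exceptions: unknown[] = [];
  ended = false;

  recordException(err: unknown): void {
    this.exceptions.push(err);
  }
  setStatus(status: "ERROR"): void {
    this.status = status;
  }
  end(): void {
    this.ended = true;
  }
}

// The three duties you take over: record the exception, mark the span as
// failed, and end the span on every path.
async function runWithSpan<T>(
  span: FakeSpan,
  fn: (span: FakeSpan) => Promise<T>,
): Promise<T> {
  try {
    return await fn(span); // success path leaves the status untouched here
  } catch (err) {
    span.recordException(err);
    span.setStatus("ERROR");
    throw err; // re-throw so the agent loop still sees the failure
  } finally {
    span.end(); // a span that never ends is never exported
  }
}
```

The finally block is the part most often forgotten in hand-written spans: it runs on both the success and the failure path, so the span always ends exactly once.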

Step 4 — Wire The Agent Loop

The agent loop alternates between calling the LLM and executing tool calls the LLM requests. Both sides are now instrumented.

TypeScript

// agent.ts
import Anthropic from "@anthropic-ai/sdk";
import { executeTool, type ToolName } from "./tools.ts";
import type { IncomingMessage } from "./handler.ts";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const TOOLS: Anthropic.Tool[] = [
  {
    name: "lookup_order",
    description: "Look up an order by ID.",
    input_schema: {
      type: "object",
      properties: { orderId: { type: "string" } },
      required: ["orderId"],
    },
  },
  {
    name: "issue_refund",
    description: "Issue a refund.",
    input_schema: {
      type: "object",
      properties: {
        orderId: { type: "string" },
        amount: { type: "number" },
      },
      required: ["orderId", "amount"],
    },
  },
];

export async function runAgent(msg: IncomingMessage): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: msg.text },
  ];

  for (let turn = 0; turn < 8; turn++) {
    // The patched Anthropic SDK emits an LLM span automatically, parented
    // under the active agent span via OTel context propagation.
    const response = await client.messages.create({
      model: "claude-haiku-4-5",
      max_tokens: 1024,
      tools: TOOLS,
      messages,
    });

    // Any stop reason other than tool_use (end_turn, max_tokens, ...) ends the run.
    if (response.stop_reason !== "tool_use") {
      return textOf(response.content);
    }

    messages.push({ role: "assistant", content: response.content });

    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      const result = await executeTool(
        block.name as ToolName,
        block.input as Record<string, unknown>,
        block.id,
      );
      toolResults.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: JSON.stringify(result),
      });
    }

    messages.push({ role: "user", content: toolResults });
  }

  return "Max turns reached.";
}

function textOf(content: Anthropic.ContentBlock[]): string {
  return content
    .filter((b): b is Anthropic.TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("");
}

Three observations:

No tracing imports in the inner loop. The agent code looks the same as it would without tracing. The instrumentation is at the boundaries (setup(), agentSpan(), executeTool()).
The patched Anthropic SDK does the LLM-span work. We pass modules: { anthropic: Anthropic } to setup(), and from then on every client.messages.create() call emits an LLM span with input messages, output content blocks, model, finish reason, and token usage.
Tool spans are caller-side. They wrap the real function execution, not the message round-trip. The model-side view of the tool call is captured on the parent LLM span automatically; the caller-side view is the TOOL span we author.
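
The loop in Step 4 lets a tool exception propagate, which ends the whole run. An alternative, sketched below, is to report the failure back to the model as a tool_result with is_error set, so the model can recover or apologize. toToolResult and ToolRunner are hypothetical names. The block shape is a local copy of the tool_result fields used in Step 4 plus the Anthropic API's optional is_error flag.

```typescript
// Local copy of the tool_result shape used in Step 4, plus is_error.
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Same call shape as executeTool(name, args, toolCallId).
type ToolRunner = (
  name: string,
  args: Record<string, unknown>,
  toolCallId: string,
) => Promise<Record<string, unknown>>;

// Hypothetical helper: turns a thrown tool error into an error result
// instead of aborting the agent loop.
async function toToolResult(
  run: ToolRunner,
  name: string,
  args: Record<string, unknown>,
  toolCallId: string,
): Promise<ToolResultBlock> {
  try {
    const result = await run(name, args, toolCallId);
    return {
      type: "tool_result",
      tool_use_id: toolCallId,
      content: JSON.stringify(result),
    };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return {
      type: "tool_result",
      tool_use_id: toolCallId,
      content: `Tool ${name} failed: ${message}`,
      is_error: true,
    };
  }
}
```

Passing executeTool as the runner keeps the tracing behavior intact: the TOOL span still records the exception and ends with ERROR before the error reaches this wrapper.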

Step 5 — Verify In The Dashboard And CLI

Send a request through the server, then check the resulting trace:

# Find the most recent trace from this service
inf trace list --service customer-support --limit 1

# Open its span tree
inf trace get <trace-id> --view tree

# Inspect a TOOL span's input and output
inf span list --trace-id <trace-id> --kind TOOL
inf span get <trace-id> <span-id> --view io

# Filter on a domain attribute
inf trace list --metadata "app.channel=slack" --range 1h

The trace tree should show a single AGENT root with the LLM and TOOL spans nested beneath it.

Common Variations

Multi-Tenant Service With Per-Request Identity

If agent.id itself depends on the request (for example, a multi-tenant service that runs different agent personas per customer), compute it in the handler:

TypeScript

const agentId = `support-${msg.tenantId}-prod`;

await agentSpan(
  {
    agentId,
    agentName: `${tenantConfig.displayName} Support`,
    role: "support",
    sessionId: msg.conversationId,
    spanName: "customer-support.run",
  },
  async (span) => { /* ... */ },
);

Stable IDs matter more than human-friendly ones. Prefer support-acme-prod over support-acme-2024-v2 — the Agents dashboard uses the ID to group runs across deploys.
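
A small helper can enforce that rule by building the ID only from values that do not change between deploys. The sketch below is one possible shape. stableAgentId is a hypothetical name, and the slug normalization is an added assumption for tenant IDs that contain spaces or punctuation.

```typescript
// Hypothetical helper: derive agent.id only from values that survive deploys
// (tenant and environment), never from dates, versions, or build numbers.
function stableAgentId(
  tenantId: string,
  environment: "prod" | "staging",
): string {
  const slug = tenantId
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, ""); // strip leading and trailing dashes
  if (!slug) {
    throw new Error("tenantId must contain at least one alphanumeric character");
  }
  return `support-${slug}-${environment}`;
}
```

With this helper, stableAgentId("acme", "prod") yields support-acme-prod on every deploy, so runs keep grouping under the same agent in the dashboard.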

Background Jobs Triggered From The Agent

If your tool launches a background job that itself does LLM work, capture the active identity and pass it into the job so the background span can be filtered together with its originating conversation:

TypeScript

import { getActiveAgentIdentity } from "@inference/tracing";

async function executeTool_enqueueReport(args: ToolArgs): Promise<ToolResult> {
  const identity = getActiveAgentIdentity();
  await jobQueue.enqueue("generate-report", {
    ...args,
    contextAgentId: identity?.id,
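    // Assumption: the identity object exposes the session ID as sessionId;
    // check the return type of getActiveAgentIdentity() before relying on it.
    contextSessionId: identity?.sessionId,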
  });
  return { ok: true };
}

The background worker can then set agent.id and session.id on its own agent span so the two pieces of work share dashboard grouping.
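
On the worker side, the payload fields map back onto the options passed to agentSpan. The sketch below shows only that mapping and leaves the agentSpan call itself out. spanOptionsForJob, the fallback ID, and the span name are hypothetical, and AgentSpanOptions is a local type covering just the option names used earlier in this guide.

```typescript
// Payload written by the enqueueing tool; both context fields may be absent
// if the job was enqueued outside an agent run.
interface ReportJobPayload {
  contextAgentId?: string;
  contextSessionId?: string;
  [key: string]: unknown;
}

// Local type with only the agentSpan option names used in this guide.
interface AgentSpanOptions {
  agentId: string;
  sessionId?: string;
  spanName: string;
}

const FALLBACK_AGENT_ID = "report-worker"; // hypothetical default

// Hypothetical helper: rebuild the originating identity from the payload.
function spanOptionsForJob(payload: ReportJobPayload): AgentSpanOptions {
  return {
    agentId: payload.contextAgentId ?? FALLBACK_AGENT_ID,
    sessionId: payload.contextSessionId,
    spanName: "generate-report.run",
  };
}
```

The worker would pass the result to agentSpan in the same way the request handler does. The background trace is a separate trace, so the shared agent.id and session.id are what tie it back to the conversation.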

Streaming Responses

When the agent streams output back to the user, set the span output once at the end, after the stream completes. The patched SDK already handles streaming LLM calls correctly; the outer agent span just needs the final text:

TypeScript

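// onChunk is the caller's sink (hypothetical), e.g. a function that writes
// each chunk to the HTTP response. A plain callback cannot use yield.
const final = await agentSpan(options, async (span) => {
  span.setInput(msg.text);
  let text = "";
  for await (const chunk of streamAgent(msg)) {
    text += chunk;
    onChunk(chunk); // forward to the caller as it arrives
  }
  span.setOutput(text);
  return text;
});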
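
If the caller needs an async iterator rather than a callback, the accumulate-while-forwarding step can be written as an async generator. The sketch below shows only that step, with no tracing calls. teeStream and fakeStream are hypothetical names, and fakeStream stands in for streamAgent(msg).

```typescript
// Re-yields every chunk unchanged and reports the concatenated text once
// the source is exhausted.
async function* teeStream(
  source: AsyncIterable<string>,
  onComplete: (finalText: string) => void,
): AsyncGenerator<string, void, undefined> {
  let finalText = "";
  for await (const chunk of source) {
    finalText += chunk;
    yield chunk;
  }
  onComplete(finalText);
}

// Hypothetical stand-in for streamAgent(msg).
async function* fakeStream(): AsyncGenerator<string, void, undefined> {
  yield "Your order ";
  yield "has shipped.";
}

async function demo(): Promise<{ chunks: string[]; finalText: string }> {
  const chunks: string[] = [];
  let finalText = "";
  for await (const chunk of teeStream(fakeStream(), (text) => (finalText = text))) {
    chunks.push(chunk);
  }
  return { chunks, finalText };
}
```

Two caveats apply to this shape. onComplete is not called if the consumer stops iterating early, and a span wrapper that ends when its callback returns must stay open until the generator is drained, so the place where the output is set needs to be chosen with the span's lifetime in mind.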

Custom Span Events

For mid-callback events that are not span attributes — a rate-limit retry, a fallback to a smaller model, a cache miss — use span.raw.addEvent:

TypeScript

span.raw.addEvent("rate_limit_retry", {
  attempt: 2,
  retry_after_ms: 1500,
});

Events appear under the --view events flag of inf span get and on the span detail page.
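
Retries are a natural place to emit such events. The sketch below records one event per retry against a minimal sink so it runs without a tracing package. retryOnRateLimit, isRateLimitError, and the EventSink interface are hypothetical, the only assumption about the span is an OTel-style addEvent(name, attributes) method, and the retryAfterMs field on the error is an assumed shape, not a real SDK error type.

```typescript
// Minimal structural type for anything with an OTel-style addEvent().
interface EventSink {
  addEvent(
    name: string,
    attributes?: Record<string, string | number | boolean>,
  ): unknown;
}

// Assumed error shape: any thrown object carrying a numeric retryAfterMs.
function isRateLimitError(err: unknown): err is { retryAfterMs: number } {
  return (
    typeof err === "object" &&
    err !== null &&
    typeof (err as { retryAfterMs?: unknown }).retryAfterMs === "number"
  );
}

// Retries only rate-limit errors, recording one span event before each retry.
async function retryOnRateLimit<T>(
  sink: EventSink,
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRateLimitError(err) || attempt >= maxAttempts) throw err;
      // "attempt" on the event is the number of the upcoming attempt.
      sink.addEvent("rate_limit_retry", {
        attempt: attempt + 1,
        retry_after_ms: err.retryAfterMs,
      });
      await new Promise((resolve) => setTimeout(resolve, err.retryAfterMs));
    }
  }
}
```

Inside a span callback this would be called as retryOnRateLimit(span.raw, () => callModel()), where callModel is whatever function performs the request.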

What To Test

Behavior	How to verify
`setup()` runs before the first SDK call	Search server logs for the Catalyst tracing init message; confirm it precedes any Anthropic request log.
LLM spans parent under the agent span	`inf trace get <id> --view tree` shows a single AGENT root with LLM and TOOL children.
Tool span has `tool.name` and `tool_call.id`	`inf span get <id> --view attributes`
Errors mark the span `ERROR`	Force a tool to throw; confirm the span status is `ERROR` and the trace status is `ERROR`.
Spans flush on `SIGTERM`	Send `SIGTERM` to the server right after a request; the trace should still appear in Catalyst.
Domain attributes are filterable	`inf trace list --metadata "app.tenant_id=acme"` returns the expected traces.

Next Steps

Manual spans

The full surface for AGENT, TOOL, CHAIN, and RETRIEVER spans.

Attributes reference

All Attr.* constants and SpanKindValues with the attributes each kind expects.

Handle API reference

Every method on the span handle and how it coerces values.

Troubleshooting

Debug missing spans, missing attributes, and shutdown behavior.
