This example walks through a realistic agent loop end to end: a long-lived server that handles incoming messages, runs an LLM-driven agent that calls several custom tools, and exports a clean OpenInference trace tree to Catalyst. Every piece in this guide maps to a real pattern used by agents in production today. By the end you will have:
  • A boot-time setup() that runs once per process.
  • A request handler that wraps the whole agent run in an AGENT span with stable agent.id and per-conversation session.id.
  • Custom tool execution wrapped in TOOL spans with tool.name and tool_call.id.
  • Auto-emitted LLM child spans from the patched Anthropic SDK, nested under the agent span by OTel context propagation.
  • Domain-specific attributes (tenant, channel, viewer role) on the agent span for filtering in the dashboard.
  • A graceful shutdown that flushes batched spans on SIGTERM.

Step 1 — Bootstrap Tracing Once

Tracing should initialize once per process, not per request. For a long-lived server, that means a memoized setup() call that any code path can await.
TypeScript
// tracing.ts
import Anthropic from "@anthropic-ai/sdk";
import { setup, type CatalystTracing } from "@inference/tracing";

let tracingPromise: Promise<CatalystTracing> | null = null;

export function initTracing(): Promise<CatalystTracing> {
  if (!tracingPromise) {
    tracingPromise = setup({
      serviceName: process.env.SERVICE_NAME ?? "customer-support",
      serviceVersion: process.env.SERVICE_VERSION,
      endpoint: process.env.CATALYST_OTLP_ENDPOINT,
      token: process.env.CATALYST_OTLP_TOKEN,
      modules: { anthropic: Anthropic },
    });
  }
  return tracingPromise;
}

export async function shutdownTracing(): Promise<void> {
  if (!tracingPromise) return;
  const tracing = await tracingPromise;
  await tracing.shutdown();
}
TypeScript (server entrypoint)
// server.ts
import { initTracing, shutdownTracing } from "./tracing.ts";

await initTracing(); // patches Anthropic before any client is constructed
const server = startServer();

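for (const signal of ["SIGTERM", "SIGINT"] as const) {
  process.on(signal, () => {
    // Stop accepting connections first so in-flight requests can finish,
    // then flush the spans they produced before exiting.
    server.close(async () => {
      await shutdownTracing();
      process.exit(0);
    });
  });
}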
Two things to notice:
  1. initTracing() runs before the first Anthropic client is constructed. The per-SDK patchers work by mutating the SDK’s prototype, so setup() has to win the race.
  2. shutdown() runs on a termination signal (SIGTERM or SIGINT), not per request. Spans are batched and exported in the background; calling shutdown() per request would force a flush on every request and add latency.
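The memoization in initTracing() can be exercised in isolation. The sketch below is a generic helper, not part of @inference/tracing; once, fakeSetup, and demoOnce are hypothetical names used only for illustration. It shows that concurrent callers share one in-flight promise, so the wrapped initializer runs a single time.

```typescript
// Hypothetical helper: memoize an async initializer so it runs at most once.
function once<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (!pending) {
      pending = init();
    }
    return pending;
  };
}

// Stand-in for setup(); counts how many times it actually runs.
let setupCalls = 0;
async function fakeSetup(): Promise<{ ready: boolean }> {
  setupCalls += 1;
  return { ready: true };
}

const initOnce = once(fakeSetup);

async function demoOnce(): Promise<number> {
  // Three concurrent callers, as if three requests arrived during boot.
  await Promise.all([initOnce(), initOnce(), initOnce()]);
  return setupCalls;
}
```

One consequence of this pattern is that a rejected promise is cached as well, so a failed initialization stays failed until the process restarts. Whether to clear the cached promise on failure is a design choice this sketch leaves open.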

Step 2 — Define The Request Boundary

Each incoming message becomes one trace, rooted at one AGENT span. The agent span carries the identifiers Catalyst uses for grouping in the Agents dashboard.
TypeScript
import { agentSpan } from "@inference/tracing";
import { initTracing } from "./tracing.ts";
import { runAgent } from "./agent.ts";

export interface IncomingMessage {
  conversationId: string;
  text: string;
  channel: "slack" | "email" | "web";
  tenantId: string;
  viewer: { id: string; role: "admin" | "member" };
}

export async function handleMessage(msg: IncomingMessage): Promise<string> {
  const tracing = await initTracing();

  return await agentSpan(
    tracing.tracer,
    {
      agentId: "customer-support-prod",
      agentName: "Customer Support Agent",
      role: "support",
      system: "anthropic",
      sessionId: msg.conversationId,
      spanName: "customer-support.run",
    },
    async (span) => {
      // Domain attributes for dashboard filtering.
      span.raw.setAttribute("app.tenant_id", msg.tenantId);
      span.raw.setAttribute("app.channel", msg.channel);
      span.raw.setAttribute("app.viewer.id", msg.viewer.id);
      span.raw.setAttribute("app.viewer.role", msg.viewer.role);

      span.setInput(msg.text);
      const response = await runAgent(msg);
      span.setOutput(response);
      return response;
    },
  );
}
The four app.* attributes are outside the OpenInference vocabulary. They go on the raw OTel span and become filter facets in the dashboard. Use one naming convention throughout (a stable prefix for your app, dot-separated keys) so you can filter on them with inf trace list --metadata "app.channel=slack".
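One way to keep the prefix consistent is to build the attribute keys in a single place. The following is a minimal sketch under that assumption; prefixedAttributes is a hypothetical helper, not something the tracing package provides.

```typescript
type AttributeValue = string | number | boolean;

// Hypothetical helper: flatten a nested object into dot-separated keys
// under one shared prefix.
function prefixedAttributes(
  prefix: string,
  values: Record<string, unknown>,
): Record<string, AttributeValue> {
  const out: Record<string, AttributeValue> = {};
  for (const [key, value] of Object.entries(values)) {
    const fullKey = `${prefix}.${key}`;
    if (
      typeof value === "string" ||
      typeof value === "number" ||
      typeof value === "boolean"
    ) {
      out[fullKey] = value;
    } else if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into nested objects such as `viewer`.
      Object.assign(out, prefixedAttributes(fullKey, value as Record<string, unknown>));
    }
    // Anything else (null, undefined, arrays) is skipped in this sketch.
  }
  return out;
}

const attrs = prefixedAttributes("app", {
  tenant_id: "acme",
  channel: "slack",
  viewer: { id: "u_123", role: "admin" },
});
// Keys produced: app.tenant_id, app.channel, app.viewer.id, app.viewer.role
```

Each entry can then be applied inside the agentSpan callback with span.raw.setAttribute(key, value), as the handler does by hand.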

Step 3 — Author Tool Spans Around Each Tool Call

When the LLM emits a tool_use block, your code runs the actual tool function. Wrap that execution in a TOOL span so the trace tree shows what the tool received, what it returned, and how long it took.
TypeScript
// tools.ts
import { manualSpan, SpanKindValues } from "@inference/tracing";
import { initTracing } from "./tracing.ts";

export type ToolName = "lookup_order" | "issue_refund" | "send_email";
export type ToolArgs = Record<string, unknown>;
export type ToolResult = Record<string, unknown>;

const TOOL_IMPLS: Record<ToolName, (args: ToolArgs) => Promise<ToolResult>> = {
  lookup_order: async ({ orderId }) => ({ orderId, status: "shipped" }),
  issue_refund: async ({ orderId, amount }) => ({
    ok: true,
    orderId,
    amount,
    refundId: "RFD-" + Math.floor(Math.random() * 9999),
  }),
  send_email: async ({ to, subject }) => ({ ok: true, to, subject }),
};

export async function executeTool(
  name: ToolName,
  args: ToolArgs,
  toolCallId: string,
): Promise<ToolResult> {
  const tracing = await initTracing();

  return await manualSpan(
    tracing.tracer,
    {
      spanName: `${name}.tool`,
      spanKind: SpanKindValues.TOOL,
      toolName: name,
      toolCallId,
      input: args,
    },
    async (span) => {
      const result = await TOOL_IMPLS[name](args);
      span.setOutput(result);
      return result;
    },
  );
}
manualSpan writes openinference.span.kind=TOOL, tool.name, tool_call.id, input.value, and input.mime_type from the options. The callback only needs to set the output. Span end, status, and exception recording are all handled — if the tool throws, the exception is recorded on the span, the span ends with ERROR, and the original exception re-throws so the agent loop can see it. Because executeTool runs inside the active context established by agentSpan upstream, the TOOL span automatically parents under the agent span. No span IDs need to be threaded through.
If your tool needs behavior manualSpan does not provide — for instance, recording a span event mid-callback while keeping the span alive past the callback return — drop down to tracing.tracer.startActiveSpan and manage status / span.end() yourself. See Manual spans → Alternative TypeScript patterns.
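When managing a span by hand, the baseline shape is a try/catch/finally that records the failure, always ends the span, and re-throws. The sketch below illustrates that shape against a small structural interface (SpanLike) rather than the real OTel span, so it runs without dependencies; SpanLike, runWithSpan, and RecordingSpan are hypothetical names. The three methods mirror recordException, setStatus, and end on an OpenTelemetry span, and the numeric code 2 corresponds to SpanStatusCode.ERROR in @opentelemetry/api.

```typescript
// Minimal structural stand-in for the parts of an OTel span used here.
interface SpanLike {
  recordException(error: Error): void;
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}

const STATUS_ERROR = 2; // matches SpanStatusCode.ERROR in @opentelemetry/api

// Hypothetical wrapper: record failures, always end the span, re-throw.
async function runWithSpan<T>(span: SpanLike, work: () => Promise<T>): Promise<T> {
  try {
    return await work();
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    span.recordException(error);
    span.setStatus({ code: STATUS_ERROR, message: error.message });
    throw err; // the caller still sees the original failure
  } finally {
    span.end(); // runs on both success and failure
  }
}

// In-memory fake used to observe the order of calls.
class RecordingSpan implements SpanLike {
  calls: string[] = [];
  recordException(error: Error): void {
    this.calls.push(`exception:${error.message}`);
  }
  setStatus(status: { code: number; message?: string }): void {
    this.calls.push(`status:${status.code}`);
  }
  end(): void {
    this.calls.push("end");
  }
}
```

To keep a span alive past the callback return, the span.end() call would move out of the finally block to wherever the work actually completes; the error handling stays the same.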

Step 4 — Wire The Agent Loop

The agent loop alternates between calling the LLM and executing tool calls the LLM requests. Both sides are now instrumented.
TypeScript
// agent.ts
import Anthropic from "@anthropic-ai/sdk";
import { executeTool, type ToolName } from "./tools.ts";
import type { IncomingMessage } from "./handler.ts";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const TOOLS: Anthropic.Tool[] = [
  {
    name: "lookup_order",
    description: "Look up an order by ID.",
    input_schema: {
      type: "object",
      properties: { orderId: { type: "string" } },
      required: ["orderId"],
    },
  },
  {
    name: "issue_refund",
    description: "Issue a refund.",
    input_schema: {
      type: "object",
      properties: {
        orderId: { type: "string" },
        amount: { type: "number" },
      },
      required: ["orderId", "amount"],
    },
  },
];

export async function runAgent(msg: IncomingMessage): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: msg.text },
  ];

  for (let turn = 0; turn < 8; turn++) {
    // The patched Anthropic SDK emits an LLM span automatically, parented
    // under the active agent span via OTel context propagation.
    const response = await client.messages.create({
      model: "claude-haiku-4-5",
      max_tokens: 1024,
      tools: TOOLS,
      messages,
    });

    // Anything other than a tool request (end_turn, max_tokens, ...) ends the run.
    if (response.stop_reason !== "tool_use") {
      return textOf(response.content);
    }

    messages.push({ role: "assistant", content: response.content });

    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      const result = await executeTool(
        block.name as ToolName,
        block.input as Record<string, unknown>,
        block.id,
      );
      toolResults.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: JSON.stringify(result),
      });
    }

    messages.push({ role: "user", content: toolResults });
  }

  return "Max turns reached.";
}

function textOf(content: Anthropic.ContentBlock[]): string {
  return content
    .filter((b): b is Anthropic.TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("");
}
Three observations:
  1. No tracing imports in the inner loop. The agent code looks the same as it would without tracing. The instrumentation is at the boundaries (setup(), agentSpan(), executeTool()).
  2. The patched Anthropic SDK does the LLM-span work. We pass modules: { anthropic: Anthropic } to setup(), and from then on every client.messages.create() call emits an LLM span with input messages, output content blocks, model, finish reason, and token usage.
  3. Tool spans are caller-side. They wrap the real function execution, not the message round-trip. The model-side view of the tool call is captured on the parent LLM span automatically; the caller-side view is the TOOL span we author.
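The loop above casts block.name to ToolName and lets a thrown tool error propagate out of runAgent. One possible alternative, sketched here, checks the name at runtime and turns failures into a tool_result with is_error set, so the model can react to the failure. The types are local stand-ins rather than SDK imports, and isToolName and toToolResult are hypothetical names; the execute parameter stands in for executeTool from tools.ts.

```typescript
const TOOL_NAMES = ["lookup_order", "issue_refund", "send_email"] as const;
type ToolName = (typeof TOOL_NAMES)[number];

// Runtime check that can replace the `as ToolName` cast.
function isToolName(name: string): name is ToolName {
  return (TOOL_NAMES as readonly string[]).includes(name);
}

// Local shape mirroring the tool_result block sent back to the model.
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Hypothetical wrapper around a tool executor.
async function toToolResult(
  name: string,
  toolUseId: string,
  execute: (name: ToolName) => Promise<Record<string, unknown>>,
): Promise<ToolResultBlock> {
  if (!isToolName(name)) {
    return {
      type: "tool_result",
      tool_use_id: toolUseId,
      content: `Unknown tool: ${name}`,
      is_error: true,
    };
  }
  try {
    const result = await execute(name);
    return { type: "tool_result", tool_use_id: toolUseId, content: JSON.stringify(result) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { type: "tool_result", tool_use_id: toolUseId, content: message, is_error: true };
  }
}
```

Because manualSpan records the exception and re-throws before this catch runs, the TOOL span should still end with ERROR status while the conversation continues.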

Step 5 — Verify In The Dashboard And CLI

Send a request through the server, then check the resulting trace:
# Find the most recent trace from this service
inf trace list --service customer-support --limit 1

# Open its span tree
inf trace get <trace-id> --view tree

# Inspect a TOOL span's input and output
inf span list --trace-id <trace-id> --kind TOOL
inf span get <trace-id> <span-id> --view io

# Filter on a domain attribute
inf trace list --metadata "app.channel=slack" --range 1h
The trace tree should match the diagram at the top of this page.

Common Variations

Multi-Tenant Service With Per-Request Identity

If agent.id itself depends on the request (for example, a multi-tenant service that runs different agent personas per customer), compute it in the handler:
TypeScript
const agentId = `support-${msg.tenantId}-prod`;

await agentSpan(
  tracing.tracer,
  {
    agentId,
    agentName: `${tenantConfig.displayName} Support`,
    role: "support",
    sessionId: msg.conversationId,
    spanName: "customer-support.run",
  },
  async (span) => { /* ... */ },
);
Stable IDs matter more than human-friendly ones. Prefer support-acme-prod over support-acme-2024-v2 — the Agents dashboard uses the ID to group runs across deploys.
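A small helper can make the stability rule harder to break by accepting only inputs that do not change between deploys. This is a sketch under that assumption; stableAgentId is a hypothetical name and the slug rules are illustrative.

```typescript
// Hypothetical helper: derive agent.id from inputs that stay fixed across deploys.
// Versions, dates, and build numbers are deliberately not parameters.
function stableAgentId(product: string, tenantId: string, environment: string): string {
  const slug = (part: string): string =>
    part
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
      .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return [product, tenantId, environment].map(slug).join("-");
}

const id = stableAgentId("support", "Acme", "prod");
// id === "support-acme-prod"
```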

Background Jobs Triggered From The Agent

If your tool launches a background job that itself does LLM work, capture the active identity and pass it into the job so the background span can be filtered together with its originating conversation:
TypeScript
import { getActiveAgentIdentity } from "@inference/tracing";

async function executeTool_enqueueReport(args: ToolArgs): Promise<ToolResult> {
  const identity = getActiveAgentIdentity();
  await jobQueue.enqueue("generate-report", {
    ...args,
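    contextAgentId: identity?.id,
    // Assumes the identity exposes the conversation's session as `sessionId`;
    // confirm the field name in the Handle API reference.
    contextSessionId: identity?.sessionId,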
  });
  return { ok: true };
}
The background worker can then set agent.id and session.id on its own agent span so the two pieces of work share dashboard grouping.
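On the worker side, the job payload can be mapped back to agentSpan options. The sketch below covers only that mapping, as a pure function; ReportJobPayload, AgentSpanOptions, spanOptionsForJob, the fallback ID "report-worker", and the span name are all hypothetical, while the agentId, sessionId, and spanName option names follow the handler example earlier in this guide.

```typescript
// Fields the enqueueing tool placed on the job payload.
interface ReportJobPayload {
  contextAgentId?: string;
  contextSessionId?: string;
}

// Subset of the agentSpan options used elsewhere in this guide.
interface AgentSpanOptions {
  agentId: string;
  sessionId?: string;
  spanName: string;
}

// Hypothetical mapping; the fallback ID and span name are illustrative.
function spanOptionsForJob(payload: ReportJobPayload): AgentSpanOptions {
  return {
    // Reuse the originating agent's ID so both runs group together.
    agentId: payload.contextAgentId ?? "report-worker",
    // Reuse the conversation's session so the job can be filtered with it.
    sessionId: payload.contextSessionId,
    spanName: "generate-report.run",
  };
}
```

The worker would pass the result to agentSpan(tracing.tracer, options, ...) in the same way the request handler does.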

Streaming Responses

When the agent streams output back to the user, set the span output once at the end, after the stream completes. The patched SDK already handles streaming LLM calls correctly; the outer agent span just needs the final text:
TypeScript
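// onChunk is a caller-supplied callback (hypothetical name) that forwards each
// chunk to the client; `yield` cannot be used inside the arrow-function callback.
await agentSpan(tracing.tracer, options, async (span) => {
  span.setInput(msg.text);
  let final = "";
  for await (const chunk of streamAgent(msg)) {
    final += chunk;
    onChunk(chunk); // back to the caller
  }
  span.setOutput(final);
});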

Custom Span Events

For mid-callback events that are not span attributes — a rate-limit retry, a fallback to a smaller model, a cache miss — use span.raw.addEvent:
TypeScript
span.raw.addEvent("rate_limit_retry", {
  attempt: 2,
  retry_after_ms: 1500,
});
Events appear under the --view events flag of inf span get and on the span detail page.
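As an illustration of where such events might come from, here is a minimal retry loop that records one event per failed attempt. It is written against a structural interface (EventSink) with the same addEvent(name, attributes) shape as an OTel span, so it runs without dependencies; EventSink and retryWithEvents are hypothetical names, and the fixed delay is a simplification.

```typescript
// Structural stand-in for the one span method used here.
interface EventSink {
  addEvent(name: string, attributes?: Record<string, string | number | boolean>): void;
}

// Hypothetical helper: retry `work`, recording an event after each failed
// attempt that will be retried.
async function retryWithEvents<T>(
  span: EventSink,
  work: (attempt: number) => Promise<T>,
  maxAttempts: number,
  retryAfterMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await work(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break; // no retry left, so no event
      span.addEvent("rate_limit_retry", { attempt, retry_after_ms: retryAfterMs });
      await new Promise((resolve) => setTimeout(resolve, retryAfterMs));
    }
  }
  throw lastError;
}
```

In the agent loop, span.raw would be passed as the first argument, since it exposes addEvent as shown above.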

What To Test

Behavior | How to verify
setup() runs before the first SDK call | Search server logs for the Catalyst tracing init message; confirm it precedes any Anthropic request log.
LLM spans parent under the agent span | inf trace get <id> --view tree shows a single AGENT root with LLM and TOOL children.
Tool span has tool.name and tool_call.id | inf span get <id> --view attributes
Errors mark the span ERROR | Force a tool to throw; confirm the span status is ERROR and the trace status is ERROR.
Spans flush on SIGTERM | Send SIGTERM to the server right after a request; the trace should still appear in Catalyst.
Domain attributes are filterable | inf trace list --metadata "app.tenant_id=acme" returns the expected traces.

Next Steps

Manual spans

The full surface for AGENT, TOOL, CHAIN, and RETRIEVER spans.

Attributes reference

All Attr.* constants and SpanKindValues with the attributes each kind expects.

Handle API reference

Every method on the span handle and how it coerces values.

Troubleshooting

Debug missing spans, missing attributes, and shutdown behavior.