This page is the SDK-focused quickstart: install the @inference/tracing (TypeScript) or inference-catalyst-tracing (Python) package, point it at Catalyst, and capture your first span. If you’d rather see the higher-level flow, start with Capture your first trace. The example below uses OpenAI because it’s the smallest end-to-end trace. The same setup pattern applies to Anthropic, LangChain, LangGraph, LangSmith, OpenAI Agents, LiveKit Agents, ElevenLabs Agents, PI AI, Cursor SDK, Claude Agent SDK, Pydantic AI, the Vercel AI SDK, and manual spans.

Choose a setup path

Installing with AI is the quickest path. Use the manual flow if you want to wire it up yourself.
Use the Inference CLI to launch a coding agent like Claude Code, OpenCode, or Codex to install the tracing SDK, configure export, and wire up your LLM clients.
1. Install the CLI and authenticate

Install the Inference CLI globally and log in. Your browser will open to authenticate.
npm install -g @inference/cli && inf auth login
2. Run tracing instrumentation in your project

From your project root, run instrumentation in tracing mode.
cd /path/to/your/project && inf instrument --mode tracing
The command guides you through the following workflow:
  • Select a coding agent: Claude Code, OpenCode, or Codex.
  • Scan your codebase for LLM clients and agent frameworks.
  • Install @inference/tracing or inference-catalyst-tracing plus the right per-integration extras.
  • Wire setup() into your app entrypoint so spans start before clients are constructed.
  • Add stable service and agent identity so traces group cleanly in the dashboard.
  • Review the generated changes before applying them.
To also route requests through the Catalyst Gateway in the same pass, pick both as the mode instead of tracing.
3. Run your app

Run your application how you normally would. Traces stream to Catalyst as your code executes.
4. View your trace

Open the dashboard and filter by your service name to see the captured trace tree.
Want the full canonical guide for this workflow? See Install with AI.
That’s it. Spans are streaming to Catalyst and your first trace is ready to inspect. So far you have one LLM span per call, captured automatically. That is enough for a one-shot script, but real apps usually make several calls per user request, and you’ll want them grouped under a named agent and session in the dashboard, which the next section covers. If you used Install with AI, the agent likely already wired this up for you; read on to see what it set up and why.

Group calls under an agent

The example above is a one-shot LLM call. Once your app runs multiple LLM calls as part of a logical unit (an agent run, a conversation turn, a workflow), wrap that unit in agentSpan so the LLM spans nest under an AGENT row carrying agent.id, agent.name, and session.id. The Agents dashboard groups on those attributes.
import { agentSpan, setup } from "@inference/tracing";
import OpenAI from "openai";

const tracing = await setup({ modules: { openai: OpenAI } });
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

await agentSpan(
  {
    agentId: "hello-agent",
    agentName: "Hello Agent",
    sessionId: "session-001",
  },
  async (span) => {
    span.setInput("Reply with just the word hello.");
    const response = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Reply with just the word hello." }],
      max_tokens: 16,
    });
    span.setOutput(response.choices[0]?.message.content ?? "");
  },
);

await tracing.shutdown();
The OpenAI LLM span still appears, but now nested under your agentSpan row instead of as an orphan. Real agents run many LLM calls per session; the outer span is what makes them findable as one thing.

Wrap your own code

For non-LLM steps inside an agent loop (a tool call, a retrieval, a custom router, an evaluator, a CLI subprocess), wrap them with manualSpan. Combined with the agentSpan above, you get a full trace tree: an outer AGENT row, inner SDK rows, and inner manual rows, all parented correctly.
import {
  SpanKindValues,
  agentSpan,
  manualSpan,
  setup,
} from "@inference/tracing";

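// Hypothetical app helpers, stubbed so the snippet runs; replace with your own.
const query = "refund policy for damaged items";
const runRefundReview = async () => ({ summary: "Refund approved" });
const retrieve = async (q: string) => [`doc matching "${q}"`];

const tracing = await setup();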

await agentSpan(
  {
    agentId: "refund-review-agent",
    agentName: "Refund Review Agent",
    spanName: "refund-review.run",
  },
  async (span) => {
    span.setInput("Review refund request #1842");
    const decision = await runRefundReview();
    span.setOutput(decision.summary);
  },
);

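// manualSpan authors TOOL / CHAIN / RETRIEVER / EMBEDDING spans.
// Shown standalone here; call it inside the agentSpan callback when the step
// should nest under the agent row.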
await manualSpan(
  {
    spanName: "rag.retrieve",
    spanKind: SpanKindValues.RETRIEVER,
    input: { query, k: 8 },
  },
  async (span) => {
    const docs = await retrieve(query);
    span.setOutput(docs);
  },
);

await tracing.shutdown();
For the full manual-span surface (tools, retrievers, embeddings, agent identity), see Manual spans and Agent identity.
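
To see why it matters where a span is opened, here is a generic sketch of context-based span parenting. It is a toy model, not the SDK's internals: toySpan, ToySpan, and the use of Node's AsyncLocalStorage are all assumptions made for illustration, and the source does not describe how @inference/tracing tracks the active span.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Toy span record; the name and fields are hypothetical.
interface ToySpan {
  name: string;
  parent: string | null;
}

const activeSpan = new AsyncLocalStorage<ToySpan>();
const recorded: ToySpan[] = [];

// Open a span whose parent is whatever span is active in the calling context.
async function toySpan<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const span: ToySpan = { name, parent: activeSpan.getStore()?.name ?? null };
  recorded.push(span);
  // Everything awaited inside fn sees `span` as the active span.
  return activeSpan.run(span, fn);
}

async function demo(): Promise<ToySpan[]> {
  await toySpan("agent.run", async () => {
    // Opened inside the agent callback: parented to agent.run.
    await toySpan("rag.retrieve", async () => {});
  });
  // Opened after the agent callback returned: no parent.
  await toySpan("cleanup", async () => {});
  return recorded;
}
```

In this model, a step only nests under the agent row when it runs inside the outer callback; a step started after that callback returns is recorded as a root.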

Flushing and process lifecycle

Spans are batched and exported in the background, so a process that exits or freezes before the batch flushes drops them. How you flush depends on the process shape:
  • Short-lived script: call await tracing.shutdown() before exit. It force-flushes, then tears the provider down. The examples above do this.
  • Long-lived service (HTTP server, Slack bot, queue worker): call setup() once per process before the first SDK client is constructed, memoize the result so any handler can await it, and call shutdown() only on SIGTERM. Never call it per request, since that forces a synchronous flush and adds latency.
  • Serverless or edge (Lambda, Cloudflare Workers): memoize setup() the same way, but flush per invocation with tracing.provider.forceFlush() instead of shutdown(), since the provider must survive for the next warm invocation.
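
The difference between forceFlush() and shutdown() in the list above can be sketched with a toy batching processor. This is not the SDK's implementation: ToyBatchProcessor and its internals are hypothetical and only mirror the semantics described here, where spans wait in a buffer until flushed, forceFlush() leaves the processor usable, and shutdown() flushes and then stops accepting spans.

```typescript
// Toy model only: the class name and internals are hypothetical, not the SDK's code.
class ToyBatchProcessor {
  private buffer: string[] = [];
  private closed = false;
  readonly exported: string[] = [];

  // Queue a finished span; nothing is exported yet.
  onEnd(spanName: string): void {
    if (this.closed) return; // spans ended after shutdown are dropped
    this.buffer.push(spanName);
  }

  // Export everything buffered and keep the processor usable.
  forceFlush(): void {
    this.exported.push(...this.buffer);
    this.buffer = [];
  }

  // Flush one last time, then refuse further spans.
  shutdown(): void {
    this.forceFlush();
    this.closed = true;
  }
}

const processor = new ToyBatchProcessor();
processor.onEnd("invocation-1");
processor.forceFlush(); // serverless style: flush, processor survives
processor.onEnd("invocation-2");
processor.shutdown(); // process teardown: flush and close
processor.onEnd("invocation-3"); // dropped, the processor is closed
console.log(processor.exported);
```

Here exported ends up holding invocation-1 and invocation-2 only, which is why a warm serverless runtime should flush rather than shut down between invocations.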

Long-lived service

// tracing.ts — memoized setup
import OpenAI from "openai";
import { setup, type CatalystTracing } from "@inference/tracing";

let tracingPromise: Promise<CatalystTracing> | null = null;

export function initTracing(): Promise<CatalystTracing> {
  if (!tracingPromise) {
    tracingPromise = setup({ modules: { openai: OpenAI } });
  }
  return tracingPromise;
}

export async function shutdownTracing(): Promise<void> {
  if (!tracingPromise) return;
  const tracing = await tracingPromise;
  await tracing.shutdown();
}
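
The module above exports shutdownTracing() but does not show where to call it. A minimal sketch of the SIGTERM wiring follows, assuming a Node-style process object. registerGracefulShutdown, SignalSource, and the fake process are hypothetical names introduced for illustration, and shutdownTracing is replaced by a counting stand-in so the sketch runs without the SDK.

```typescript
// Anything with a once() method, so a fake can stand in for Node's `process`.
interface SignalSource {
  once(event: string, listener: () => void): unknown;
}

// Hypothetical stand-in for shutdownTracing() from tracing.ts; it only counts calls.
let shutdownCalls = 0;
async function shutdownTracing(): Promise<void> {
  shutdownCalls += 1;
}

// Flush and tear down tracing on SIGTERM, then hand control back to the caller.
function registerGracefulShutdown(
  source: SignalSource,
  shutdown: () => Promise<void>,
  onDone: () => void,
): void {
  source.once("SIGTERM", () => {
    // Run onDone whether shutdown resolved or rejected, so the process can still exit.
    shutdown().then(onDone, onDone);
  });
}

// Minimal fake so the sketch runs without sending a real signal.
const pending: Array<() => void> = [];
const fakeProcess: SignalSource = {
  once: (_event, listener) => pending.push(listener),
};

let exited = false;
registerGracefulShutdown(fakeProcess, shutdownTracing, () => {
  exited = true;
});
pending.splice(0).forEach((listener) => listener()); // simulate SIGTERM delivery
```

In a real service you would pass the actual process object and the real shutdownTracing, and exit inside onDone. Registering with once() keeps a repeated signal from triggering a second shutdown.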

Serverless and edge runtimes

On Lambda, Cloudflare Workers, or any runtime that freezes the process between invocations, the background batch processor may never run, so spans are dropped. Memoize setup() the same way as a long-lived service (the provider is reused across warm invocations), but flush at the end of each invocation with tracing.provider.forceFlush() rather than calling shutdown(). Reserve shutdown() for real process teardown, since it tears down the provider the next warm invocation needs.
import { initTracing } from "./tracing.ts";

export async function handler(event: { message: string }) {
  const tracing = await initTracing(); // memoized setup(), patches once

  // answerQuestion is your application code (not shown here).
  const reply = await answerQuestion(event.message);

  // Flush before the runtime freezes the process. Do not shutdown(): the next
  // warm invocation reuses this provider.
  await tracing.provider.forceFlush();
  return reply;
}
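
The handler above skips the flush if the handler body throws. One way to cover that case is a small try/finally wrapper, sketched below. Only provider.forceFlush() comes from the text above; withFlush, FlushableTracing, and the fake provider are hypothetical names used so the sketch runs without the SDK.

```typescript
// Stand-in for the object returned by setup(); only provider.forceFlush() is
// taken from the documentation, the interface name is hypothetical.
interface FlushableTracing {
  provider: { forceFlush(): Promise<void> };
}

// Run a handler body and always flush afterwards, even if it throws.
async function withFlush<T>(
  tracing: FlushableTracing,
  work: () => Promise<T>,
): Promise<T> {
  try {
    return await work();
  } finally {
    await tracing.provider.forceFlush();
  }
}

// Fake provider that counts flushes, for demonstration only.
let flushCount = 0;
const fakeTracing: FlushableTracing = {
  provider: {
    forceFlush: async () => {
      flushCount += 1;
    },
  },
};

async function demo(): Promise<string[]> {
  const results: string[] = [];
  // Successful invocation: result is returned, then the flush runs.
  results.push(await withFlush(fakeTracing, async () => "ok"));
  try {
    // Failing invocation: the flush still runs before the error propagates.
    await withFlush(fakeTracing, async () => {
      throw new Error("handler failed");
    });
  } catch (err) {
    results.push((err as Error).message);
  }
  return results;
}
```

Inside a real handler this would read roughly as return withFlush(await initTracing(), () => answerQuestion(event.message)). Note that if forceFlush() itself rejects, its error replaces the handler's error, so you may want to catch and log flush failures separately.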

Selective instrumentation

setup() auto-instruments every supported SDK it detects. If you want explicit control (for example, to instrument OpenAI but skip LangChain), set autoInstrument: false and call the targeted helper yourself:
TypeScript
import { setup } from "@inference/tracing";
import { instrumentOpenAI } from "@inference/tracing/openai";
import OpenAI from "openai";

const tracing = await setup({ autoInstrument: false });
instrumentOpenAI(OpenAI, tracing);
The per-integration entry points are listed in the overview’s configuration section. For a full production-shaped server with custom tool spans and domain attributes, see the Production Agent Example.

Verify

Open the Catalyst dashboard and navigate to the Agents or Traces tab. The trace should include an OpenAI LLM span with input messages, output messages, model name, invocation parameters, finish reason, and token counts. Any custom spans you added show up as parent or sibling nodes in the trace tree. If you don’t see anything, see Troubleshooting.

Next steps

  • Analyze your traces: walk trace trees in the dashboard and run Halo to find what to improve.
  • OpenAI tracing: add tool calls, structured outputs, and Responses API examples.
  • Manual spans: wrap custom agents, CLI calls, and unsupported SDKs.
  • Agent identity: add stable agent IDs so the Agents dashboard groups runs correctly.