# Traces Quickstart

> Install a Catalyst tracing SDK, configure export, and capture your first trace.

This page is the SDK-focused quickstart: install the [`@inference/tracing`](https://www.npmjs.com/package/@inference/tracing) (TypeScript) or [`inference-catalyst-tracing`](https://pypi.org/project/inference-catalyst-tracing/) (Python) package, point it at Catalyst, and capture your first span. If you'd rather see the higher-level flow, start with [Capture your first trace](/get-started/capture-first-trace).

The example below uses OpenAI because it's the smallest end-to-end trace. The same export configuration applies to Anthropic, LangChain, LangGraph, LangSmith, OpenAI Agents, LiveKit Agents, ElevenLabs Agents, Vercel Eve, PI AI, Cursor SDK, Claude Agent SDK, Pydantic AI, the Vercel AI SDK, and manual spans. Each framework guide shows the exact setup hook for that SDK.

## Choose a setup path

Installing with AI is the quickest path. Use the manual flow if you want to wire it up yourself.

### Install with AI

Use the [Inference CLI](/cli/overview) to launch a coding agent like [Claude Code](https://code.claude.com/docs/en/overview), OpenCode, or Codex to install the tracing SDK, configure export, and wire up your LLM clients.

Install the Inference CLI globally and log in. Your browser will open to authenticate.

```bash theme={"system"}
npm install -g @inference/cli && inf auth login
```

From your project root, run instrumentation in tracing mode.

```bash theme={"system"}
cd /path/to/your/project && inf instrument --mode tracing
```

The command guides you through the following workflow:

* Select a coding agent: Claude Code, OpenCode, or Codex.
* Scan your codebase for LLM clients and agent frameworks.
* Install `@inference/tracing` or `inference-catalyst-tracing` plus the right per-integration extras.
* Wire `setup()` into your app entrypoint so spans start before clients are constructed.
* Add stable service and agent identity so traces group cleanly in the dashboard.
* Review the generated changes before applying them.

Pick `both` instead of `tracing` to also route requests through the Catalyst Gateway in the same pass.

Run your application as you normally would. Traces stream to Catalyst as your code executes.

Open the [dashboard](https://inference.net/dashboard) and filter by your service name to see the captured trace tree.

Want the full canonical guide for this workflow? See [Install with AI](/integrations/install-with-ai).

### Manual setup

Use this path if you want to wire it up yourself. The example below uses OpenAI. For other providers and frameworks, see the [per-integration guides](/integrations/traces/overview#supported-trace-integrations).

Provider and framework SDKs are optional peers. Install the ones you use alongside the tracing package. For Python, add per-integration extras to the install string.

```bash TypeScript theme={"system"}
# Pick your package manager
bun add @inference/tracing openai
npm install @inference/tracing openai
pnpm add @inference/tracing openai
```

```bash Python theme={"system"}
pip install 'inference-catalyst-tracing[openai]'

# Multiple integrations at once
pip install 'inference-catalyst-tracing[openai,anthropic,langchain]'

# Everything
pip install 'inference-catalyst-tracing[all]'
```

Available Python extras: `openai`, `anthropic`, `langchain`, `langgraph`, `langsmith`, `openai-agents`, `claude-agent-sdk`, `pydantic-ai`, `elevenlabs`, `livekit-agents`, `all`.

Set the Catalyst traces endpoint and token before your app starts.

```bash theme={"system"}
export CATALYST_OTLP_ENDPOINT="https://telemetry.inference.net"
# Get your API key from https://inference.net/dashboard/api-keys/
export CATALYST_OTLP_TOKEN=""
export CATALYST_SERVICE_NAME="checkout-agent"
```

Use a stable `CATALYST_SERVICE_NAME` per deployed service.
It makes traces easier to filter and compare across environments. You can also pass these as options to `setup()` instead of env vars. See the [configuration reference](/integrations/traces/overview#configuration).

Call `setup()` before constructing clients from instrumented SDKs. In TypeScript, pass the SDK modules you want patched. In Python, `setup()` auto-detects installed packages.

```typescript TypeScript theme={"system"}
import { setup } from "@inference/tracing";
import OpenAI from "openai";

const tracing = await setup({
  modules: { openai: OpenAI },
});

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Reply with just the word hello." }],
  max_tokens: 16,
});

console.log(response.choices[0]?.message.content);

await tracing.shutdown();
```

```python Python theme={"system"}
import os

from inference_catalyst_tracing import setup
from openai import OpenAI

tracing = setup()

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Reply with just the word hello."}],
    max_tokens=16,
)

print(response.choices[0].message.content)

tracing.shutdown()
```

If the process is short-lived, always call `shutdown()` before exit so batched spans are flushed.

Open the [dashboard](https://inference.net/dashboard) and navigate to the Agents or Traces tab. You'll see an LLM span with input messages, output messages, model name, invocation parameters, finish reason, and token counts.

Need a different provider or framework? See the [supported integrations](/integrations/traces/overview#supported-trace-integrations) list.

That's it. Spans are streaming to Catalyst and your first trace is ready to inspect.

What you have so far is one LLM span per call, captured automatically.
That's enough for a one-shot script, but real apps usually run several calls per user request, and you'll want those grouped under a named agent and session in the dashboard. That's the next step. If you used Install with AI, the agent likely already wired this up for you; read on to see what it set up and why.

## Group calls under an agent

The example above is a one-shot LLM call. Once your app runs multiple LLM calls as part of a logical unit (an agent run, a conversation turn, a workflow), wrap that unit in `agentSpan` (`agent_span` in Python) so the LLM spans nest under an `AGENT` row carrying `agent.id`, `agent.name`, and `session.id`. The Agents dashboard groups on those attributes.

```typescript TypeScript theme={"system"}
import { agentSpan, setup } from "@inference/tracing";
import OpenAI from "openai";

const tracing = await setup({ modules: { openai: OpenAI } });
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

await agentSpan(
  {
    agentId: "hello-agent",
    agentName: "Hello Agent",
    sessionId: "session-001",
  },
  async (span) => {
    span.setInput("Reply with just the word hello.");
    const response = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Reply with just the word hello." }],
      max_tokens: 16,
    });
    span.setOutput(response.choices[0]?.message.content ?? "");
  },
);

await tracing.shutdown();
```

```python Python theme={"system"}
import os

from inference_catalyst_tracing import agent_span, setup
from openai import OpenAI

tracing = setup()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

with agent_span(
    tracing.tracer,
    agent_id="hello-agent",
    agent_name="Hello Agent",
    session_id="session-001",
) as span:
    span.set_input("Reply with just the word hello.")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Reply with just the word hello."}],
        max_tokens=16,
    )
    span.set_output(response.choices[0].message.content or "")

tracing.shutdown()
```

The OpenAI LLM span still appears, but now nested under your `agentSpan` row instead of as an orphan. Real agents run many LLM calls per session; the outer span is what makes them findable as one thing.

## Wrap your own code

For non-LLM steps inside an agent loop (a tool call, a retrieval, a custom router, an evaluator, a CLI subprocess), wrap them with `manualSpan` (`manual_span` in Python). Combined with the `agentSpan` above, you get a full trace tree: an outer AGENT row, inner SDK rows, and inner manual rows, all parented correctly.

```typescript TypeScript theme={"system"}
import {
  SpanKindValues,
  agentSpan,
  manualSpan,
  setup,
} from "@inference/tracing";

// runRefundReview(), retrieve(), and query stand in for your own application code.
const tracing = await setup();

await agentSpan(
  {
    agentId: "refund-review-agent",
    agentName: "Refund Review Agent",
    spanName: "refund-review.run",
  },
  async (span) => {
    span.setInput("Review refund request #1842");
    const decision = await runRefundReview();
    span.setOutput(decision.summary);
  },
);

// manualSpan authors TOOL / CHAIN / RETRIEVER / EMBEDDING spans.
await manualSpan(
  {
    spanName: "rag.retrieve",
    spanKind: SpanKindValues.RETRIEVER,
    input: { query, k: 8 },
  },
  async (span) => {
    const docs = await retrieve(query);
    span.setOutput(docs);
  },
);

await tracing.shutdown();
```

```python Python theme={"system"}
from inference_catalyst_tracing import (
    SpanKindValues,
    agent_span,
    manual_span,
    setup,
)

# run_refund_review(), retrieve(), and query stand in for your own application code.
tracing = setup()

with agent_span(
    tracing.tracer,
    agent_id="refund-review-agent",
    span_name="refund-review.run",
) as span:
    span.set_input("Review refund request #1842")
    decision = run_refund_review()
    span.set_output(decision.summary)

# manual_span authors TOOL / CHAIN / RETRIEVER / EMBEDDING spans.
with manual_span(
    tracing.tracer,
    name="rag.retrieve",
    span_kind=SpanKindValues.RETRIEVER,
    input={"query": query, "k": 8},
) as span:
    docs = retrieve(query)
    span.set_output(docs)

tracing.shutdown()
```

For the full manual-span surface (tools, retrievers, embeddings, agent identity), see [Manual spans](/integrations/traces/manual-spans) and [Agent identity](/integrations/traces/agent-identity).

## Flushing and process lifecycle

Spans are batched and exported in the background, so a process that exits or freezes before the batch flushes drops them. How you flush depends on the process shape:

* **Short-lived script:** call `await tracing.shutdown()` before exit. It force-flushes, then tears the provider down. The examples above do this.
* **Long-lived service** (HTTP server, Slack bot, queue worker): call `setup()` **once per process** before the first SDK client is constructed, memoize the result so any handler can `await` it, and call `shutdown()` only on a termination signal such as `SIGTERM`. Never call it per request, since that forces a synchronous flush and adds latency.
* **Serverless or edge** (Lambda, Cloudflare Workers): memoize `setup()` the same way, but flush per invocation with `tracing.provider.forceFlush()` instead of `shutdown()`, since the provider must survive for the next warm invocation.
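One detail the per-invocation flush in the last bullet leaves open: a flush placed at the end of a handler body never runs if the handler raises. The sketch below shows one possible way to cover that case, a decorator that calls `force_flush()` in a `finally` block. The `flush_after` decorator and the `FakeProvider` / `FakeTracing` stand-ins are hypothetical names used only so the example runs without the SDK; in a real app you would pass a function returning the memoized `setup()` result, assuming its `provider.force_flush()` behaves as described on this page.

```python
import functools
from typing import Any, Callable


def flush_after(get_tracing: Callable[[], Any]) -> Callable:
    """Decorator: force-flush spans after each invocation, even on error.

    `get_tracing` is any zero-argument callable returning an object with a
    `provider.force_flush()` method (assumed to match the tracing handle).
    """

    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            tracing = get_tracing()
            try:
                return handler(*args, **kwargs)
            finally:
                # Runs on success and on exception, before the runtime freezes.
                tracing.provider.force_flush()

        return wrapper

    return decorator


# --- Hypothetical stand-ins so the sketch runs without the SDK ---
class FakeProvider:
    def __init__(self) -> None:
        self.flush_count = 0

    def force_flush(self) -> None:
        self.flush_count += 1


class FakeTracing:
    def __init__(self) -> None:
        self.provider = FakeProvider()


fake_tracing = FakeTracing()


@flush_after(lambda: fake_tracing)
def handler(event: dict) -> str:
    if "message" not in event:
        raise ValueError("missing message")
    return f"echo: {event['message']}"
```

Keeping the flush in `finally` means the spans describing a failed invocation, often the ones you most want to inspect, are exported before the exception propagates.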
### Long-lived service

```typescript TypeScript theme={"system"}
// tracing.ts: memoized setup
import OpenAI from "openai";
import { setup, type CatalystTracing } from "@inference/tracing";

let tracingPromise: Promise<CatalystTracing> | null = null;

export function initTracing(): Promise<CatalystTracing> {
  if (!tracingPromise) {
    tracingPromise = setup({ modules: { openai: OpenAI } });
  }
  return tracingPromise;
}

export async function shutdownTracing(): Promise<void> {
  if (!tracingPromise) return;
  const tracing = await tracingPromise;
  await tracing.shutdown();
}
```

```typescript TypeScript (server entrypoint) theme={"system"}
// server.ts
import { initTracing, shutdownTracing } from "./tracing.ts";

await initTracing(); // patches OpenAI before the first client is constructed

const server = startServer();

for (const signal of ["SIGTERM", "SIGINT"] as const) {
  process.on(signal, async () => {
    await shutdownTracing();
    server.close(() => process.exit(0));
  });
}
```

```python Python theme={"system"}
# tracing.py: memoized setup
from threading import Lock

from inference_catalyst_tracing import CatalystTracing, setup

_tracing: CatalystTracing | None = None
_lock = Lock()


def get_tracing() -> CatalystTracing:
    global _tracing
    if _tracing is None:
        with _lock:
            if _tracing is None:
                _tracing = setup()
    return _tracing


def shutdown_tracing() -> None:
    if _tracing is not None:
        _tracing.shutdown()
```

```python Python (server entrypoint) theme={"system"}
# server.py
import signal

from tracing import get_tracing, shutdown_tracing

get_tracing()  # registers instrumentation before app code runs


def handle_signal(_signum, _frame):
    shutdown_tracing()
    raise SystemExit(0)


signal.signal(signal.SIGTERM, handle_signal)
signal.signal(signal.SIGINT, handle_signal)

run_server()
```

### Serverless and edge runtimes

On Lambda, Cloudflare Workers, or any runtime that freezes the process between invocations, the background batch processor may never run, so spans are dropped.
Memoize `setup()` the same way as a long-lived service (the provider is reused across warm invocations), but flush at the end of **each invocation** with `tracing.provider.forceFlush()` rather than calling `shutdown()`. Reserve `shutdown()` for real process teardown, since it tears down the provider the next warm invocation needs.

```typescript TypeScript theme={"system"}
import { initTracing } from "./tracing.ts";

export async function handler(event: { message: string }) {
  const tracing = await initTracing(); // memoized setup(), patches once

  const reply = await answerQuestion(event.message);

  // Flush before the runtime freezes the process. Do not shutdown(): the next
  // warm invocation reuses this provider.
  await tracing.provider.forceFlush();
  return reply;
}
```

```python Python theme={"system"}
from tracing import get_tracing


def handler(event):
    tracing = get_tracing()  # memoized setup(), patches once

    reply = answer_question(event["message"])

    # Flush before the runtime freezes the process. Do not shutdown(): the next
    # warm invocation reuses this provider.
    tracing.provider.force_flush()
    return reply
```

### Selective instrumentation

`setup()` auto-instruments every supported SDK it detects. If you want explicit control (for example, to instrument OpenAI but skip LangChain), set `autoInstrument: false` and call the targeted helper yourself:

```typescript TypeScript theme={"system"}
import { setup } from "@inference/tracing";
import { instrumentOpenAI } from "@inference/tracing/openai";
import OpenAI from "openai";

const tracing = await setup({ autoInstrument: false });
instrumentOpenAI(OpenAI, tracing);
```

The per-integration entry points are listed in the [overview's configuration section](/integrations/traces/overview#configuration).

For a full production-shaped server with custom tool spans and domain attributes, see the [Production Agent Example](/integrations/traces/production-agent-example).
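Before checking the dashboard, it can help to confirm that the export settings from the configuration step are actually present in the process environment. The sketch below checks the three variables used on this page (`CATALYST_OTLP_ENDPOINT`, `CATALYST_OTLP_TOKEN`, `CATALYST_SERVICE_NAME`). The `missing_catalyst_env` helper is a hypothetical name, not part of the SDK, and treating all three as required is an assumption; values passed directly as options to `setup()` would not show up here.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the export configuration step on this page.
CATALYST_ENV_VARS = (
    "CATALYST_OTLP_ENDPOINT",
    "CATALYST_OTLP_TOKEN",
    "CATALYST_SERVICE_NAME",
)


def missing_catalyst_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of expected variables that are unset or blank."""
    source = os.environ if env is None else env
    return [name for name in CATALYST_ENV_VARS if not source.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_catalyst_env()
    if missing:
        print("Missing Catalyst settings:", ", ".join(missing))
    else:
        print("All Catalyst export settings are present.")
```

Running a check like this at startup, before `setup()`, surfaces an empty token or endpoint immediately rather than as an empty dashboard later.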
## Verify

Open the Catalyst dashboard and navigate to the Agents or Traces tab. The trace should include an OpenAI LLM span with input messages, output messages, model name, invocation parameters, finish reason, and token counts. Any custom spans you added show up as parent or sibling nodes in the trace tree.

If you don't see anything, see [Troubleshooting](/integrations/traces/troubleshooting).

## Next Steps

* Walk trace trees in the dashboard and run Halo to find what to improve.
* Add tool calls, structured outputs, and Responses API examples.
* Wrap custom agents, CLI calls, and unsupported SDKs.
* Add stable agent IDs so the Agents dashboard groups runs correctly.