
# Traces Quickstart

> Install a Catalyst tracing SDK, configure export, and capture your first trace.

This page is the SDK-focused quickstart: install the [`@inference/tracing`](https://www.npmjs.com/package/@inference/tracing) (TypeScript) or [`inference-catalyst-tracing`](https://pypi.org/project/inference-catalyst-tracing/) (Python) package, point it at Catalyst, and capture your first span. If you'd rather see the higher-level flow, start with [Capture your first trace](/get-started/capture-first-trace).

The example below uses OpenAI because it's the smallest end-to-end trace. The same export configuration applies to Anthropic, LangChain, LangGraph, LangSmith, OpenAI Agents, LiveKit Agents, ElevenLabs Agents, Vercel Eve, PI AI, Cursor SDK, Claude Agent SDK, Pydantic AI, the Vercel AI SDK, and manual spans. Each framework guide shows the exact setup hook for that SDK.

## Choose a setup path

Installing with AI is the quickest path: a coding agent makes the changes and you review them before they are applied. Use the manual flow if you want to wire it up yourself.

<Tabs>
  <Tab title="Install with AI">
    Use the [Inference CLI](/cli/overview) to launch a coding agent like [Claude Code](https://code.claude.com/docs/en/overview), OpenCode, or Codex to install the tracing SDK, configure export, and wire up your LLM clients.

    <Steps>
      <Step title="Install the CLI and authenticate">
        Install the Inference CLI globally and log in. Your browser will open to authenticate.

        <Metadata text="integrations/traces/quickstart-ai-auth" />

        ```bash theme={"system"}
        npm install -g @inference/cli && inf auth login
        ```
      </Step>

      <Step title="Run tracing instrumentation in your project">
        From your project root, run instrumentation in tracing mode.

        <Metadata text="integrations/traces/quickstart-ai-instrument" />

        ```bash theme={"system"}
        cd /path/to/your/project && inf instrument --mode tracing
        ```

        The command guides you through the following workflow:

        * Select a coding agent: Claude Code, OpenCode, or Codex.
        * Scan your codebase for LLM clients and agent frameworks.
        * Install `@inference/tracing` or `inference-catalyst-tracing` plus the right per-integration extras.
        * Wire `setup()` into your app entrypoint so spans start before clients are constructed.
        * Add stable service and agent identity so traces group cleanly in the dashboard.
        * Review the generated changes before applying them.

        <Tip>
          Pass `--mode both` instead of `--mode tracing` to also route requests through the Catalyst Gateway in the same pass.
        </Tip>
      </Step>

      <Step title="Run your app">
        Run your application as you normally would. Traces stream to Catalyst as your code executes.
      </Step>

      <Step title="View your trace">
        Open the [dashboard](https://inference.net/dashboard) and filter by your service name to see the captured trace tree.
      </Step>
    </Steps>

    <Note>
      Want the full canonical guide for this workflow? See [Install with AI](/integrations/install-with-ai).
    </Note>
  </Tab>

  <Tab title="Install manually">
    Use this path if you want to wire it up yourself. The example below uses OpenAI. For other providers and frameworks, see the [per-integration guides](/integrations/traces/overview#supported-trace-integrations).

    <Steps>
      <Step title="Install the SDK">
        Provider and framework SDKs are optional peers. Install the ones you use alongside the tracing package. For Python, add per-integration extras to the install string.

        <CodeGroup>
          <Metadata text="integrations/traces/quickstart-install-typescript" />

          ```bash TypeScript theme={"system"}
          # Pick one package manager and run only that line
          bun add @inference/tracing openai
          npm install @inference/tracing openai
          pnpm add @inference/tracing openai
          ```

          <Metadata text="integrations/traces/quickstart-install-python" />

          ```bash Python theme={"system"}
          pip install 'inference-catalyst-tracing[openai]'
          # Multiple integrations at once
          pip install 'inference-catalyst-tracing[openai,anthropic,langchain]'
          # Everything
          pip install 'inference-catalyst-tracing[all]'
          ```
        </CodeGroup>

        Available Python extras: `openai`, `anthropic`, `langchain`, `langgraph`, `langsmith`, `openai-agents`, `claude-agent-sdk`, `pydantic-ai`, `elevenlabs`, `livekit-agents`, `all`.
      </Step>

      <Step title="Configure export">
        Set the Catalyst traces endpoint and token before your app starts.

        <Metadata text="integrations/traces/quickstart-env" />

        ```bash theme={"system"}
        export CATALYST_OTLP_ENDPOINT="https://telemetry.inference.net"
        # Get your API key from https://inference.net/dashboard/api-keys/
        export CATALYST_OTLP_TOKEN="<your-token>"
        export CATALYST_SERVICE_NAME="checkout-agent"
        ```

        <Tip>
          Use a stable `CATALYST_SERVICE_NAME` per deployed service. It makes traces easier to filter and compare across environments.
        </Tip>

        You can also pass these as options to `setup()` instead of env vars. See the [configuration reference](/integrations/traces/overview#configuration).
      </Step>

      <Step title="Initialize tracing early">
        Call `setup()` before constructing clients from instrumented SDKs. In TypeScript, pass the SDK modules you want patched. In Python, `setup()` auto-detects installed packages.

        <CodeGroup>
          <Metadata text="integrations/traces/quickstart-openai" />

          ```typescript TypeScript theme={"system"}
          import { setup } from "@inference/tracing";
          import OpenAI from "openai";

          const tracing = await setup({
            modules: { openai: OpenAI },
          });

          const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

          const response = await client.chat.completions.create({
            model: "gpt-4o-mini",
            messages: [{ role: "user", content: "Reply with just the word hello." }],
            max_tokens: 16,
          });

          console.log(response.choices[0]?.message.content);
          await tracing.shutdown();
          ```

          <Metadata text="integrations/traces/quickstart-openai" />

          ```python Python theme={"system"}
          import os

          from inference_catalyst_tracing import setup
          from openai import OpenAI

          tracing = setup()
          client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

          response = client.chat.completions.create(
              model="gpt-4o-mini",
              messages=[{"role": "user", "content": "Reply with just the word hello."}],
              max_tokens=16,
          )

          print(response.choices[0].message.content)
          tracing.shutdown()
          ```
        </CodeGroup>

        If the process is short-lived, always call `shutdown()` before exit so batched spans are flushed.
      </Step>

      <Step title="View your trace">
        Open the [dashboard](https://inference.net/dashboard) and navigate to the Agents or Traces tab. You'll see an LLM span with input messages, output messages, model name, invocation parameters, finish reason, and token counts.
      </Step>
    </Steps>

    <Note>
      Need a different provider or framework? See the [supported integrations](/integrations/traces/overview#supported-trace-integrations) list.
    </Note>
  </Tab>
</Tabs>

That's it. Spans are streaming to Catalyst and your first trace is ready to inspect.

What you have so far is one LLM span per call, captured automatically. That's enough for a one-shot script, but real apps usually run several calls per user request, and you'll want those grouped under a named agent and session in the dashboard. That's the next step. If you used Install with AI, the agent likely already wired this up for you; read on to see what it set up and why.

## Group calls under an agent

The example above is a one-shot LLM call. Once your app runs multiple LLM calls as part of a logical unit (an agent run, a conversation turn, a workflow), wrap that unit in `agentSpan` so the LLM spans nest under an `AGENT` row carrying `agent.id`, `agent.name`, and `session.id`. The Agents dashboard groups on those attributes.

<CodeGroup>
  <Metadata text="integrations/traces/quickstart-agent-typescript" />

  ```typescript TypeScript theme={"system"}
  import { agentSpan, setup } from "@inference/tracing";
  import OpenAI from "openai";

  const tracing = await setup({ modules: { openai: OpenAI } });
  const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  await agentSpan(
    {
      agentId: "hello-agent",
      agentName: "Hello Agent",
      sessionId: "session-001",
    },
    async (span) => {
      span.setInput("Reply with just the word hello.");
      const response = await client.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: "Reply with just the word hello." }],
        max_tokens: 16,
      });
      span.setOutput(response.choices[0]?.message.content ?? "");
    },
  );

  await tracing.shutdown();
  ```

  <Metadata text="integrations/traces/quickstart-agent-python" />

  ```python Python theme={"system"}
  import os

  from inference_catalyst_tracing import agent_span, setup
  from openai import OpenAI

  tracing = setup()
  client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

  with agent_span(
      tracing.tracer,
      agent_id="hello-agent",
      agent_name="Hello Agent",
      session_id="session-001",
  ) as span:
      span.set_input("Reply with just the word hello.")
      response = client.chat.completions.create(
          model="gpt-4o-mini",
          messages=[{"role": "user", "content": "Reply with just the word hello."}],
          max_tokens=16,
      )
      span.set_output(response.choices[0].message.content or "")

  tracing.shutdown()
  ```
</CodeGroup>

The OpenAI LLM span still appears, but now nested under your `agentSpan` row instead of as an orphan. Real agents run many LLM calls per session; the outer span is what makes them findable as one thing.
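
The hard-coded `session-001` is fine for a demo, but in a real app every turn of one conversation needs to carry the same session ID. A minimal sketch, assuming your app already has some per-conversation key (a thread ID, a ticket number). The `session_id_for` helper, the namespace string, and the example keys are hypothetical and not part of the SDK:

```python
import uuid

# Hypothetical namespace for this app. Any fixed value works as long as it
# never changes, because the derived IDs depend on it.
SESSION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "checkout-agent.example.com")


def session_id_for(conversation_key: str) -> str:
    """Derive a deterministic session ID from an app-level conversation key."""
    return str(uuid.uuid5(SESSION_NAMESPACE, conversation_key))


if __name__ == "__main__":
    first_turn = session_id_for("slack-thread:C123:1700000000.0001")
    second_turn = session_id_for("slack-thread:C123:1700000000.0001")
    other_thread = session_id_for("slack-thread:C999:1700000555.0002")
    print(first_turn == second_turn)   # same conversation, same session
    print(first_turn == other_thread)  # different conversation, different session
```

Pass the result as `session_id` (or `sessionId` in TypeScript). Because the ID is derived rather than stored, it stays the same across process restarts and across replicas of the same service.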

## Wrap your own code

For non-LLM steps inside an agent loop (a tool call, a retrieval, a custom router, an evaluator, a CLI subprocess), wrap them with `manualSpan`. Combined with the `agentSpan` above, you get a full trace tree: an outer AGENT row, inner SDK rows, and inner manual rows, all parented correctly.

<CodeGroup>
  <Metadata text="integrations/traces/quickstart-manual-typescript" />

  ```typescript TypeScript theme={"system"}
  import {
    SpanKindValues,
    agentSpan,
    manualSpan,
    setup,
  } from "@inference/tracing";

  // runRefundReview(), retrieve(), and query stand in for your own application code.
  const tracing = await setup();

  await agentSpan(
    {
      agentId: "refund-review-agent",
      agentName: "Refund Review Agent",
      spanName: "refund-review.run",
    },
    async (span) => {
      span.setInput("Review refund request #1842");
      const decision = await runRefundReview();
      span.setOutput(decision.summary);
    },
  );

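  // manualSpan authors TOOL / CHAIN / RETRIEVER / EMBEDDING spans. It is shown at
  // top level for brevity; call it inside the agentSpan callback to nest it under the AGENT row.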
  await manualSpan(
    {
      spanName: "rag.retrieve",
      spanKind: SpanKindValues.RETRIEVER,
      input: { query, k: 8 },
    },
    async (span) => {
      const docs = await retrieve(query);
      span.setOutput(docs);
    },
  );

  await tracing.shutdown();
  ```

  <Metadata text="integrations/traces/quickstart-manual-python" />

  ```python Python theme={"system"}
  from inference_catalyst_tracing import (
      SpanKindValues,
      agent_span,
      manual_span,
      setup,
  )

  # run_refund_review(), retrieve(), and query stand in for your own application code.
  tracing = setup()

  with agent_span(
      tracing.tracer,
      agent_id="refund-review-agent",
      span_name="refund-review.run",
  ) as span:
      span.set_input("Review refund request #1842")
      decision = run_refund_review()
      span.set_output(decision.summary)

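  # manual_span authors TOOL / CHAIN / RETRIEVER / EMBEDDING spans. It is shown at
  # top level for brevity; call it inside the agent_span block to nest it under the AGENT row.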
  with manual_span(
      tracing.tracer,
      name="rag.retrieve",
      span_kind=SpanKindValues.RETRIEVER,
      input={"query": query, "k": 8},
  ) as span:
      docs = retrieve(query)
      span.set_output(docs)

  tracing.shutdown()
  ```
</CodeGroup>

For the full manual-span surface (tools, retrievers, embeddings, agent identity), see [Manual spans](/integrations/traces/manual-spans) and [Agent identity](/integrations/traces/agent-identity).

## Flushing and process lifecycle

Spans are batched and exported in the background, so a process that exits or
freezes before the batch flushes drops them. How you flush depends on the
process shape:

* **Short-lived script:** call `await tracing.shutdown()` before exit. It
  force-flushes, then tears the provider down. The examples above do this.
* **Long-lived service** (HTTP server, Slack bot, queue worker): call `setup()`
  **once per process** before the first SDK client is constructed, memoize the
  result so any handler can `await` it, and call `shutdown()` only when the
  process is terminating (`SIGTERM` or `SIGINT`). Never call it per request:
  that forces a synchronous flush, adds latency, and tears down the provider
  that later requests still need.
* **Serverless or edge** (Lambda, Cloudflare Workers): memoize `setup()` the
  same way, but flush per invocation with `tracing.provider.forceFlush()`
  instead of `shutdown()`, since the provider must survive for the next warm
  invocation.
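
To make the failure mode concrete, here is a toy model of a batching exporter. It is not the SDK's implementation: `ToyBatchExporter` and its methods are made up for illustration, and it leaves out the timer-based flush that real batch processors typically also have. It only shows why a queued span is lost without a flush, and how `force_flush` differs from `shutdown`:

```python
from dataclasses import dataclass, field


@dataclass
class ToyBatchExporter:
    """Toy stand-in for a batching span exporter. Not the Catalyst SDK."""

    batch_size: int = 3
    closed: bool = False
    pending: list = field(default_factory=list)   # queued, not yet sent
    exported: list = field(default_factory=list)  # "sent" to the backend

    def on_span_end(self, span_name: str) -> None:
        if self.closed:
            return  # spans that end after shutdown are lost
        self.pending.append(span_name)
        if len(self.pending) >= self.batch_size:
            self.force_flush()

    def force_flush(self) -> None:
        # Send everything queued; the exporter stays usable afterwards.
        self.exported.extend(self.pending)
        self.pending.clear()

    def shutdown(self) -> None:
        # Flush, then stop accepting spans.
        self.force_flush()
        self.closed = True


exporter = ToyBatchExporter(batch_size=3)
for name in ["llm.call.1", "llm.call.2", "llm.call.3", "llm.call.4"]:
    exporter.on_span_end(name)

# Three spans went out as a full batch. The fourth is still queued and would
# be lost if the process exited or froze right now.
print(exporter.exported, exporter.pending)

exporter.force_flush()              # serverless: flush, keep the exporter alive
exporter.on_span_end("llm.call.5")
exporter.shutdown()                 # script exit or termination: flush, then tear down
exporter.on_span_end("llm.call.6")  # dropped, the exporter is closed
print(exporter.exported)
```

The last call is the reason `shutdown()` belongs at process teardown only: anything that ends a span afterwards has nowhere to send it.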

### Long-lived service

<CodeGroup>
  <Metadata text="integrations/traces/quickstart-server-typescript" />

  ```typescript TypeScript theme={"system"}
  // tracing.ts — memoized setup
  import OpenAI from "openai";
  import { setup, type CatalystTracing } from "@inference/tracing";

  let tracingPromise: Promise<CatalystTracing> | null = null;

  export function initTracing(): Promise<CatalystTracing> {
    if (!tracingPromise) {
      tracingPromise = setup({ modules: { openai: OpenAI } });
    }
    return tracingPromise;
  }

  export async function shutdownTracing(): Promise<void> {
    if (!tracingPromise) return;
    const tracing = await tracingPromise;
    await tracing.shutdown();
  }
  ```

  ```typescript TypeScript (server entrypoint) theme={"system"}
  // server.ts
  import { initTracing, shutdownTracing } from "./tracing.ts";

  await initTracing(); // patches OpenAI before the first client is constructed
  const server = startServer();

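  for (const signal of ["SIGTERM", "SIGINT"] as const) {
    process.on(signal, () => {
      // Stop accepting requests first so in-flight spans can finish, then flush.
      server.close(async () => {
        try {
          await shutdownTracing();
        } finally {
          process.exit(0);
        }
      });
    });
  }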
  ```

  <Metadata text="integrations/traces/quickstart-server-python" />

  ```python Python theme={"system"}
  # tracing.py — memoized setup
  from threading import Lock

  from inference_catalyst_tracing import CatalystTracing, setup

  _tracing: CatalystTracing | None = None
  _lock = Lock()

  def get_tracing() -> CatalystTracing:
      global _tracing
      if _tracing is None:
          with _lock:
              if _tracing is None:
                  _tracing = setup()
      return _tracing

  def shutdown_tracing() -> None:
      if _tracing is not None:
          _tracing.shutdown()
  ```

  ```python Python (server entrypoint) theme={"system"}
  # server.py
  import signal

  from tracing import get_tracing, shutdown_tracing

  get_tracing()  # registers instrumentation before app code runs

  def handle_signal(_signum, _frame):
      shutdown_tracing()
      raise SystemExit(0)

  signal.signal(signal.SIGTERM, handle_signal)
  signal.signal(signal.SIGINT, handle_signal)

  run_server()
  ```
</CodeGroup>

### Serverless and edge runtimes

On Lambda, Cloudflare Workers, or any runtime that freezes the process between
invocations, the background batch processor may never run, so spans are dropped.
Memoize `setup()` the same way as a long-lived service (the provider is reused
across warm invocations), but flush at the end of **each invocation** with
`tracing.provider.forceFlush()` rather than calling `shutdown()`. Reserve
`shutdown()` for real process teardown, since it tears down the provider the
next warm invocation needs.

<CodeGroup>
  <Metadata text="integrations/traces/quickstart-serverless-typescript" />

  ```typescript TypeScript theme={"system"}
  import { initTracing } from "./tracing.ts";

  export async function handler(event: { message: string }) {
    const tracing = await initTracing(); // memoized setup(), patches once

    const reply = await answerQuestion(event.message);

    // Flush before the runtime freezes the process. Do not shutdown(): the next
    // warm invocation reuses this provider.
    await tracing.provider.forceFlush();
    return reply;
  }
  ```

  <Metadata text="integrations/traces/quickstart-serverless-python" />

  ```python Python theme={"system"}
  from tracing import get_tracing

  def handler(event):
      tracing = get_tracing()  # memoized setup(), patches once

      reply = answer_question(event["message"])

      # Flush before the runtime freezes the process. Do not shutdown(): the next
      # warm invocation reuses this provider.
      tracing.provider.force_flush()
      return reply
  ```
</CodeGroup>
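
The handlers above skip the flush if the handler body raises, which is often the invocation you most want a trace for. One way to cover that case is a small decorator that flushes in a `finally` block. This is a sketch: `flush_after` and `FakeProvider` are hypothetical names, and `FakeProvider` only stands in for `tracing.provider` so the example runs without the SDK:

```python
import functools
from typing import Any, Callable


def flush_after(flush: Callable[[], Any]) -> Callable:
    """Decorator: call `flush` after the handler, whether it returns or raises."""

    def decorator(handler: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(handler)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return handler(*args, **kwargs)
            finally:
                flush()

        return wrapper

    return decorator


class FakeProvider:
    """Hypothetical stand-in for tracing.provider; it only counts flushes."""

    def __init__(self) -> None:
        self.flush_count = 0

    def force_flush(self) -> None:
        self.flush_count += 1


provider = FakeProvider()


@flush_after(provider.force_flush)
def handler(event: dict) -> str:
    if "message" not in event:
        raise ValueError("missing message")
    return event["message"].upper()


if __name__ == "__main__":
    print(handler({"message": "hi"}), provider.flush_count)
    try:
        handler({})
    except ValueError:
        pass
    print(provider.flush_count)  # the failed invocation was flushed too
```

In a real handler you would pass something like `lambda: get_tracing().provider.force_flush()` so the memoized provider is resolved at call time.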

### Selective instrumentation

By default, `setup()` instruments every supported SDK it finds (in TypeScript,
the modules you pass; in Python, the installed packages). If you want explicit
control, for example to instrument OpenAI but skip LangChain, set
`autoInstrument: false` and call the targeted helper yourself:

<Metadata text="integrations/traces/quickstart-selective-typescript" />

```typescript TypeScript theme={"system"}
import { setup } from "@inference/tracing";
import { instrumentOpenAI } from "@inference/tracing/openai";
import OpenAI from "openai";

const tracing = await setup({ autoInstrument: false });
instrumentOpenAI(OpenAI, tracing);
```

The per-integration entry points are listed in the
[overview's configuration section](/integrations/traces/overview#configuration).

For a full production-shaped server with custom tool spans and domain
attributes, see the
[Production Agent Example](/integrations/traces/production-agent-example).

## Verify

Open the Catalyst dashboard and navigate to the Agents or Traces tab. The trace should include an OpenAI LLM span with input messages, output messages, model name, invocation parameters, finish reason, and token counts. Any custom spans you added show up as parent or sibling nodes in the trace tree.

If you don't see anything, see [Troubleshooting](/integrations/traces/troubleshooting).
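
One thing worth ruling out first is a missing or placeholder export variable. A minimal preflight sketch using only the standard library; the variable names come from the Configure export step, while the `export_config_problems` helper is hypothetical and whether the SDK treats each variable as strictly required is an assumption. It does not apply if you pass the values as `setup()` options instead:

```python
import os
from typing import Mapping, Optional

# Variable names from the "Configure export" step.
EXPORT_VARS = (
    "CATALYST_OTLP_ENDPOINT",
    "CATALYST_OTLP_TOKEN",
    "CATALYST_SERVICE_NAME",
)
TOKEN_PLACEHOLDER = "<your-token>"


def export_config_problems(env: Optional[Mapping[str, str]] = None) -> list:
    """Return human-readable problems with the export variables (empty if none)."""
    env = os.environ if env is None else env
    problems = [
        f"{name} is not set"
        for name in EXPORT_VARS
        if not env.get(name, "").strip()
    ]
    if env.get("CATALYST_OTLP_TOKEN", "").strip() == TOKEN_PLACEHOLDER:
        problems.append("CATALYST_OTLP_TOKEN still holds the placeholder value")
    return problems


if __name__ == "__main__":
    for problem in export_config_problems():
        print(f"export config: {problem}")
```

Run it in the same shell or container as your app, since variables exported in one terminal are not visible to a process started from another.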

## Next Steps

<CardGroup cols={2}>
  <Card title="Analyze your traces" icon="microscope" href="/get-started/analyze-traces">
    Walk trace trees in the dashboard and run Halo to find what to improve.
  </Card>

  <Card title="OpenAI tracing" icon="sparkles" href="/integrations/traces/openai">
    Add tool calls, structured outputs, and Responses API examples.
  </Card>

  <Card title="Manual spans" icon="pen-nib" href="/integrations/traces/manual-spans">
    Wrap custom agents, CLI calls, and unsupported SDKs.
  </Card>

  <Card title="Agent identity" icon="fingerprint" href="/integrations/traces/agent-identity">
    Add stable agent IDs so the Agents dashboard groups runs correctly.
  </Card>
</CardGroup>
