Snippets

These examples are copy-paste ready. Each one shows what gets captured and links to the integration page that covers the surface in depth. For setup and configuration, start with the Traces Quickstart. For an end-to-end view of a real production agent, see the Production Agent Example.

OpenAI Chat Completion

Initialize tracing before constructing the OpenAI client. The SDK patches Chat Completions and emits an LLM span with input messages, output messages, model name, invocation parameters, finish reason, and token counts.

import { setup } from "@inference/tracing";
import OpenAI from "openai";

const tracing = await setup({
  serviceName: "checkout-agent",
  modules: { openai: OpenAI },
});

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You answer in one short sentence." },
    { role: "user", content: "Summarize order ABC-123." },
  ],
  max_tokens: 80,
});

console.log(response.choices[0]?.message.content);
await tracing.shutdown();

import os

from inference_catalyst_tracing import setup
from openai import OpenAI

tracing = setup(service_name="checkout-agent")
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You answer in one short sentence."},
        {"role": "user", "content": "Summarize order ABC-123."},
    ],
    max_tokens=80,
)

print(response.choices[0].message.content)
tracing.shutdown()

See OpenAI traces for tool calls, structured outputs, and the Responses API.

OpenAI Tool Round Trip

Tool calls are captured on the model span. The first turn records the assistant tool call and arguments; the second turn records the tool result in the input message list.

import { setup } from "@inference/tracing";
import OpenAI from "openai";

const tracing = await setup({ modules: { openai: OpenAI } });
const client = new OpenAI();

const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather",
      description: "Look up the current weather in a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];
const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
  { role: "user", content: "What's the weather in San Francisco?" },
];

const first = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages,
  tools,
});
const toolCalls = first.choices[0]?.message.tool_calls ?? [];
messages.push({ role: "assistant", content: null, tool_calls: toolCalls });

for (const toolCall of toolCalls) {
  const args = JSON.parse(toolCall.function.arguments) as { city: string };
  messages.push({
    role: "tool",
    tool_call_id: toolCall.id,
    content: JSON.stringify({
      city: args.city,
      tempF: 62,
      condition: "sunny",
    }),
  });
}

const final = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages,
  tools,
});
console.log(final.choices[0]?.message.content);
await tracing.shutdown();

import json

from inference_catalyst_tracing import setup
from openai import OpenAI
from openai.types.chat import ChatCompletionMessageParam

tracing = setup()
client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather in a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
]
messages: list[ChatCompletionMessageParam] = [
    {"role": "user", "content": "What's the weather in San Francisco?"},
]

first = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
)
tool_calls = first.choices[0].message.tool_calls or []
messages.append(
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [tc.model_dump() for tc in tool_calls],
    },
)

for tool_call in tool_calls:
    args = json.loads(tool_call.function.arguments)
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(
                {"city": args["city"], "temp_f": 62, "condition": "sunny"},
            ),
        },
    )

final = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
tracing.shutdown()

This captures the model-side view of tool calling: what the LLM asked for and the result you passed back. For a caller-side view that wraps the actual function execution in its own TOOL span, see Manual spans.
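The examples above hard-code the tool result. In a real round trip you would usually route each tool call to a local function by name. The following is a minimal sketch, assuming a hypothetical `TOOL_REGISTRY` dictionary and a stand-in `get_weather` implementation (neither is part of the SDK). It works on plain name and argument strings, so it runs without an API key.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical stand-in for a real weather lookup."""
    return {"city": city, "temp_f": 62, "condition": "sunny"}


# Hypothetical registry mapping tool names to local callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool(name: str, arguments_json: str) -> str:
    """Execute one tool call and return the JSON string for the tool message."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        return json.dumps(handler(**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # Report bad arguments back to the model instead of raising.
        return json.dumps({"error": str(exc)})
```

In the Python loop above, the tool message content would then be `run_tool(tool_call.function.name, tool_call.function.arguments)`.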

OpenAI Structured Output

Structured-output requests keep the schema in llm.invocation_parameters and the model response in output.value.

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: "Extract the city, temperature, and unit from: 72F in Berlin.",
    },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "weather_report",
      strict: true,
      schema: {
        type: "object",
        additionalProperties: false,
        properties: {
          city: { type: "string" },
          temperature: { type: "number" },
          unit: { type: "string", enum: ["F", "C"] },
        },
        required: ["city", "temperature", "unit"],
      },
    },
  },
});

console.log(response.choices[0]?.message.content);

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Extract the city, temperature, and unit from: 72F in Berlin.",
        },
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "weather_report",
            "strict": True,
            "schema": {
                "type": "object",
                "additionalProperties": False,
                "properties": {
                    "city": {"type": "string"},
                    "temperature": {"type": "number"},
                    "unit": {"type": "string", "enum": ["F", "C"]},
                },
                "required": ["city", "temperature", "unit"],
            },
        },
    },
)

print(response.choices[0].message.content)
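With a strict JSON schema, the message content is a JSON string that should conform to the schema. The following is a minimal sketch of turning it into a typed value. The `WeatherReport` dataclass and `parse_weather_report` helper are hypothetical names, and the sample string stands in for `response.choices[0].message.content`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WeatherReport:
    city: str
    temperature: float
    unit: str


def parse_weather_report(content: str) -> WeatherReport:
    """Parse the JSON string returned by the model into a WeatherReport."""
    data = json.loads(content)
    if data["unit"] not in ("F", "C"):
        raise ValueError(f"unexpected unit: {data['unit']!r}")
    return WeatherReport(
        city=str(data["city"]),
        temperature=float(data["temperature"]),
        unit=data["unit"],
    )


# Sample payload standing in for the model response content.
sample = '{"city": "Berlin", "temperature": 72, "unit": "F"}'
report = parse_weather_report(sample)
print(report)
```

The content can be `None` (for example, when the model refuses), so check for that before parsing.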

OpenAI Responses API

The Responses API is traced separately from Chat Completions. Function-call items are normalized into the same OpenInference tool-call attributes used by Chat Completions, so the dashboard renders them the same way.

const response = await client.responses.create({
  model: "gpt-4o-mini",
  input: "In one sentence, what is OpenTelemetry?",
});

console.log(response.output_text);

response = client.responses.create(
    model="gpt-4o-mini",
    input="In one sentence, what is OpenTelemetry?",
)

print(response.output_text)
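To illustrate what normalizing function-call items could look like, the sketch below flattens Responses API `function_call` output items into OpenInference-style tool-call attribute keys. The `flatten_function_calls` helper is hypothetical, and the exact keys Catalyst emits are an assumption based on the OpenInference conventions. The item fields (`type`, `call_id`, `name`, `arguments`) follow the Responses API.

```python
from typing import Any


def flatten_function_calls(output_items: list[dict[str, Any]]) -> dict[str, str]:
    """Map function_call items to flattened tool-call attributes (assumed keys)."""
    attrs: dict[str, str] = {}
    prefix = "llm.output_messages.0.message"
    call_index = 0
    for item in output_items:
        # Skip message, reasoning, and other non-function items.
        if item.get("type") != "function_call":
            continue
        base = f"{prefix}.tool_calls.{call_index}.tool_call"
        attrs[f"{base}.id"] = item["call_id"]
        attrs[f"{base}.function.name"] = item["name"]
        attrs[f"{base}.function.arguments"] = item["arguments"]
        call_index += 1
    return attrs


# Hypothetical output items shaped like a Responses API result.
items = [
    {"type": "message", "role": "assistant", "content": []},
    {
        "type": "function_call",
        "call_id": "call_1",
        "name": "get_weather",
        "arguments": '{"city": "Paris"}',
    },
]
print(flatten_function_calls(items))
```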

Anthropic Messages

Anthropic Messages calls emit LLM spans with user and assistant content blocks, model name, invocation parameters, finish reason, and usage.

import Anthropic from "@anthropic-ai/sdk";
import { setup } from "@inference/tracing";

const tracing = await setup({ modules: { anthropic: Anthropic } });
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const message = await client.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 128,
  messages: [{ role: "user", content: "Respond with just the word hello." }],
});

console.log(message.content);
await tracing.shutdown();

import os

from anthropic import Anthropic
from inference_catalyst_tracing import setup

tracing = setup()
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

message = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=128,
    messages=[{"role": "user", "content": "Respond with just the word hello."}],
)

print(message.content)
tracing.shutdown()

Anthropic Prompt Caching

When Anthropic returns prompt-cache usage fields, Catalyst maps them to OpenInference token detail attributes so they show up alongside the regular token counts on the LLM span.

const longSystem =
  "You are a careful, terse assistant. Answer in one sentence.\n\n" +
  "Reference document:\n" +
  "Lorem ipsum dolor sit amet, consectetur adipiscing elit. ".repeat(300);

const params: Anthropic.MessageCreateParamsNonStreaming = {
  model: "claude-haiku-4-5",
  max_tokens: 64,
  system: [
    {
      type: "text",
      text: longSystem,
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Is the document about lorem ipsum?" }],
};

const first = await client.messages.create(params);
const second = await client.messages.create(params);

console.log(first.usage.cache_creation_input_tokens ?? 0);
console.log(second.usage.cache_read_input_tokens ?? 0);

long_system = (
    "You are a careful, terse assistant. Answer in one sentence.\n\n"
    "Reference document:\n"
    + "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 300
)

params = dict(
    model="claude-haiku-4-5",
    max_tokens=64,
    system=[
        {
            "type": "text",
            "text": long_system,
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Is the document about lorem ipsum?"}],
)

first = client.messages.create(**params)
second = client.messages.create(**params)

print(first.usage.cache_creation_input_tokens)
print(second.usage.cache_read_input_tokens)

The cache attributes show up on the LLM span as llm.token_count.prompt_details.cache_write and llm.token_count.prompt_details.cache_read. See the Attributes reference for the full set of token-detail keys.
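As a rough illustration of that mapping, the sketch below converts Anthropic usage fields into the two attribute keys named above. The `cache_token_attributes` helper is hypothetical, and skipping missing fields (rather than zero-filling them) is an assumption, not a statement about how Catalyst behaves.

```python
from typing import Optional

# Anthropic usage field -> span attribute key (keys taken from the text above).
CACHE_ATTRIBUTE_KEYS = {
    "cache_creation_input_tokens": "llm.token_count.prompt_details.cache_write",
    "cache_read_input_tokens": "llm.token_count.prompt_details.cache_read",
}


def cache_token_attributes(usage: dict[str, Optional[int]]) -> dict[str, int]:
    """Return cache token attributes for the usage fields that are present."""
    attrs: dict[str, int] = {}
    for field, key in CACHE_ATTRIBUTE_KEYS.items():
        value = usage.get(field)
        if value is not None:
            attrs[key] = value
    return attrs


# Hypothetical usage payloads for a cache write followed by a cache read.
first_usage = {"input_tokens": 12, "cache_creation_input_tokens": 2100, "cache_read_input_tokens": 0}
second_usage = {"input_tokens": 12, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 2100}
print(cache_token_attributes(first_usage))
print(cache_token_attributes(second_usage))
```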

Manual Parent Around Automatic Children

Use an outer agent span around orchestration code when you want nested LLM, tool, or framework spans grouped under one product-level operation. Spans created inside the callback auto-parent under the agent span via OTel context propagation.

import { agentSpan, setup } from "@inference/tracing";
import OpenAI from "openai";

const tracing = await setup({ modules: { openai: OpenAI } });
const client = new OpenAI();

await agentSpan(
  {
    agentId: "refund-review-agent",
    agentName: "Refund Review Agent",
    spanName: "refund-review.run",
    sessionId: "conversation-ticket-123",
    system: "openai",
  },
  async (span) => {
    const ticket = { id: "ticket_123", orderId: "ABC-123" };
    span.setInput(ticket);
    const response = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        { role: "user", content: `Review refund for ${ticket.orderId}` },
      ],
    });
    span.setOutput({ decision: response.choices[0]?.message.content });
  },
);

await tracing.shutdown();

from inference_catalyst_tracing import agent_span, setup
from openai import OpenAI

tracing = setup()
client = OpenAI()

with agent_span(
    tracing.tracer,
    agent_id="refund-review-agent",
    agent_name="Refund Review Agent",
    span_name="refund-review.run",
    session_id="conversation-ticket-123",
    system="openai",
) as span:
    ticket = {"id": "ticket_123", "order_id": "ABC-123"}
    span.set_input(ticket)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": f"Review refund for {ticket['order_id']}"},
        ],
    )
    decision = response.choices[0].message.content
    span.set_output({"decision": decision})

tracing.shutdown()

OpenAI Agents With Outer Span

Pair OpenAI Agents with OpenAI instrumentation. Use agentSpan() / agent_span() for an explicit outer span; nested OpenAI calls are captured automatically and parent under it.

import { agentSpan, setup } from "@inference/tracing";
import * as agents from "@openai/agents";
import { Agent, run, tool } from "@openai/agents";
import OpenAI from "openai";
import { z } from "zod";

const tracing = await setup({
  modules: { openai: OpenAI, openaiAgents: agents },
});

const lookupOrder = tool({
  name: "lookup_order",
  description: "Look up an order by ID.",
  parameters: z.object({ orderId: z.string() }),
  execute: async ({ orderId }) =>
    JSON.stringify({ orderId, status: "shipped" }),
});

const supportAgent = new Agent({
  name: "SupportAgent",
  instructions: "Use tools to help customers with orders.",
  tools: [lookupOrder],
  model: "gpt-4o-mini",
});

const userMessage = "Where is order ABC-123?";
await agentSpan(
  {
    agentId: "support-agent-prod",
    agentName: "Support Agent",
    spanName: "support-agent.run",
    sessionId: "conversation-order-abc-123",
    system: "openai",
  },
  async (span) => {
    span.setInput(userMessage);
    const result = await run(supportAgent, userMessage, { maxTurns: 4 });
    span.setOutput(String(result.finalOutput ?? ""));
  },
);

await tracing.shutdown();

import asyncio
import json

from agents import Agent, Runner, function_tool
from inference_catalyst_tracing import agent_span, setup

tracing = setup()

@function_tool
def lookup_order(order_id: str) -> str:
    """Look up an order by ID."""
    return json.dumps({"order_id": order_id, "status": "shipped"})

async def run_support_agent() -> str:
    agent = Agent(
        name="SupportAgent",
        instructions="Use tools to help customers with orders.",
        tools=[lookup_order],
        model="gpt-4o-mini",
    )
    user_message = "Where is order ABC-123?"
    with agent_span(
        tracing.tracer,
        agent_id="support-agent-prod",
        agent_name="Support Agent",
        span_name="support-agent.run",
        session_id="conversation-order-abc-123",
        system="openai",
    ) as span:
        span.set_input(user_message)
        result = await Runner.run(agent, input=user_message, max_turns=4)
        output = str(result.final_output or "")
        span.set_output(output)
        return output

print(asyncio.run(run_support_agent()))
tracing.shutdown()

OpenAI Agents Handoff

Handoffs create a useful trace tree when wrapped in an outer agent span: the triage agent, specialist agent, model calls, and tools are all grouped under one customer request.

TypeScript

import { agentSpan, setup } from "@inference/tracing";
import * as agents from "@openai/agents";
import { Agent, handoff, run, tool } from "@openai/agents";
import OpenAI from "openai";
import { z } from "zod";

const tracing = await setup({
  modules: { openai: OpenAI, openaiAgents: agents },
});

const issueRefund = tool({
  name: "issue_refund",
  description: "Issue a refund for an order.",
  parameters: z.object({ orderId: z.string(), amount: z.number() }),
  execute: async ({ orderId, amount }) =>
    JSON.stringify({ ok: true, orderId, refundId: "RFD-2201", amount }),
});

const refundsAgent = new Agent({
  name: "RefundsAgent",
  instructions: "Handle refund requests and use issue_refund.",
  tools: [issueRefund],
  model: "gpt-4o-mini",
});
const billingAgent = new Agent({
  name: "BillingAgent",
  instructions: "Answer billing questions. Do not issue refunds.",
  model: "gpt-4o-mini",
});
const triageAgent = new Agent({
  name: "TriageAgent",
  instructions: "Route refund requests to RefundsAgent.",
  handoffs: [handoff(refundsAgent), handoff(billingAgent)],
  model: "gpt-4o-mini",
});

await agentSpan(
  {
    agentId: "triage-agent-prod",
    agentName: "Triage Agent",
    spanName: "triage-agent.run",
    sessionId: "conversation-refund-abc-123",
    system: "openai",
  },
  async (span) => {
    const input = "I need a refund for order ABC-123, total $42.50.";
    span.setInput(input);
    const result = await run(triageAgent, input, { maxTurns: 8 });
    span.setOutput(String(result.finalOutput ?? ""));
  },
);

await tracing.shutdown();

LangChain Agent With Tools

LangChain instrumentation hooks into the callback manager, so you do not need to wrap each tool or model call manually: chain, LLM, and tool spans are emitted from the framework callbacks.

If the same agent is already wrapped with LangSmith's @traceable, keep that decorator in place and install the langsmith extra. Catalyst uses the LangSmith OTel span as the active parent, so the LangChain and provider spans stay grouped under the decorated run.

import { setup } from "@inference/tracing";
import { ChatAnthropic } from "@langchain/anthropic";
import * as CallbackManagerModule from "@langchain/core/callbacks/manager";
import { createAgent, tool } from "langchain";
import { z } from "zod";

const tracing = await setup({
  modules: { langchainCallbacksManager: CallbackManagerModule },
});

const lookupOrder = tool(
  ({ orderId }) => JSON.stringify({ orderId, status: "shipped", total: 42.5 }),
  {
    name: "lookup_order",
    description: "Look up an order by ID.",
    schema: z.object({ orderId: z.string() }),
  },
);

const cancelOrder = tool(
  ({ orderId, reason }) => JSON.stringify({ ok: true, orderId, reason }),
  {
    name: "cancel_order",
    description: "Cancel a not-yet-delivered order.",
    schema: z.object({ orderId: z.string(), reason: z.string() }),
  },
);

const agent = createAgent({
  model: new ChatAnthropic({ model: "claude-haiku-4-5", maxTokens: 512 }),
  tools: [lookupOrder, cancelOrder],
  systemPrompt: "Use tools to resolve order issues.",
});

const result = await agent.invoke({
  messages: [{ role: "user", content: "Cancel order ABC-123." }],
});
console.log(result.messages.at(-1)?.content);
await tracing.shutdown();

import json

from inference_catalyst_tracing import setup
from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

tracing = setup()

ORDERS = {"ABC-123": {"status": "shipped", "total": 42.5}}

@tool
def lookup_order(order_id: str) -> str:
    """Look up an order by ID."""
    return json.dumps({"order_id": order_id, **ORDERS[order_id]})

@tool
def cancel_order(order_id: str, reason: str) -> str:
    """Cancel a not-yet-delivered order."""
    return json.dumps({"ok": True, "order_id": order_id, "reason": reason})

llm = ChatAnthropic(model_name="claude-haiku-4-5", max_tokens_to_sample=512)
agent = create_agent(
    llm,
    tools=[lookup_order, cancel_order],
    system_prompt="Use tools to resolve order issues.",
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Cancel order ABC-123."}]},
)
print(result["messages"][-1].content)
tracing.shutdown()

Pydantic AI Structured Agent (Python)

Pydantic AI ships native OpenTelemetry instrumentation. Catalyst registers its provider and enables Pydantic AI instrumentation during setup().

Python

from inference_catalyst_tracing import setup
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext

class CityWeather(BaseModel):
    city: str
    temp_c: float = Field(description="Temperature in Celsius.")
    condition: str

class WeatherReport(BaseModel):
    cities: list[CityWeather]
    summary: str

tracing = setup()

agent = Agent(
    "openai:gpt-4o-mini",
    output_type=WeatherReport,
    system_prompt="Use get_weather for every requested city.",
)

@agent.tool
def get_weather(_ctx: RunContext[None], city: str) -> str:
    """Look up current weather for a city."""
    return f'{{"city": "{city}", "temp_c": 12, "condition": "overcast"}}'

result = agent.run_sync("What's the weather in Paris and Tokyo?")
print(result.output.summary)
tracing.shutdown()

Claude Agent SDK

In Python, setup() patches the SDK, so call it before importing query. TypeScript uses an explicit wrapper because ESM namespace bindings cannot be safely patched.

import { query } from "@anthropic-ai/claude-agent-sdk";
import { setup, wrapClaudeAgentSdkQuery } from "@inference/tracing";

const tracing = await setup();
const tracedQuery = wrapClaudeAgentSdkQuery(query);

const stream = tracedQuery({
  prompt: "Count files matching *.md under the current directory.",
  options: {
    maxTurns: 4,
    allowedTools: ["Bash"],
    permissionMode: "bypassPermissions",
  },
});

for await (const message of stream) {
  console.log(message);
}

await tracing.shutdown();

import asyncio

from dotenv import load_dotenv
from inference_catalyst_tracing import setup

load_dotenv()
tracing = setup()

# Import after setup() so the SDK is already patched when query is bound.
from claude_agent_sdk import ClaudeAgentOptions, query  # noqa: E402

options = ClaudeAgentOptions(
    max_turns=4,
    allowed_tools=["Bash"],
    permission_mode="bypassPermissions",
)

async def main() -> None:
    # `async for` is only valid inside a coroutine.
    async for message in query(
        prompt="Count files matching *.md under the current directory.",
        options=options,
    ):
        print(message)

asyncio.run(main())
tracing.shutdown()

CLI Or Subprocess Work

When a tool has no instrumentable SDK, wrap the subprocess call in an agent span and set the input, output, and token usage when available.

import { agentSpan, setup } from "@inference/tracing";
import { Codex } from "@openai/codex-sdk";

const tracing = await setup();
const codex = new Codex({ apiKey: process.env.OPENAI_API_KEY });

await agentSpan(
  {
    agentId: "codex-prod",
    agentName: "Codex",
    system: "openai",
    spanName: "codex.invocation",
    sessionId: "conversation-cli-hello",
  },
  async (span) => {
    const prompt = "Reply with just the word hello.";
    span.setInput(prompt);
    const thread = codex.startThread({
      skipGitRepoCheck: true,
      sandboxMode: "read-only",
    });
    const turn = await thread.run(prompt);
    span.setOutput(turn.finalResponse ?? "");
    if (turn.usage != null) {
      span.recordTokens({
        prompt: turn.usage.input_tokens ?? 0,
        completion: turn.usage.output_tokens ?? 0,
      });
    }
  },
);

await tracing.shutdown();

import subprocess

from inference_catalyst_tracing import agent_span, setup

tracing = setup()
prompt = "Reply with just the word hello."

with agent_span(
    tracing.tracer,
    agent_id="codex-prod",
    agent_name="Codex",
    system="openai",
    span_name="codex.invocation",
    session_id="conversation-cli-hello",
) as span:
    span.set_input(prompt)
    completed = subprocess.run(
        ["codex", "exec", "--skip-git-repo-check", prompt],
        capture_output=True,
        text=True,
        timeout=120,
        check=True,
    )
    span.set_output(completed.stdout.strip())

tracing.shutdown()
