Documentation Index

Fetch the complete documentation index at: https://docs.inference.net/llms.txt

Use this file to discover all available pages before exploring further.

Catalyst Gateway captures the LLM requests flowing through your products, storing the raw request, response, and metadata for each invocation. This recorded data powers in-depth metrics and visibility into your LLM token usage, cost, latency, and error rates across all your providers in a single, unified view, and also feeds downstream model evaluation and training. Catalyst Gateway supports all major LLM providers and frameworks. For agents, tools, framework runs, and custom orchestration, Catalyst Tracing captures full trace trees and individual spans in addition to gateway inferences. See the integrations guide for in-depth instructions.
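In practice, a gateway like this is typically adopted by pointing an OpenAI-compatible client at the gateway's base URL instead of the provider's. The sketch below shows that pattern in plain Python; the endpoint URL and the `X-Task` header name are illustrative assumptions, not the documented Catalyst configuration (see the integrations guide for the real values).

```python
# Sketch: route an OpenAI-style chat request through a gateway by swapping
# the base URL. GATEWAY_BASE_URL and the X-Task header are hypothetical.
import json

GATEWAY_BASE_URL = "https://gateway.example.com/v1"  # assumed endpoint

def build_request(model, messages, task=None):
    """Assemble the URL, headers, and body an OpenAI-compatible client
    would send; the gateway records the call and forwards it upstream."""
    headers = {
        "Authorization": "Bearer $PROVIDER_API_KEY",
        "Content-Type": "application/json",
    }
    if task:
        headers["X-Task"] = task  # hypothetical header tagging the task
    body = {"model": model, "messages": messages}
    return GATEWAY_BASE_URL + "/chat/completions", headers, json.dumps(body)

url, headers, body = build_request(
    "gpt-4o-mini",
    [{"role": "user", "content": "Summarize these docs."}],
    task="summarize-docs",
)
```

Because the gateway sits on the request path rather than inside your code, this is usually the only change needed to start capturing traffic.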

Key concepts

Gateway: The edge layer between your app and the LLM provider. Records traffic with < 10 ms overhead.
Inference: A single LLM call stored by the Gateway, including the request, response, cost, latency, and token counts.
Trace: A multi-step execution captured through OpenTelemetry. Useful for agents, tools, framework runs, and custom orchestration.
Span: One step inside a trace, such as a model call, tool call, retriever, graph node, or custom application operation.
Task: A user-defined objective (like "summarize docs" or "classify tickets") that groups related inferences so you can track each AI feature independently.
Metrics: Aggregated cost, latency, error rates, and token usage across your inferences, filterable by model, task, or provider.
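To make the relationship between these concepts concrete, the sketch below models an inference record and a metrics aggregation filterable by task or model, as described above. The field names and the `metrics` helper are illustrative assumptions, not the actual Catalyst schema or API.

```python
# Sketch of the concepts above: each Inference carries cost/latency/token
# data, and Metrics are aggregates filterable by task or model.
# Field names are illustrative, not the real Catalyst schema.
from dataclasses import dataclass

@dataclass
class Inference:
    model: str
    task: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float
    error: bool = False

def metrics(inferences, task=None, model=None):
    """Aggregate count, cost, tokens, and error rate over a filtered set."""
    rows = [i for i in inferences
            if (task is None or i.task == task)
            and (model is None or i.model == model)]
    return {
        "count": len(rows),
        "cost_usd": round(sum(i.cost_usd for i in rows), 6),
        "tokens": sum(i.input_tokens + i.output_tokens for i in rows),
        "error_rate": sum(i.error for i in rows) / len(rows) if rows else 0.0,
    }

calls = [
    Inference("gpt-4o-mini", "summarize-docs", 900, 120, 0.0004, 310),
    Inference("gpt-4o-mini", "classify-tickets", 200, 5, 0.0001, 95, error=True),
    Inference("claude-3-haiku", "summarize-docs", 850, 140, 0.0005, 280),
]
summary = metrics(calls, task="summarize-docs")
```

Filtering by task rather than by model is what lets you track each AI feature independently, even when several features share the same underlying model.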

Next steps

Set up tasks: Group your LLM calls by objective.

Integrate with your LLM provider: Connect your app and start capturing traffic.

Metrics Explorer: See your LLM usage dashboards.

Inference Viewer: Browse individual LLM calls.

Trace CLI: Inspect trace trees, span timelines, facets, and exports from the terminal.