Catalyst Gateway captures LLM requests flowing through your products. It stores the raw request, response, and metadata for each LLM invocation. This recorded data powers in-depth metrics and visibility into token usage, cost, latency, and error rates across all your providers in a single, unified view, and also feeds downstream model evaluation and training. Catalyst Gateway supports all major LLM providers and frameworks. For agents, tools, framework runs, and custom orchestration, Catalyst Tracing captures full trace trees and individual spans in addition to gateway inferences. See the integrations guide for in-depth instructions.
Documentation Index
Fetch the complete documentation index at: https://docs.inference.net/llms.txt
Use this file to discover all available pages before exploring further.
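As a minimal sketch of discovering pages programmatically, the snippet below fetches the index and extracts page links, assuming the file uses the conventional llms.txt format of markdown link lines such as `- [Title](URL)` (the parsing is illustrative, not an official client):

```python
import re
import urllib.request

# Matches markdown links like "- [Title](https://example.com/page)"
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")

def parse_llms_txt(text: str) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from llms.txt-style markdown link lines."""
    return [(m.group(1), m.group(2)) for m in LINK.finditer(text)]

def fetch_index(url: str = "https://docs.inference.net/llms.txt") -> list[tuple[str, str]]:
    # Network call; run in an environment with outbound HTTP access.
    with urllib.request.urlopen(url) as resp:
        return parse_llms_txt(resp.read().decode("utf-8"))

# Offline example of the expected link format (sample content is made up):
sample = (
    "- [Key concepts](https://docs.inference.net/concepts)\n"
    "- [Integrations](https://docs.inference.net/integrations)"
)
print(parse_llms_txt(sample))
```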
Key concepts
| Concept | Description |
|---|---|
| Gateway | Edge layer between your app and LLM provider. Records traffic with < 10ms overhead. |
| Inference | A single LLM call stored by Gateway. Includes request, response, cost, latency, & token counts. |
| Trace | A multi-step execution captured through OpenTelemetry. Useful for agents, tools, framework runs, and custom orchestration. |
| Span | One step inside a trace, such as a model call, tool call, retriever, graph node, or custom application operation. |
| Task | A user-defined objective (like “summarize docs” or “classify tickets”) that groups related inferences so you can track each AI feature independently. |
| Metrics | Aggregated cost, latency, error rates, and token usage across your inferences. Filterable by model, task, or provider. |
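To make the relationship between inferences, tasks, and metrics concrete, here is a minimal sketch that groups inference records by task and aggregates cost, tokens, latency, and error rate. The field names and model names are illustrative assumptions, not the actual Gateway schema:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Inference:
    # Illustrative fields only; not the real Gateway record schema.
    task: str          # user-defined objective, e.g. "summarize-docs"
    model: str
    cost_usd: float
    latency_ms: float
    tokens: int
    error: bool = False

def metrics_by_task(inferences: list[Inference]) -> dict[str, dict[str, float]]:
    """Aggregate total cost, total tokens, mean latency, and error rate per task."""
    groups: dict[str, list[Inference]] = defaultdict(list)
    for inf in inferences:
        groups[inf.task].append(inf)
    return {
        task: {
            "cost_usd": sum(i.cost_usd for i in infs),
            "tokens": sum(i.tokens for i in infs),
            "avg_latency_ms": sum(i.latency_ms for i in infs) / len(infs),
            "error_rate": sum(i.error for i in infs) / len(infs),
        }
        for task, infs in groups.items()
    }

# Hypothetical example data:
calls = [
    Inference("summarize-docs", "gpt-4o", 0.002, 480.0, 900),
    Inference("summarize-docs", "gpt-4o", 0.003, 520.0, 1100, error=True),
    Inference("classify-tickets", "claude-3-haiku", 0.0004, 210.0, 300),
]
print(metrics_by_task(calls))
```

Grouping by `task` rather than by model is what lets each AI feature be tracked independently, as described above; the same records can just as easily be re-grouped by `model` or provider.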
Next steps
- **Set up tasks**: Group your LLM calls by objective.
- **Integrate with your LLM provider**: Connect your app and start capturing traffic.
- **Metrics Explorer**: See your LLM usage dashboards.
- **Inference Viewer**: Browse individual LLM calls.
- **Trace CLI**: Inspect trace trees, span timelines, facets, and exports from the terminal.