Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.inference.net/llms.txt

Use this file to discover all available pages before exploring further.

Catalyst is built around two products.
  • Tracing is for agent improvement. The tracing SDKs capture the full execution of your agents and AI apps: LLM calls, tool calls, framework steps, and any custom spans you wrap. Halo, our open-source agent-loop optimizer, reads those traces and writes up what to fix in prompts, tools, and the harness itself.
  • Gateway is for inference observability and training task-specific models. Gateway sits between your app and your LLM provider, recording every request. From that recorded traffic, you can build datasets, run evals, fine-tune smaller models, and deploy them on dedicated GPUs.
Tracing and Gateway stand alone. Many teams start with one and add the other later. They also compose: when your agent calls LLMs through Gateway, the trace spans and the gateway records line up against the same requests. The platform also provides access to open-source and Inference.net-trained models (like Schematron for structured data extraction) through an OpenAI-compatible API.

Tracing

Capture full traces of your agents and AI apps. The Catalyst tracing SDKs collect LLM calls, tool calls, framework steps, agent runs, and any custom spans you wrap. Then Halo (our open-source agent-loop optimizer) reads your traces and surfaces concrete things to improve in your prompts, tools, and agent harness. What you’ll do: Outcome: Deep visibility into how your agents behave end to end, plus an automated reviewer that points you to the highest-impact fixes.

Get started with Tracing

Install the SDK and capture your first trace.

Observe

Record and analyze your production LLM traffic. Catalyst Gateway sits between your app and your LLM provider, capturing every request, response, cost, and latency metric with less than 10ms of overhead. Keep using any provider or model — Gateway is transparent. What you’ll do: Outcome: Full visibility into how your AI features perform in production — broken down by model, task, and provider.

Get started with Observe

Set up Gateway and start capturing LLM traffic.

Datasets

Curate collections of LLM inputs and outputs for evaluation and training. Datasets can come from your live production traffic captured through Observe, or from files you upload directly. What you’ll do: Outcome: Clean, representative datasets scoped to specific tasks — ready to power evals and training.

Get started with Datasets

Build or upload your first dataset.

Eval

Measure model quality with rubrics scored by LLM judges. Define what “good” looks like for your use case, then score model outputs systematically across candidates. Evals tell you which model is better and by how much — so you can make decisions with data instead of intuition. What you’ll do: Outcome: A repeatable, data-driven way to measure model quality before and after every change — and a validated rubric that can guide training.

Get started with Eval

Define quality, measure it, and compare models.

Train

Fine-tune a task-specific model on your production data. The result is a model that’s smaller, faster, and cheaper to run than the general-purpose model it replaces — while being more accurate for your workload. You don’t need to be an ML engineer to use it. What you’ll do: Outcome: A trained, task-specific model that’s been validated against your rubric — ready to deploy.

Get started with Train

Fine-tune a model on your data.

Deploy

Ship your trained model to a dedicated GPU with an OpenAI-compatible API. The API uses the same base URL and API key as the rest of the Inference platform — switching from an off-the-shelf model to your custom model is a one-line code change. What you’ll do: Outcome: A production endpoint serving your custom model — and the beginning of the next improvement loop. Deploy, observe, eval, retrain.

Get started with Deploy

Ship your model to a dedicated GPU.

Pick your starting point

Capture your first trace

Install the tracing SDK and capture LLM calls, tool calls, and agent steps.

Analyze your traces

Inspect trace trees and run Halo to find what to improve.

Record your first LLM call

Route traffic through the Catalyst gateway to capture LLM calls and view metrics.

Run your first eval

Define quality, measure it, and compare models side by side.

Train and deploy a model

The full loop: data, training, and a production endpoint.

Use the Inference API

Access open-source and Inference.net models directly.