Catalyst is a platform for building and deploying task-specific AI models. Instead of relying on large general-purpose models for every task, Catalyst helps you collect production data, evaluate model quality, fine-tune smaller models optimized for your workload, and deploy them on dedicated infrastructure. The platform also provides access to open-source and Inference.net-trained models (like Schematron for structured data extraction) through an OpenAI-compatible API. Not every team goes through every stage. Many start with observability and evals alone. The platform is useful at every step — use only the parts you need, and add more as your requirements grow.
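As a taste of the API surface, here is a minimal sketch of calling a hosted model through the OpenAI-compatible API using the official OpenAI Python client. The base URL and the Schematron model ID shown are assumptions for illustration; check the API reference for the exact values.

```python
# A minimal sketch: calling an Inference.net-hosted model through the
# OpenAI-compatible API. Base URL and model ID are assumptions; see
# the API reference for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.net/v1",  # assumed platform endpoint
    api_key="YOUR_INFERENCE_API_KEY",
)

response = client.chat.completions.create(
    model="inference-net/schematron-8b",  # hypothetical ID for Schematron
    messages=[
        {"role": "user", "content": "Extract name and price as JSON: 'Widget, $9.99'"}
    ],
)
print(response.choices[0].message.content)
```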

Observe

Record and analyze your production LLM traffic. Catalyst Gateway sits between your app and your LLM provider, capturing every request, response, cost, and latency metric with less than 10ms of overhead. Keep using any provider or model; Gateway is transparent.

Outcome: Full visibility into how your AI features perform in production, broken down by model, task, and provider.
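In practice, adopting a transparent gateway is usually just a base-URL change in your existing client. A minimal sketch, assuming an OpenAI-style client (the Gateway URL below is a placeholder; use the one from your Catalyst dashboard):

```python
# Route existing traffic through Catalyst Gateway by changing the
# client's base URL. The URL below is a placeholder, not the real
# Gateway endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-GATEWAY-URL/v1",  # placeholder Gateway endpoint
    api_key="YOUR_API_KEY",
)

# The request itself is unchanged: Gateway forwards it to your provider
# and records the request, response, cost, and latency along the way.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # keep using any provider or model
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(response.choices[0].message.content)
```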

Get started with Observe

Set up Gateway and start capturing LLM traffic.

Datasets

Curate collections of LLM inputs and outputs for evaluation and training. Datasets can come from your live production traffic captured through Observe, or from files you upload directly.

Outcome: Clean, representative datasets scoped to specific tasks, ready to power evals and training.
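If you're uploading files rather than capturing traffic, a common shape for a record is an OpenAI-style chat transcript, one JSON object per line. A sketch of what one JSONL record might look like (the schema here is illustrative, not necessarily the platform's documented format):

```python
# Illustrative only: one chat-format record appended to a JSONL file.
# The exact schema Catalyst expects may differ; see the Datasets docs.
import json

record = {
    "messages": [
        {"role": "system", "content": "Extract the invoice fields as JSON."},
        {"role": "user", "content": "Invoice #123 from Acme Corp, total $450.00"},
        {
            "role": "assistant",
            "content": '{"invoice_id": "123", "vendor": "Acme Corp", "total": 450.0}',
        },
    ]
}

with open("invoice-extraction.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```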

Get started with Datasets

Build or upload your first dataset.

Eval

Measure model quality with rubrics scored by LLM judges. Define what “good” looks like for your use case, then score model outputs systematically across candidates. Evals tell you which model is better and by how much, so you can make decisions with data instead of intuition.

Outcome: A repeatable, data-driven way to measure model quality before and after every change, and a validated rubric that can guide training.
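To make the idea concrete, here is a minimal LLM-as-judge sketch: one rubric criterion, scored by a judge model over a batch of candidate outputs. The judge model, rubric wording, and 1–5 scale are all assumptions for illustration, not the platform's built-in eval API.

```python
# A minimal LLM-as-judge sketch. Judge model, rubric text, and the
# 1-5 scale are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "You are grading a support-ticket summary. "
    "Score 1-5 for whether it captures the customer's core complaint. "
    "Reply with the number only."
)

def judge(output: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o",  # any capable judge model works here
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": output},
        ],
    )
    return int(resp.choices[0].message.content.strip())

candidate_outputs = ["The customer reports being double-billed in March."]
scores = [judge(o) for o in candidate_outputs]
print(sum(scores) / len(scores))  # mean rubric score for this candidate
```

Running the same rubric over outputs from two models gives you a side-by-side comparison: which model is better, and by how much.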

Get started with Eval

Define quality, measure it, and compare models.

Train

Fine-tune a task-specific model on your production data. The result is a model that’s smaller, faster, and cheaper to run than the general-purpose model it replaces, while being more accurate for your workload. You don’t need to be an ML engineer to use it.

Outcome: A trained, task-specific model that has been validated against your rubric, ready to deploy.

Get started with Train

Fine-tune a model on your data.

Deploy

Ship your trained model to a dedicated GPU with an OpenAI-compatible API. The API uses the same base URL and API key as the rest of the Inference platform, so switching from an off-the-shelf model to your custom model is a one-line code change.

Outcome: A production endpoint serving your custom model, and the beginning of the next improvement loop: deploy, observe, eval, retrain.
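A sketch of that one-line change, assuming an OpenAI-style client already pointed at the platform (the base URL and both model IDs are placeholders):

```python
# The one-line switch: same client, same base URL and key, new model ID.
# Base URL and model IDs below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.net/v1",  # assumed platform endpoint
    api_key="YOUR_INFERENCE_API_KEY",
)

response = client.chat.completions.create(
    # model="meta-llama/llama-3.1-8b-instruct",  # before: off-the-shelf model
    model="your-org/invoice-extractor-v1",       # after: your deployed custom model
    messages=[{"role": "user", "content": "Extract the fields from this invoice: ..."}],
)
print(response.choices[0].message.content)
```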

Get started with Deploy

Ship your model to a dedicated GPU.

Pick your starting point

Record your first LLM call

Route traffic through Catalyst Gateway to automatically trace LLM calls and view metrics.

Run your first eval

Define quality, measure it, and compare models side by side.

Train and deploy a model

The full loop: data, training, and a production endpoint.

Use the Inference API

Access open-source and Inference.net models directly.