Observe
Record and analyze your production LLM traffic. Catalyst Gateway sits between your app and your LLM provider, capturing every request, response, cost, and latency metric with less than 10ms of overhead. Keep using any provider or model; Gateway is transparent.

What you’ll do:

- Integrate with your LLM provider to start capturing traffic
- Define tasks to group LLM calls by objective (e.g. “summarize docs”, “classify tickets”)
- Explore metrics for cost, latency, errors, and token usage across all your calls
- Browse individual inferences to inspect raw requests and responses
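The integration step above amounts to pointing your existing client at the gateway; the request body stays in standard OpenAI chat format. A minimal sketch, assuming a hypothetical gateway URL and placeholder API key (copy the real values from your Catalyst dashboard):

```python
import json
import urllib.request

# Hypothetical gateway endpoint -- substitute the base URL from your
# Catalyst dashboard. The payload is unchanged from what you already
# send to your provider; Gateway is transparent.
GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"

body = {
    "model": "gpt-4o-mini",  # any provider/model works
    "messages": [{"role": "user", "content": "Summarize this doc: ..."}],
}

req = urllib.request.Request(
    GATEWAY_URL,
    data=json.dumps(body).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    },
)
# urllib.request.urlopen(req)  # uncomment to send; the response mirrors
#                              # your provider's, while Gateway records
#                              # cost, latency, and token usage
```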
Get started with Observe
Set up Gateway and start capturing LLM traffic.
Datasets
Curate collections of LLM inputs and outputs for evaluation and training. Datasets can come from your live production traffic captured through Observe, or from files you upload directly.

What you’ll do:

- Build datasets from traffic by filtering captured inferences and saving them
- Upload your own data as JSONL files when you have curated examples or are migrating from another platform
- Understand dataset formats and the schema your data needs to follow
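JSONL means one self-contained JSON object per line, with no enclosing array or trailing commas. A sketch of producing a valid file, using an illustrative chat-style input/output schema (the exact fields your data needs are defined in the dataset format docs, not here):

```python
import json

# Illustrative records -- the field names are an assumption; check the
# dataset format docs for the required schema.
examples = [
    {"input": [{"role": "user", "content": "Summarize: ..."}],
     "output": "A short summary."},
    {"input": [{"role": "user", "content": "Classify: refund request"}],
     "output": "billing"},
]

# JSONL: one JSON object per line.
jsonl = "\n".join(json.dumps(ex) for ex in examples)

with open("dataset.jsonl", "w") as f:
    f.write(jsonl + "\n")

# Each line must parse independently -- a common upload failure is a
# pretty-printed JSON array instead of line-delimited objects.
for line in jsonl.splitlines():
    json.loads(line)
```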
Get started with Datasets
Build or upload your first dataset.
Eval
Measure model quality with rubrics scored by LLM judges. Define what “good” looks like for your use case, then score model outputs systematically across candidates. Evals tell you which model is better and by how much, so you can make decisions with data instead of intuition.

What you’ll do:

- Write a rubric that describes your quality criteria in plain English, starting from a template, AI generation, or scratch
- Run a model comparison to score multiple models side by side on your dataset
- Understand how LLM-as-a-judge scoring works under the hood
- Read the results to interpret scores and decide which model wins
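At its core, a model comparison aggregates per-output judge scores and ranks the candidates. A toy illustration with made-up scores (the 1–5 scale and simple averaging here are assumptions for illustration, not the platform's exact scoring method):

```python
# Hypothetical judge scores for two candidate models on the same
# eval dataset, one score per output.
scores = {
    "model-a": [4, 5, 3, 4, 4],
    "model-b": [3, 3, 4, 2, 3],
}

def mean(xs):
    return sum(xs) / len(xs)

# Rank models by average judge score, highest first.
ranked = sorted(scores, key=lambda m: mean(scores[m]), reverse=True)
winner, runner_up = ranked[0], ranked[1]
margin = mean(scores[winner]) - mean(scores[runner_up])
print(f"{winner} wins by {margin:.2f} points on average")
# → model-a wins by 1.00 points on average
```

The margin matters as much as the ranking: a large gap justifies switching models, while a narrow one may call for a bigger eval dataset.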
Get started with Eval
Define quality, measure it, and compare models.
Train
Fine-tune a task-specific model on your production data. The result is a model that’s smaller, faster, and cheaper to run than the general-purpose model it replaces, while being more accurate for your workload. You don’t need to be an ML engineer to use it.

What you’ll do:

- Choose a recipe: a pre-configured training setup with a vetted base model and optimized parameters
- Launch a training run with your training dataset, eval dataset, and rubric
- Monitor mid-training evals to track quality scores as the model learns
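Launching a run pairs a recipe with your training dataset, eval dataset, and rubric. The sketch below shows the shape of that configuration; the field names and values are hypothetical, and the actual launch happens through the Train UI:

```python
# Hypothetical run configuration -- field names are assumptions used to
# illustrate what a launch needs, not a real API payload.
run_config = {
    "recipe": "small-fast-v1",            # pre-configured base model + parameters
    "training_dataset": "tickets-train",  # what the model learns from
    "eval_dataset": "tickets-holdout",    # scored during mid-training evals
    "rubric": "ticket-quality-rubric",    # defines what "good" means
}

# Sanity check before launching: every field must be set.
missing = [k for k, v in run_config.items() if not v]
assert not missing, f"missing fields: {missing}"
```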
Get started with Train
Fine-tune a model on your data.
Deploy
Ship your trained model to a dedicated GPU with an OpenAI-compatible API. The API uses the same base URL and API key as the rest of the Inference platform, so switching from an off-the-shelf model to your custom model is a one-line code change.

What you’ll do:

- Deploy a trained model to a dedicated GPU in a few clicks
- Call your deployment using the same OpenAI-compatible SDK you already use
- Manage and monitor your deployment lifecycle, scaling, and performance
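Because the deployment shares the platform's base URL and API key, the switch really is one line: only the model string changes. A sketch with hypothetical model IDs (copy the real deployment ID from the Deploy page):

```python
import json

def chat_request(model, prompt):
    """Build an OpenAI-compatible chat completion payload."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

# Before: an off-the-shelf model.
generic = chat_request("meta-llama/llama-3.1-8b-instruct", "Summarize: ...")

# After: your trained deployment -- the model ID is the one-line change.
# Base URL, API key, and SDK calls stay exactly the same.
custom = chat_request("your-org/summarizer-ft-v1", "Summarize: ...")
```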
Get started with Deploy
Ship your model to a dedicated GPU.
Pick your starting point
Record your first LLM call
Route traffic through the Catalyst Gateway to automatically trace LLM calls and view metrics.
Run your first eval
Define quality, measure it, and compare models side by side.
Train and deploy a model
The full loop: data, training, and a production endpoint.
Use the Inference API
Access open-source and Inference.net models directly.