## What costs what
| Activity | How it’s billed |
|---|---|
| Inference calls (API) | Per token, varies by model |
| Eval judge calls | Per token (these are full LLM inferences) |
| Training compute | Per GPU-hour (24 per GPU-hour; billed for a minimum of 8 GPUs) |
| Deployment GPU hours | Per hour while the deployment is online |
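The line items above can be combined into a rough cost estimator. This is a minimal sketch: the training rate (24 per GPU-hour) and 8-GPU minimum come from the table, while the per-token and per-hour deployment rates are placeholder assumptions, since the table says they vary.

```python
def estimate_cost(
    inference_tokens: int,
    judge_tokens: int,
    training_gpus: int,
    training_hours: float,
    deployment_hours: float,
    token_rate: float = 2e-6,      # ASSUMED per-token price; varies by model
    deployment_rate: float = 3.0,  # ASSUMED per-hour deployment GPU price
    training_rate: float = 24.0,   # per GPU-hour, from the table
) -> float:
    # Eval judge calls are full LLM inferences, so they bill like inference.
    token_cost = (inference_tokens + judge_tokens) * token_rate
    # Training bills per GPU-hour with an 8-GPU minimum.
    billed_gpus = max(training_gpus, 8) if training_hours > 0 else 0
    training_cost = billed_gpus * training_hours * training_rate
    # Deployment bills per hour while the deployment is online.
    deployment_cost = deployment_hours * deployment_rate
    return token_cost + training_cost + deployment_cost
```

Note that requesting fewer than 8 GPUs still incurs the 8-GPU rate, so a 4-GPU, 1-hour run is billed as 8 GPU-hours.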