Generation Rate Limits

Rate limits for model inference requests are based on your account tier:
Tier           Requests per minute (RPM)
Free           30
Paid           250
Custom teams   500 - 1,000,000 (per-team overrides)
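A client can avoid hitting its tier's cap by throttling itself before the gateway does. The sketch below is a minimal client-side sliding-window limiter; the `TIER_RPM` mapping, tier keys, and `RpmThrottle` class are illustrative names, not part of the platform API.

```python
import time
from collections import deque

# Requests-per-minute caps by account tier (values from the table above).
# The "free"/"paid" keys are illustrative; use your own tier identifiers.
TIER_RPM = {"free": 30, "paid": 250}

class RpmThrottle:
    """Client-side sliding-window limiter for a given RPM cap."""

    def __init__(self, rpm):
        self.rpm = rpm
        self.sent = deque()  # timestamps of requests in the last 60 s

    def wait_time(self, now=None):
        """Seconds to wait before the next request is allowed (0 if none)."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        # The window frees a slot when its oldest request turns 60 s old.
        return 60 - (now - self.sent[0])

    def record(self, now=None):
        """Note that a request was just sent."""
        self.sent.append(time.monotonic() if now is None else now)
```

Before each request, call `wait_time()` and sleep for the returned duration, then `record()` the send.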

Deployed Model Rate Limits

Models you deploy on the platform have a rate limit of 300 RPM per instance. If you need higher throughput, scale up the number of instances — total RPM equals the number of instances multiplied by 300.
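The scaling rule above reduces to simple arithmetic: to size a deployment for a target throughput, divide by 300 and round up. A small sketch (the `instances_needed` helper is a hypothetical name, not a platform function):

```python
import math

RPM_PER_INSTANCE = 300  # per-instance deployed-model limit from the docs

def instances_needed(target_rpm):
    """Smallest instance count whose combined RPM meets the target."""
    return max(1, math.ceil(target_rpm / RPM_PER_INSTANCE))

def total_rpm(instances):
    """Aggregate limit: instances multiplied by the per-instance cap."""
    return instances * RPM_PER_INSTANCE
```

For example, a 1,000 RPM target needs 4 instances, which gives 1,200 RPM of headroom in total.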

Batch API Rate Limits

  • Batch file upload: 1 per minute
  • Batch processing rate limits are separate from generation rate limits. See the Batch API docs for details.
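When submitting several batch files, the simplest way to respect the one-upload-per-minute limit is to pace the calls on the client. A minimal sketch, assuming an `upload_fn` you supply from your own HTTP client (the function and parameter names are hypothetical):

```python
import time

UPLOAD_INTERVAL_S = 60  # batch file uploads are limited to 1 per minute

def paced_uploads(upload_fn, files, sleep=time.sleep, clock=time.monotonic):
    """Upload files one at a time, spacing calls at least 60 s apart."""
    last = None
    for f in files:
        now = clock()
        if last is not None and now - last < UPLOAD_INTERVAL_S:
            # Wait out the remainder of the one-minute interval.
            sleep(UPLOAD_INTERVAL_S - (now - last))
        upload_fn(f)
        last = clock()
```

The injectable `sleep` and `clock` parameters keep the helper testable without real waits.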

General API Rate Limits

The Catalyst gateway enforces a general rate limit of 100 requests per second per API key or IP address. This applies across all endpoints.
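Requests that exceed the gateway limit are typically rejected with HTTP 429, so clients should retry with backoff rather than hammering the endpoint. A minimal sketch using exponential backoff with full jitter; the `call` function's `(status, body)` return shape is an assumption you would adapt to your HTTP client:

```python
import random
import time

def with_backoff(call, max_attempts=5, base=1.0, sleep=time.sleep):
    """Retry `call` while it reports HTTP 429, backing off between tries.

    `call` is assumed to return a (status_code, body) tuple; adapt this
    to whatever HTTP client you use.
    """
    for attempt in range(max_attempts):
        status, body = call()
        if status != 429:
            return status, body
        # Exponential backoff with full jitter: 0..base*2^attempt seconds.
        sleep(random.uniform(0, base * 2 ** attempt))
    return status, body  # still rate-limited after all attempts
```

Honoring a `Retry-After` response header, when the gateway sends one, is a reasonable refinement over pure exponential backoff.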

Increasing Your Limits

If you need higher rate limits, contact us or use the support chat to request a custom tier.