## Generation Rate Limits

Rate limits for model inference requests are based on your account tier:

| Tier | Requests per minute (RPM) |
|---|---|
| Free | 30 |
| Paid | 250 |
| Custom teams | 500 - 1,000,000 (per-team overrides) |
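To stay under a tier's RPM cap, clients can space requests evenly on their side. The sketch below is illustrative only: `TIER_RPM` and `RateLimiter` are hypothetical names, not part of the platform's API; the RPM values come from the table above.

```python
import time

# RPM limits per tier, taken from the table above (illustrative mapping)
TIER_RPM = {"free": 30, "paid": 250}

class RateLimiter:
    """Simple client-side pacer: spaces requests evenly to stay under an RPM cap."""

    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm  # seconds between consecutive requests
        self.last_request = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = time.monotonic()
        delay = max(0.0, self.last_request + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self.last_request = time.monotonic()
        return delay

# Free tier: 30 RPM means at most one request every 2 seconds
limiter = RateLimiter(TIER_RPM["free"])
```

Call `limiter.wait()` before each inference request; an even spacing like this avoids bursts that a server-side limiter might reject even when the per-minute total is within the cap.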
## Deployed Model Rate Limits
Models you deploy on the platform have a rate limit of 300 RPM per instance. If you need higher throughput, scale up the number of instances; total RPM equals the number of instances multiplied by 300.

## Batch API Rate Limits
- Batch file upload: 1 per minute
- Batch processing rate limits are separate from generation rate limits. See the Batch API docs for details.
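Since batch file uploads are limited to one per minute, a client submitting several batch files needs to pace its uploads. The sketch below assumes a hypothetical `upload_fn` callable standing in for whatever client call performs the actual upload; only the one-per-minute interval comes from the limit above.

```python
import time

BATCH_UPLOAD_INTERVAL = 60.0  # one batch file upload per minute, per the limit above

def upload_batches(files, upload_fn, sleep=time.sleep):
    """Upload batch files one at a time, waiting a minute between uploads.

    `upload_fn` is a placeholder for the real upload call; `sleep` is
    injectable so the pacing can be tested without actually waiting.
    """
    results = []
    for i, f in enumerate(files):
        if i > 0:
            sleep(BATCH_UPLOAD_INTERVAL)  # respect the 1-per-minute upload limit
        results.append(upload_fn(f))
    return results

# Usage sketch with a stub uploader and a no-op sleep (no real waiting here):
uploaded = upload_batches(["a.jsonl", "b.jsonl"],
                          upload_fn=lambda f: f,
                          sleep=lambda s: None)
```

In production, drop the `sleep` override so the default `time.sleep` enforces the real one-minute gap between uploads.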