# Rate Limits

> Rate limits for the Inference.net API

## Generation Rate Limits

Rate limits for model inference requests are based on your account tier:

| Tier         | Requests per minute (RPM)            |
| ------------ | ------------------------------------ |
| Free         | 30                                   |
| Paid         | 250                                  |
| Custom teams | 500 - 1,000,000 (per-team overrides) |
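
To stay under a tier's RPM limit, a client can pace its own requests. The sketch below is one possible approach and is not part of any Inference.net SDK: `SlidingWindowLimiter` and `TIER_RPM` are hypothetical names, and the rolling 60-second window is an assumption, since this page does not say how the minute is counted.

```python
import time
from collections import deque

# Tier limits copied from the table above (requests per minute).
TIER_RPM = {"free": 30, "paid": 250}


class SlidingWindowLimiter:
    """Client-side limiter: at most `limit` calls per `window` seconds.

    Assumption: the server counts requests over a rolling window.
    """

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Return seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def acquire(self, sleep=time.sleep):
        """Block until a slot is free, then record the call."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            sleep(delay)
        self.sent.append(self.clock())
```

Calling `limiter.acquire()` before each request keeps the client at or below the chosen limit. The same class with `limit=1` could also pace batch file uploads.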

## Deployed Model Rate Limits

Each model you deploy on the platform is limited to **300 RPM per instance**. For higher throughput, add instances: total RPM equals the number of instances multiplied by 300.
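
The per-instance arithmetic above can be sketched as a small sizing helper. The function names are illustrative only; the 300 RPM figure comes from this section.

```python
RPM_PER_INSTANCE = 300  # per-instance limit stated in this section


def total_rpm(instances: int) -> int:
    """Total throughput for a deployment with the given number of instances."""
    if instances < 0:
        raise ValueError("instances must be non-negative")
    return instances * RPM_PER_INSTANCE


def instances_needed(target_rpm: int) -> int:
    """Smallest number of instances whose combined limit covers target_rpm."""
    if target_rpm <= 0:
        raise ValueError("target_rpm must be positive")
    # Integer ceiling division: round up to the next whole instance.
    return (target_rpm + RPM_PER_INSTANCE - 1) // RPM_PER_INSTANCE
```

For example, a target of 1,000 RPM needs 4 instances, which together allow 1,200 RPM.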

## Batch API Rate Limits

* **Batch file upload:** 1 upload per minute
* Batch processing rate limits are separate from generation rate limits. See the [Batch API](/api/async-inference/batch-api) docs for details.

## General API Rate Limits

The Catalyst gateway enforces a general rate limit of **100 requests per second** per API key or IP address. This applies across all endpoints.
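
A client that exceeds a limit will usually want to retry after a pause. The following is a minimal sketch under stated assumptions: this page does not document the response returned when a limit is hit, so the HTTP `429` status and the optional retry hint (as carried by a `Retry-After` header) are assumed from general HTTP convention. `call_with_backoff` and `send` are hypothetical names, not part of the Inference.net API.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable that performs one request and returns
    (status_code, retry_after_seconds_or_None).
    Assumption: the API signals rate limiting with HTTP 429.
    """
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep again
        if retry_after is not None:
            # Prefer the server's hint when one is provided.
            delay = retry_after
        else:
            # Otherwise back off exponentially, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay)
    return 429
```

Passing `sleep` as a parameter keeps the helper testable without real delays. Production code would often add random jitter to the delay so that many clients do not retry at the same instant.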

## Increasing Your Limits

If you need higher rate limits, [contact us](mailto:support@inference.net) or use the support chat to request a custom tier.
