# Rate Limits

> Request limits, what happens when you hit them, and how to request higher limits.

For detailed rate limit information including current limits by tier, see the [Inference API rate limits](/api/rate-limits) page.

## What happens when you hit a limit

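You'll receive a `429 Too Many Requests` response. Wait before retrying, and use exponential backoff: increase the delay after each consecutive 429 so that repeated retries don't keep you over the limit.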
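The following is a minimal sketch, not an official client. It assumes your HTTP client raises an exception on a 429. That exception is called `RateLimitError` here for illustration only, so substitute whatever your client actually raises. The delay cap doubles on each attempt, and the actual wait is a random value below that cap ("full jitter"), which keeps many clients from retrying in lockstep. The jitter strategy and the default values for `max_retries`, `base_delay`, and `max_delay` are illustrative choices, not documented limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your HTTP client raises on a 429."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            # Cap grows as base_delay * 2**attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random duration in [0, cap].
            sleep(random.uniform(0, cap))
    # Final attempt: if this one is also rate limited, the error reaches the caller.
    return call()
```

Wrap each API request in a zero-argument function, for example `with_backoff(lambda: send_request(payload))`, where `send_request` is your own request code. The retry count is bounded so a persistently overloaded endpoint surfaces an error instead of retrying forever.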

## Deployment-specific limits

Self-serve deployments on a single GPU are limited by that GPU's throughput. If traffic exceeds its capacity, responses slow down, and sustained overload eventually returns 429s. For higher throughput, see [Scale to Production](/platform/deploy/scale-to-production).
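If you send batch traffic to a single-GPU deployment, one way to stay under its capacity is to cap how many requests your client has in flight at once. The following is a minimal sketch using `asyncio.Semaphore`. It assumes each request is wrapped in an async function. The `max_in_flight` value is a hypothetical tuning knob, not a documented limit, so pick it by measuring your own deployment.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    jobs: Iterable[Callable[[], Awaitable[T]]],
    max_in_flight: int,
) -> List[T]:
    """Run async jobs with at most `max_in_flight` executing at once; results keep input order."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        # Blocks here while max_in_flight jobs are already running.
        async with semaphore:
            return await job()

    return list(await asyncio.gather(*(guarded(job) for job in jobs)))
```

Capping concurrency limits how much load reaches the deployment. It does not replace retry handling, so you can still wrap each job in exponential backoff for the 429s that slip through.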

## Requesting higher limits

If you need higher rate limits, [talk to the team](https://inference.net/meet-with-us/).
