Rate limiting

Rate limiting caps the number of requests a client can make to a service within a defined time window, typically expressed as 'N requests per second' or 'N requests per minute per API key'. Requests over the limit are delayed (queued), rejected (usually with an HTTP 429 Too Many Requests response), or shaped (served with degraded quality).
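A minimal Python sketch of the simplest scheme, a fixed-window counter, makes the 'N requests per window' definition concrete. The class name `FixedWindowLimiter`, its `check` method, the example key `key-123`, and the choice to pass the current time in explicitly are illustrative assumptions, not any real library's API. Over-limit requests are simply rejected with 429 rather than queued or shaped.

```python
from __future__ import annotations


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed time window (hypothetical sketch)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # client_id -> (index of the current window, requests counted in it)
        self._state: dict[str, tuple[int, int]] = {}

    def check(self, client_id: str, now: float) -> int:
        """Return 200 if the request is allowed, 429 if the client is over the limit."""
        window = int(now // self.window_seconds)
        current_window, count = self._state.get(client_id, (window, 0))
        if current_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            self._state[client_id] = (window, count)
            return 429
        self._state[client_id] = (window, count + 1)
        return 200


limiter = FixedWindowLimiter(limit=3, window_seconds=1.0)
statuses = [limiter.check("key-123", t) for t in (0.0, 0.1, 0.2, 0.3, 1.0)]
print(statuses)  # [200, 200, 200, 429, 200]
```

Fixed windows are cheap (one counter per client), but they can admit up to 2N requests in a short span that straddles a window boundary: N at the end of one window and N more at the start of the next.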

Rate limiting serves two purposes: protecting the service from abusive or runaway clients, and ensuring fair access among well-behaved clients. Common algorithms include token bucket (a refill rate plus a burst capacity), leaky bucket (smooths bursts into a constant output rate), and sliding window log (precise but memory-heavy, since it stores a timestamp for every request). Distributed rate limiting, enforced across multiple service instances, typically uses Redis with atomic counters or a sidecar proxy such as Envoy.

The hard problem is choosing limits: too tight and legitimate traffic hits 429s; too loose and the limit provides no protection. A pragmatic approach is to instrument the rejection rate, watch the burst pattern of the p99 client, and set limits at 2-3x the 99th percentile of legitimate usage as a safety margin.
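The Python sketch below illustrates two of the algorithms named above, plus the limit-choosing heuristic. `TokenBucket`, `SlidingWindowLog`, and `suggest_limit` are hypothetical names. Each limiter tracks a single client, takes the current time as an argument, and keeps its state in process memory, so this is not the Redis- or Envoy-backed distributed setup. `suggest_limit` assumes you already have per-client peak usage figures from your own metrics, and it uses a nearest-rank percentile, which is only one of several reasonable percentile definitions.

```python
from __future__ import annotations

import math
from collections import deque


class TokenBucket:
    """Token bucket for one client: a refill rate plus a burst capacity."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum stored tokens, i.e. the largest allowed burst
        self.tokens = capacity    # start full so an initial burst is allowed
        self.last = now           # time of the previous refill

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time (ignoring clock steps backwards), capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # the caller would answer 429 or queue the request


class SlidingWindowLog:
    """Sliding window log for one client: exact, but stores one timestamp per accepted request."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.log: deque[float] = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Evict timestamps outside the trailing window (now - window_seconds, now].
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False


def suggest_limit(observed_peaks: list[float], safety_margin: float = 2.0) -> int:
    """Heuristic: 99th percentile of legitimate per-client peak usage, times a safety margin.

    `observed_peaks` is assumed to hold each legitimate client's peak request
    count per window, taken from your own metrics.
    """
    if not observed_peaks:
        raise ValueError("need at least one observation")
    ordered = sorted(observed_peaks)
    # Nearest-rank percentile: the smallest value with at least 99% of samples at or below it.
    rank = math.ceil(99 * len(ordered) / 100)
    p99 = ordered[rank - 1]
    return math.ceil(p99 * safety_margin)
```

The token bucket permits bursts up to `capacity` while holding clients to `rate` on average. The sliding window log enforces the limit exactly over any trailing window, at the cost of one stored timestamp per accepted request. For example, if 100 legitimate clients peak at 1 through 100 requests per window, `suggest_limit` with a 2x margin proposes 198.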
