Rate Limiting
Rate Limiting is an essential component of virtually every networked system. At its core, Rate Limiting ensures controlled access to a system by capping the volume of traffic an external entity can send in a specific time period. Its primary purposes include:
- Fair usage: Prevents any single entity (user/system) or group from monopolizing resources.
- Stable performance: Protects systems from performance degradation caused by traffic spikes, improving user experience.
- Attack mitigation: Helps defend against Denial-of-Service (DoS) attacks, password brute-forcing, and more.
This article dives into popular strategies employed for implementing rate limiting effectively:
Leaky Bucket
The Leaky Bucket works much like its name suggests: imagine a bucket with a hole at the bottom. Requests flow into the bucket and leak at a constant rate. Regardless of the intensity of incoming traffic, only a fixed volume is processed.
For example, incoming requests are queued, and at most 2 requests can exit per second.
This approach maintains a consistent processing rate regardless of how intense the incoming traffic is. Excess requests are either queued (and thereby delayed) or discarded, which makes it a straightforward and cost-effective strategy to implement.
Its primary strength is smoothing out traffic bursts. The flip side is that discarding or delaying excess traffic can significantly degrade the user experience, so it is not an ideal choice for applications that routinely face bursty traffic, such as online gaming services.
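As a rough illustration, here is a minimal Python sketch of a leaky bucket. The names (LeakyBucket, submit) are illustrative, not from any particular library, and a real implementation would also need a worker that actually processes the queued requests:

```python
import time
from collections import deque

class LeakyBucket:
    """Requests queue in the bucket and leak out at a fixed rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity    # maximum number of queued requests
        self.leak_rate = leak_rate  # requests allowed to exit per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self) -> None:
        # Drain the requests that should have exited since the last check.
        now = time.monotonic()
        n = int((now - self.last_leak) * self.leak_rate)
        if n > 0:
            for _ in range(min(n, len(self.queue))):
                self.queue.popleft()  # this request is handed off downstream
            self.last_leak = now

    def submit(self, request) -> bool:
        """Queue the request if there is room; otherwise discard it."""
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True
        return False  # bucket is full: the request overflows and is dropped
```

With `LeakyBucket(capacity=10, leak_rate=2)`, this matches the example above: at most 2 queued requests exit per second, and anything beyond the queue's capacity is dropped.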
Token Bucket
The Token Bucket is similar to, but more flexible than, the Leaky Bucket:
- Each token represents the permission to process a request.
- Tokens are added to the bucket at a steady rate.
Imagine we have a bucket of tokens. When a request arrives, it picks up a token. If the tokens are exhausted, requests are delayed or discarded.
At intervals, the bucket receives a configurable number of tokens. If the bucket reaches its capacity, any excess tokens are discarded. For example, 2 tokens are added every second.
This algorithm regulates the average data transmission rate over time by managing its token filling rate, a mechanism conceptually similar to the Leaky Bucket algorithm.
A key distinction, however, is that this approach permits tokens to accumulate during periods of lower traffic (slack rounds), up to the predetermined capacity of the bucket. This accumulated reserve of tokens then enables the system to effectively manage sudden bursts of traffic by allowing temporary transmission rates higher than the average.
A key challenge with this method is preparing the system to operate effectively during such bursts. Traffic bursts force services to consume additional resources, potentially leading to crashes. Furthermore, since a burst in one service can propagate to others, it is vital to ensure that all affected services are tolerant of bursts.
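A minimal sketch of the idea in Python, assuming tokens are refilled lazily on each request rather than by a background timer (names like TokenBucket and allow are illustrative):

```python
import time

class TokenBucket:
    """Tokens accumulate at a steady rate, up to capacity; each request consumes one."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second (e.g., 2)
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        # Credit tokens for the elapsed time; excess beyond capacity is discarded.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because unused tokens accumulate up to `capacity`, a quiet period lets the bucket absorb a later burst at a rate temporarily higher than `refill_rate`, which is exactly the flexibility described above.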
Client-side Limiting
While rate limiting mechanisms can be implemented on the client side, such strategies are inherently unreliable and considered unsafe from a security perspective. This is because client-side controls are susceptible to manipulation or complete bypass.
Consequently, client-side rate limiting should only be employed in a supplementary capacity, supporting more robust server-side enforcement.
Exponential Backoff
Exponential Backoff is a strategy that prevents clients from hammering the system with retries. When a client encounters transient errors or rate-limiting responses from a server, it should pause before retrying, and the pause duration grows exponentially with each subsequent retry, for example: 1s -> 2s -> 4s -> 8s.
Why should the backoff be exponential? When a server returns an error, it often indicates a heavy load or even a crash. After a few retries, exponential backoff introduces significantly longer delays. This gives the server more time to recover from high workloads. Linear backoff, in contrast, might inadvertently continue to contribute to the server’s heavy load.
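A minimal sketch of a retry helper, assuming a hypothetical TransientError that stands in for retryable failures:

```python
import time

class TransientError(Exception):
    """Stand-in for retryable failures such as HTTP 429/503 responses."""

def retry_with_backoff(operation, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Run `operation`, retrying on TransientError with delays 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        try:
            return operation()
        except TransientError:
            if attempt == max_retries - 1:
                raise  # retries exhausted: surface the error to the caller
            delay = min(base_delay * 2 ** attempt, max_delay)
            # Production systems commonly add random jitter here so that many
            # clients do not retry in lockstep after a shared outage.
            time.sleep(delay)
```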
Circuit Breaker
The Circuit Breaker pattern is designed specifically for handling long-lasting issues. When it is determined that requests are likely to fail, the Circuit Breaker aborts them immediately, thereby conserving resources.
This pattern operates as a proxy and manages requests through three distinct states:
Closed: In this state, requests are routed to the target service as usual.
Open: If the number of failures surpasses a predefined threshold (for example, 4 failures against a threshold of 3), the circuit breaker transitions to the Open state. In this state, all requests are immediately cancelled. This prevents resource wastage on calls that are likely to fail and provides the target service with an opportunity to recover.
Half-Open: After a designated timeout period, the circuit breaker transitions from the Open state to Half-Open. In this state, a limited number of trial requests are allowed to pass through to the target service:
- If any of these trial requests fail, the breaker presumes the underlying fault persists and reverts to the Open state.
- If all trial requests succeed, the circuit breaker transitions back to the Closed state, resuming normal operation.
The Half-Open state permits only a restricted volume of traffic, which helps prevent the target service from being overwhelmed and allows it additional time to recover.
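The following is a single-threaded Python sketch of this state machine, with illustrative names (CircuitBreaker, call); a production version would add locking and cap the number of concurrent trial requests in the Half-Open state:

```python
import time

class CircuitBreaker:
    """Proxy that tracks failures and short-circuits calls when Open."""

    def __init__(self, threshold=3, recovery_timeout=30.0, trial_requests=2):
        self.threshold = threshold                # failures tolerated before opening
        self.recovery_timeout = recovery_timeout  # seconds in Open before Half-Open
        self.trial_requests = trial_requests      # successes needed to close again
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.trial_successes = 0

    def call(self, operation):
        if self.state == "open":
            # Move to Half-Open once the timeout elapses; otherwise fail fast.
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "half_open"
                self.trial_successes = 0
            else:
                raise RuntimeError("circuit open: request aborted immediately")
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        # Any failure while Half-Open, or too many while Closed, opens the circuit.
        self.failures += 1
        if self.state == "half_open" or self.failures > self.threshold:
            self.state = "open"
            self.opened_at = time.monotonic()

    def _on_success(self):
        if self.state == "half_open":
            self.trial_successes += 1
            if self.trial_successes >= self.trial_requests:
                self.state = "closed"  # service recovered; resume normal operation
                self.failures = 0
        else:
            self.failures = 0  # a success while Closed resets the failure count
```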
Combining Exponential Backoff and Circuit Breaker strategies is often effective: retries (with exponential backoff) continue until the Circuit Breaker's failure threshold is reached, at which point the breaker opens and rejects further requests immediately.
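Reusing the hypothetical names from the sketches above, the combination might look like this; note that the breaker's "open" error is not a TransientError, so the backoff loop stops retrying the moment the breaker trips:

```python
def fetch_from_service():
    """Illustrative remote call; assumed to raise TransientError on failure."""
    ...

breaker = CircuitBreaker(threshold=3, recovery_timeout=30.0)

# Each attempt is routed through the breaker. While it is Closed, transient
# failures are retried with exponential backoff; once it opens, breaker.call
# raises RuntimeError, which retry_with_backoff does not catch, so the
# retries stop immediately.
retry_with_backoff(lambda: breaker.call(fetch_from_service))
```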