January 28, 2026 · 3 min read · redis · rate-limiting · scalability

Building a Distributed Rate Limiter with Redis

Sliding window vs fixed window, why I chose sorted sets, and how the rate limiter at Bolt protected a 10K-RPS platform.

At Bolt we served 200M+ monthly users. The public API was a constant target — well-meaning integrations with buggy retry loops, the occasional scraper, and every so often a genuine attack. A rate limiter isn't a nice-to-have at that scale. It's a survival tool.

Here's the design I landed on after iterating three times.

Fixed window is a trap

Most teams start here because it's simple:

Count requests per user_id in 60-second buckets. If the count exceeds the quota, reject.

It works until it doesn't. The problem is the boundary: a client can fire 2× the quota in the last second of one window and the first second of the next, and both counts pass the check. At 10K RPS that's instant pain for the backend.
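The boundary problem is easy to see in a toy simulation (numbers here are illustrative: a quota of 60 per 60-second bucket):

```python
QUOTA = 60
WINDOW_S = 60

def allow_fixed(counts, ts):
    # Fixed window: count requests per 60-second bucket.
    bucket = ts // WINDOW_S
    counts[bucket] = counts.get(bucket, 0) + 1
    return counts[bucket] <= QUOTA

counts = {}
# 60 requests at t=59s (end of bucket 0), 60 more at t=60s (start of bucket 1):
passed = sum(allow_fixed(counts, 59) for _ in range(60))
passed += sum(allow_fixed(counts, 60) for _ in range(60))
# All 120 pass the check -- 2x the quota inside a two-second span.
```

Each bucket sees only 60 requests, so every check passes, yet the backend absorbs 120 requests in two seconds.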

Sliding window is what you actually want

Store each request as a timestamped entry. On every request, drop entries older than the window, then count what's left. If it exceeds the quota, reject.

In Redis this maps perfectly to a sorted set:

-- KEYS[1] = rate key, e.g. "rl:user:42:POST:/charges"
-- ARGV[1] = now (ms)
-- ARGV[2] = window (ms)
-- ARGV[3] = limit
-- ARGV[4] = unique request id
 
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])

redis.call("ZREMRANGEBYSCORE", KEYS[1], 0, now - window)
local count = redis.call("ZCARD", KEYS[1])
if count >= limit then
  return 0
end
redis.call("ZADD", KEYS[1], now, ARGV[4])
redis.call("PEXPIRE", KEYS[1], window)
return 1

Everything runs atomically inside a Lua script: no race conditions, no WATCH/MULTI gymnastics.
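The same logic can be mirrored in plain Python against an in-memory store, which makes the algorithm easy to test in isolation (a sketch, not the production path -- a sorted list of timestamps stands in for the Redis sorted set):

```python
import bisect

class SlidingWindowLimiter:
    """In-memory mirror of the sliding-window logic: one sorted list of
    timestamps per key plays the role of the Redis sorted set."""

    def __init__(self, limit, window_ms):
        self.limit = limit
        self.window_ms = window_ms
        self.entries = {}  # key -> sorted list of request timestamps (ms)

    def allow(self, key, now_ms):
        ts = self.entries.setdefault(key, [])
        # ZREMRANGEBYSCORE equivalent: drop entries older than the window.
        cutoff = now_ms - self.window_ms
        del ts[:bisect.bisect_right(ts, cutoff)]
        # ZCARD equivalent: count what's left, reject over quota.
        if len(ts) >= self.limit:
            return False
        # ZADD equivalent: record this request.
        bisect.insort(ts, now_ms)
        return True
```

Usage: `SlidingWindowLimiter(2, 1000).allow("user:42", now_ms)` admits at most two requests per rolling second per key.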

The keys matter

Don't limit on a single dimension. I use (tenant, user, route) as a composite key and maintain per-tenant global quotas separately. This lets you:

  • Absorb a single misbehaving user without blocking the rest of their tenant.
  • Enforce per-tenant contracts (free tier vs enterprise).
  • Protect individual routes that are more expensive (search, pricing).
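The composite key fits in a couple of lines (the exact scheme here is illustrative, not the production one):

```python
def rate_key(tenant, user, route):
    # Composite (tenant, user, route) key; short segments keep Redis memory down.
    return f"rl:t:{tenant}:u:{user}:r:{route}"

def tenant_key(tenant):
    # Separate key for the per-tenant global quota.
    return f"rl:t:{tenant}"
```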

Cost control

Sorted sets cost more memory than counters. The mitigation:

  • Aggressive TTLs. Every key expires one window after its last write.
  • Short key names. rl:t:42:u:99:r:/c beats spelling everything out.
  • Only store what you'll read back. If you don't need per-request timestamps for analytics, a counter with INCR + EXPIRE at the window boundary is cheaper.
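The cheaper counter variant, sketched in Python with an in-memory store standing in for Redis (one integer plus an expiry time per key, mimicking INCR + PEXPIRE -- no per-request timestamps):

```python
def allow_counter(store, key, now_ms, limit, window_ms):
    """Fixed-window counter: one (count, expiry) pair per key instead of
    a sorted set of timestamps. store is a plain dict standing in for Redis."""
    count, expires = store.get(key, (0, 0))
    if now_ms >= expires:
        # Window elapsed: the key would have expired in Redis. Start fresh.
        count, expires = 0, now_ms + window_ms
    count += 1
    store[key] = (count, expires)
    return count <= limit
```

Memory per key is constant regardless of traffic, at the cost of the boundary-burst behavior described earlier.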

When Redis goes away

Rate limiters become a hard dependency. If Redis is down, you have two options, both wrong:

  1. Fail open — let every request through. The backend melts.
  2. Fail closed — reject every request. You just took the site down.

The answer is fail-soft: each API instance keeps a tiny local counter as a shadow. If Redis is unreachable, fall back to the local view, which is loose but non-zero protection. Ring an alarm loudly.
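A minimal fail-soft sketch (illustrative -- `redis_check` is any callable that answers allow/deny via Redis; the fallback is a coarse fixed-window counter in local memory, so it only sees this instance's traffic):

```python
class FailSoftLimiter:
    """Try Redis first; on connection failure, fall back to a loose
    per-instance counter rather than failing open or closed."""

    def __init__(self, redis_check, limit, window_ms):
        self.redis_check = redis_check
        self.limit = limit
        self.window_ms = window_ms
        self.local = {}  # key -> (window_start_ms, count)

    def allow(self, key, now_ms):
        try:
            return self.redis_check(key, now_ms)
        except ConnectionError:
            # Shadow path: loose local view. This is also where the alarm fires.
            start, count = self.local.get(key, (now_ms, 0))
            if now_ms - start >= self.window_ms:
                start, count = now_ms, 0
            self.local[key] = (start, count + 1)
            return count + 1 <= self.limit
```

The local quota can be set lower than the global one, since each instance only sees a fraction of total traffic.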

Telling the client what happened

A good rate limiter is observable from outside. Always return:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1706445000
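With a sliding window, these headers fall out of the sorted set's contents: the client can retry once the oldest entry ages out. A sketch of the derivation (function name and exact rounding are illustrative):

```python
def rate_headers(timestamps_ms, limit, window_ms, now_ms):
    """Derive rate-limit headers from the timestamps currently in the window."""
    remaining = max(0, limit - len(timestamps_ms))
    oldest = min(timestamps_ms) if timestamps_ms else now_ms
    reset_ms = oldest + window_ms  # when the oldest entry ages out
    if remaining > 0:
        retry_after_s = 0
    else:
        # Round up so the client waits until a slot has actually freed.
        retry_after_s = max(0, -(-(reset_ms - now_ms) // 1000))
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_ms // 1000),
        "Retry-After": str(retry_after_s),
    }
```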

Clients with decent SDKs will back off correctly. Clients without will learn.

What I got wrong the first time

I initially limited only by user_id. Then we hit an abuser using rotating anonymous sessions. Adding IP-level + fingerprint-level limits on top of identity fixed it. Layer your limits.
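Layering reduces to an all-of check: a request passes only if every applicable dimension -- identity, IP, fingerprint -- is under its own quota. A sketch (the pair structure is illustrative):

```python
def allow_layered(checks, request):
    """checks: list of (extract_key, limiter_fn) pairs, one per dimension.
    A request is admitted only when every dimension allows it."""
    return all(limiter(extract(request)) for extract, limiter in checks)
```

Note that `all()` short-circuits, so a request denied at one layer is never counted against later layers; whether denied requests should still consume quota elsewhere is a policy choice.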

Results

  • Prevented several attack patterns from ever reaching business logic.
  • Zero-tuning enforcement of per-tenant quotas.
  • Quota changes became a config PR, not a deploy.

Rate limiting is one of those boring systems that earns its keep the first time it saves you. Worth the day to get it right.


Need this level of protection for your API? Let's talk.
