Verdict: Redis or Valkey. INCR plus EXPIRE is the canonical rate-limiter primitive. Memcached can technically do it via CAS, but the pattern is fragile and the ecosystem has standardised on Redis.

Redis vs Memcached for Rate Limiting

Rate limiters are counters with a deadline. The store you pick has to give you atomic increment, atomic expiry, and a network round-trip count low enough to absorb in your request budget. Redis was built for this. Memcached can be coerced into it.

1 RT: Redis INCR + EXPIRE. One round trip via MULTI / pipeline.
2 RT: Memcached ADD then INCR. Two-call ceremony, races on cold key.
O(log N): Sorted set insert. Sliding window log via ZADD.
Fixed window burst risk: Worst case at boundary, why most prod uses sliding window.

Why rate limiters depend on atomicity

A rate limiter answers one question on every request: has this caller exceeded the allowed quota in the current window? The naive implementation reads a counter, checks the threshold, increments, and writes it back. That sequence has a classic read-modify-write race: two requests landing in the same millisecond can both read the same value, both decide the limit is not yet hit, and both increment past it. The user sneaks through at double the rate, or worse, depending on traffic shape.
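The race can be reproduced without any database. The sketch below is illustrative only: makeSlowStore and naiveAllow are hypothetical names, and the store is an in-memory Map whose calls yield to the event loop to stand in for a network round trip. Two concurrent requests both pass a limit of 1.

```javascript
// Hypothetical in-memory store whose get/set yield to the event loop,
// standing in for a network round trip to a real database.
function makeSlowStore() {
  const data = new Map();
  const tick = () => new Promise((resolve) => setImmediate(resolve));
  return {
    async get(key) { await tick(); return data.get(key) ?? 0; },
    async set(key, value) { await tick(); data.set(key, value); },
  };
}

// Naive limiter: read, check, increment, write back. Not atomic.
async function naiveAllow(store, key, limit) {
  const count = await store.get(key); // both requests can read 0 here
  if (count >= limit) return false;
  await store.set(key, count + 1);    // both write 1, losing an update
  return true;
}

async function demoRace() {
  const store = makeSlowStore();
  // Two requests arrive together against a limit of 1.
  const results = await Promise.all([
    naiveAllow(store, "rl:alice", 1),
    naiveAllow(store, "rl:alice", 1),
  ]);
  return { results, stored: await store.get("rl:alice") };
}
```

Calling demoRace() resolves with both results true and a stored count of 1: two requests were admitted, but the counter recorded only one.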

The fix is an atomic primitive provided by the store: an operation the database guarantees is indivisible. Redis ships INCR key, which increments by one and creates the key with the value 1 if it does not exist. That single call cannot interleave with another INCR on the same key. Pair it with EXPIRE key 60 on the first hit, when INCR returns 1 (or send both in one pipeline or MULTI block), and you have a one-minute counter that resets automatically. The Redis docs publish this as the canonical rate limiter pattern.

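Memcached has INCR too, but it has a sharp edge: in the classic text protocol, INCR on a non-existent key returns NOT_FOUND instead of creating it. The first request for a new caller has to ADD the key (seeded to zero or one) and only then INCR. ADD followed by INCR is two round trips, which is roughly twice the latency, and worse, the pair is not atomic as a whole. ADD on its own is safe under concurrency (exactly one caller wins), but the key can expire or be evicted between the two calls, and if you seed with 1 every caller has to branch correctly on whether its own ADD won. You can layer CAS on top to make it safe, but at that point you are reimplementing what Redis ships natively.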

Code samples: fixed window rate limit

Redis (Node.js, ioredis)

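import Redis from "ioredis";
const r = new Redis();

async function allow(userId, limit = 100, win = 60) {
  // One key per user per window; the bucket number changes every `win` seconds.
  const key = `rl:${userId}:${Math.floor(Date.now() / 1000 / win)}`;
  // exec() resolves to one [err, result] pair per queued command.
  const [[err, count]] = await r
    .multi()
    .incr(key)
    .expire(key, win)
    .exec();
  if (err) throw err;
  return count <= limit;
}

// One round trip.
// INCR returns 1 on first request, auto-creates key.
// EXPIRE runs on every call and resets the TTL to `win`. That is harmless
// here because the key name changes each window, so old keys still expire.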

Memcached (Node.js, memjs)

import memjs from "memjs";
const m = memjs.Client.create();

async function allow(userId, limit = 100, win = 60) {
  const key = `rl:${userId}:${Math.floor(Date.now() / 1000 / win)}`;
  // Seed the counter. ADD only succeeds if the key is absent, so
  // concurrent first requests cannot overwrite each other's seed.
  await m.add(key, "0", { expires: win });
  const { value } = await m.increment(key, 1);
  return value <= limit;
}

// Two round trips minimum.
// The ADD + INCR pair is not atomic: the key can expire or be evicted in between.
// Recoverable but error-prone in production.

Sliding window: when fixed window is not good enough

Fixed window rate limiting is simple and almost always wrong at the boundary. Consider a 100-request-per-minute limit. A motivated caller can fire 100 requests at 11:00:59 and another 100 at 11:01:00, ending up with 200 requests in two seconds. For an authenticated API where the limit exists to protect a downstream service, that burst can topple the very thing the limit was protecting. For an unauthenticated public endpoint serving anonymous traffic, the burst is a denial-of-service vector.
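The boundary arithmetic can be checked with a small simulation. fixedWindowAccepts is a hypothetical pure function (no Redis involved) that replays a list of request timestamps through a fixed-window counter and reports how many were accepted.

```javascript
// Replays request timestamps (in seconds) through a fixed-window counter
// and returns how many requests were accepted.
function fixedWindowAccepts(timestamps, limit = 100, win = 60) {
  const counts = new Map(); // bucket number -> requests seen in that bucket
  let accepted = 0;
  for (const t of timestamps) {
    const bucket = Math.floor(t / win);
    const n = (counts.get(bucket) ?? 0) + 1;
    counts.set(bucket, n);
    if (n <= limit) accepted += 1;
  }
  return accepted;
}

// 100 requests in the last second of one window and 100 in the first
// second of the next: seconds 59 and 60 fall in buckets 0 and 1.
const burst = [
  ...Array(100).fill(59),
  ...Array(100).fill(60),
];
console.log(fixedWindowAccepts(burst)); // 200
```

The same 200 requests placed inside a single window would yield 100, which is the behaviour the limit was meant to guarantee.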

The sliding window log algorithm fixes this by storing a timestamp per request in a Redis sorted set keyed by user. On each request you ZADD the current timestamp, ZREMRANGEBYSCORE everything older than one window, and ZCARD to count the survivors. If the count is over the limit, reject. The window slides continuously, so the boundary attack disappears. The cost is memory linear in request count, which matters at scale.
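A minimal sketch of the sliding window log, assuming an ioredis-style client is passed in as redis (its multi() queues commands and exec() resolves to [err, result] pairs). The function name, key prefix, and the random member suffix are illustrative choices, not part of any library.

```javascript
// Sliding window log on a sorted set: score = request time in ms.
async function allowSlidingLog(redis, userId, limit = 100, winMs = 60_000) {
  const key = `rl:log:${userId}`;
  const now = Date.now();
  // Members must be unique, otherwise two requests in the same millisecond
  // collapse into one entry. The random suffix is an illustrative choice.
  const member = `${now}-${Math.random().toString(36).slice(2)}`;

  const replies = await redis
    .multi()
    .zremrangebyscore(key, 0, now - winMs) // drop entries older than the window
    .zadd(key, now, member)                // record this request
    .zcard(key)                            // count the survivors
    .pexpire(key, winMs)                   // let idle callers' logs expire
    .exec();

  for (const [err] of replies) {
    if (err) throw err;
  }
  const count = replies[2][1]; // result of ZCARD
  return count <= limit;
}
```

In this form rejected requests are still written to the log, so a caller that keeps hammering stays blocked until it backs off. Whether that is desirable is a policy decision. Recording only accepted requests needs a check before the ZADD, typically in a Lua script.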

The compromise used by production systems (Cloudflare, Stripe, GitHub) is the sliding window counter: two adjacent fixed-window counters combined into a weighted average. You get O(1) memory and the boundary smoothing of a sliding window without storing every request. Cloudflare published the math in their 2017 engineering blog. On Redis the counter variant needs only INCR and GET, while the log variant needs sorted sets. Memcached has no sorted set type, so the log variant is out, and the counter variant inherits the ADD / INCR seeding problem described earlier.
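The weighted average can be sketched as follows. This is one common formulation, not necessarily the exact formula any of the named companies runs. The function names and key layout are hypothetical, and redis is assumed to be an ioredis-style client exposing incr(), get(), and expire().

```javascript
// Weighted estimate of requests in the trailing window.
// prev: count in the previous fixed window.
// curr: count so far in the current fixed window.
// elapsed: fraction of the current window that has passed, in [0, 1).
function slidingEstimate(prev, curr, elapsed) {
  return prev * (1 - elapsed) + curr;
}

async function allowSlidingCounter(redis, userId, limit = 100, win = 60, nowMs = Date.now()) {
  const nowSec = nowMs / 1000;
  const bucket = Math.floor(nowSec / win);
  const elapsed = (nowSec % win) / win;
  const currKey = `rl:${userId}:${bucket}`;
  const prevKey = `rl:${userId}:${bucket - 1}`;

  const curr = await redis.incr(currKey);
  // Keep the key for two windows so it can serve as "previous" later.
  if (curr === 1) await redis.expire(currKey, win * 2);
  // GET returns a string, or null when the previous window saw no traffic.
  const prev = Number(await redis.get(prevKey)) || 0;

  return slidingEstimate(prev, curr, elapsed) <= limit;
}
```

For example, with 100 requests in the previous window, 20 so far in the current one, and a quarter of the current window elapsed, the estimate is 100 * 0.75 + 20 = 95. The three calls are issued separately here for readability; a pipeline or Lua script would bring them down to one round trip.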

Real-world: who runs rate limiters on what

The major API platforms publish their rate-limiting infrastructure. Stripe documents a four-layer scheme on Redis: request limiters, concurrency limiters, fleet-load limiters, and abuse limiters, all backed by a Redis cluster. They picked Redis specifically for the atomic primitives and the sorted-set support that makes their sliding-window logic possible at the throughput Stripe receives.

GitHub's REST API documents 5,000 requests per hour per token, implemented (per their engineering posts) on Redis. The headers X-RateLimit-Limit and X-RateLimit-Remaining come straight from a Redis counter on every call. Discord, Cloudflare's edge platform, and OpenAI's API all use Redis or Redis-compatible stores for the same reasons.

The Memcached camp here is short. The only large-scale Memcached rate-limiter case studies are from the Facebook side of the world, where Memcached is the cache layer but rate limiting (when needed) happens in TAO or in custom in-process logic, not in Memcached itself. If you are building greenfield in 2026, the answer is Redis or Valkey: both speak RESP, both have INCR / EXPIRE / ZADD, and both have client libraries in every language.

Failure modes you need to plan for

The store going down is the obvious one. If Redis is unavailable, do you fail open (let traffic through, accept the abuse risk) or fail closed (block everything, accept the outage)? Most production systems fail open with a short timeout (50-100ms) and aggressive client-side caching of the limit decision, on the logic that a few seconds of unlimited traffic is preferable to a self-induced outage. Stripe's blog says they fail open. Plan for it before the outage, not during.
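A fail-open wrapper with a timeout might look like the sketch below. failOpen is a hypothetical helper, and check stands for whatever limiter function you already have, as long as it returns a Promise of a boolean. The 100 ms default is taken from the range mentioned above.

```javascript
// Wraps a limiter check so that store errors or slow responses let the
// request through. `check` is any function returning Promise<boolean>.
async function failOpen(check, timeoutMs = 100) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(true), timeoutMs); // timed out: allow
  });
  try {
    return await Promise.race([check(), timeout]);
  } catch {
    return true; // store error: allow
  } finally {
    clearTimeout(timer);
  }
}
```

Usage would be along the lines of failOpen(() => allow(userId)). Switching to fail closed means returning false in the two "allow" branches. It is worth logging both branches so that an outage of the limiter is visible rather than silent.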

Clock skew between application servers matters for fixed-window schemes. If your app servers compute the window from Math.floor(Date.now()/1000/60) and two servers disagree by 30 seconds, then for half of every minute they write to different keys, so the same caller is counted against two separate buckets and can exceed the intended limit. Either compute the bucket on Redis itself with a Lua script reading TIME, or accept the imprecision.
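One way to compute the bucket on the server is sketched below, under several assumptions: Redis 5 or later (where a script may write after calling TIME), an ioredis-style client passed in as redis with eval(script, numKeys, ...keysAndArgs), and hypothetical names for the script constant, the function, and the key layout. The script builds the final key name itself, which works on a single Redis node. On Redis Cluster, keys a script touches are expected to be declared in KEYS, so the hash tag in the base key is only a partial accommodation.

```javascript
// Lua run inside Redis: derives the window bucket from the server clock,
// so application server clocks play no part in choosing the key.
const FIXED_WINDOW_LUA = `
local now = redis.call('TIME')            -- { seconds, microseconds }
local win = tonumber(ARGV[1])
local bucket = math.floor(tonumber(now[1]) / win)
local key = KEYS[1] .. ':' .. bucket
local count = redis.call('INCR', key)
if count == 1 then
  redis.call('EXPIRE', key, win)
end
return count
`;

async function allowServerClock(redis, userId, limit = 100, win = 60) {
  // The {…} hash tag keeps every bucket for one user in the same cluster slot.
  const count = await redis.eval(FIXED_WINDOW_LUA, 1, `rl:{${userId}}`, win);
  return count <= limit;
}
```

Because INCR and EXPIRE run inside one script, the pair is atomic and costs a single round trip.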

The third failure mode is hot-key contention. A single very popular caller (one web crawler hitting your API, one misbehaving SDK) generates all its requests against one key. Redis executes commands on a single thread and a key lives on exactly one shard, so a hot key has a hard throughput ceiling of roughly 100-200K ops/sec. The mitigation is to shard the key (suffix it with a random 0-15) and sum across shards in the read path. Memcached's multi-threaded design would help here, but you would be paying for it with the fragile ADD / INCR sequence.
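The key-sharding mitigation can be sketched as below. allowSharded and its options are hypothetical, and redis is assumed to be an ioredis-style client exposing incr(), expire(), and get(). The sub-keys are read with individual GETs rather than one MGET, on the assumption that they may live on different cluster nodes.

```javascript
// Spreads one caller's counter over `shards` sub-keys and sums them on read.
async function allowSharded(
  redis,
  userId,
  { limit = 100, win = 60, shards = 16, nowMs = Date.now() } = {},
) {
  const bucket = Math.floor(nowMs / 1000 / win);
  const base = `rl:${userId}:${bucket}`;
  const pick = Math.floor(Math.random() * shards); // random sub-key 0..shards-1

  const n = await redis.incr(`${base}:${pick}`);
  if (n === 1) await redis.expire(`${base}:${pick}`, win);

  // Read path: fetch every sub-key and add the counts together.
  const values = await Promise.all(
    Array.from({ length: shards }, (_, i) => redis.get(`${base}:${i}`)),
  );
  const total = values.reduce((sum, v) => sum + (Number(v) || 0), 0);
  return total <= limit;
}
```

The trade-off is that writes spread out while every check now reads all the sub-keys, and the sum is not taken atomically, so the limit becomes approximate under heavy concurrency. This only makes sense for the few callers that are actually hot.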

FAQ

Can Memcached be used for rate limiting?

Yes, but it is awkward. Memcached supports atomic INCR and CAS, but does not let you increment a key that does not exist (you have to ADD first, then INCR). That two-step pattern races. Redis INCR is atomic and creates the key on first call. For rate limiting, Redis is the cleaner primitive.

What is the sliding window log algorithm?

Sliding window log stores a timestamp per request in a Redis sorted set. ZADD inserts, ZREMRANGEBYSCORE removes timestamps older than the window, ZCARD returns the current count. Precise but uses memory proportional to request count. Sliding window counter (using buckets) is the common compromise.

Should I use a token bucket or fixed window?

Token bucket is smoother and allows bursts. Fixed window is simpler (INCR + EXPIRE) but suffers from the burst-at-boundary problem (a user can fire 2N requests in 1 second around the window boundary). For public APIs use token bucket. For internal services use fixed window.
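For comparison, a token bucket can be sketched in a few lines. This version is in-process only and is meant to illustrate the algorithm, not to replace a shared store. makeTokenBucket and its parameters are hypothetical names.

```javascript
// In-memory token bucket: holds up to `capacity` tokens, refilled
// continuously at `ratePerSec`. A request needs one whole token.
function makeTokenBucket(capacity, ratePerSec, startMs = Date.now()) {
  let tokens = capacity; // start full, so an initial burst is allowed
  let last = startMs;
  return function take(nowMs = Date.now()) {
    const elapsedSec = Math.max(0, nowMs - last) / 1000;
    tokens = Math.min(capacity, tokens + elapsedSec * ratePerSec);
    last = nowMs;
    if (tokens < 1) return false;
    tokens -= 1;
    return true;
  };
}
```

A bucket created with makeTokenBucket(2, 1) admits a burst of two requests, then one further request per second. The capacity sets the burst size and the refill rate sets the sustained limit, which is what makes it smoother than a fixed window.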

Can I run a rate limiter in-memory without Redis?

Only for single-instance deployments. The moment you add a second app server, in-memory rate limits become per-server limits, not per-user limits. A user can hit 2x your intended rate by load balancing across two servers. Redis (or any shared store) is required for global rate limits.

Does Cloudflare Workers KV work for rate limiting?

Not well. KV is eventually consistent with up to 60s propagation. For rate limiting, use Durable Objects (single-writer per key, consistent) or Cloudflare's purpose-built Rate Limiting API. Workers KV is built for read-heavy, rarely-changing data, not for counters.
