Verdict: Memcached's ~20-25% throughput edge becomes measurable here. Redis closes the gap with pipelining. Engine choice has real cost implications at this scale because instance size scales with throughput.

Redis vs Memcached at 100k QPS

One hundred thousand queries per second is where engine performance starts to translate into infrastructure cost. The Memcached multi-threading advantage becomes measurable, Redis pipelining becomes essential, and instance-sizing decisions affect the monthly bill in meaningful ways.

~1.0M/s: Memcached single-node ceiling (cache.c6i.2xlarge, DevGenius 2026)
~800k/s: Redis single-node ceiling (same hardware, batched pipelines)
cache.r7g.xlarge: comfortable for 100k QPS (4 vCPU, 26GB RAM, ~$200/mo)
10x: pipelining benefit (Redis 10-command pipeline vs single calls)

Where the Memcached edge becomes visible

The 2026 DevGenius benchmark on AWS cache.c6i.2xlarge (8 vCPU, 16GB RAM) measures roughly 1.0 million GETs per second sustained on Memcached versus 800k GETs per second on Redis 8.0, for simple single-key GET/SET at 256 client connections. At 1k QPS this 25% gap is invisible: both engines are well under 1% utilisation (roughly 0.1%). At 100k QPS the gap matters: Memcached is at ~10% CPU and Redis at ~12-15% CPU, and both serve comfortably. At 500k QPS the gap becomes load-bearing.
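The utilisation figures in this paragraph are back-of-envelope arithmetic, sketched below. The sketch assumes that CPU use scales linearly with QPS up to the benchmark ceilings, which is a simplification because real CPU curves are not perfectly linear. `CEILINGS_QPS` and `utilisation` are illustrative names.

```python
# Rough utilisation estimate: offered load divided by the measured
# single-node ceiling. Ceilings are the DevGenius 2026 figures quoted above;
# treating utilisation as linear in QPS is a simplifying assumption.
CEILINGS_QPS = {
    "memcached": 1_000_000,
    "redis": 800_000,
}


def utilisation(qps: float, engine: str) -> float:
    """Return the fraction of the engine's single-node ceiling used at `qps`."""
    return qps / CEILINGS_QPS[engine]


if __name__ == "__main__":
    for qps in (1_000, 100_000, 500_000):
        mc = utilisation(qps, "memcached")
        rd = utilisation(qps, "redis")
        print(f"{qps:>7,} QPS  memcached {mc:6.1%}  redis {rd:6.1%}")
```

At 500k QPS the same arithmetic puts Memcached at about 50% and Redis at about 62.5% of their ceilings. That is the point where the gap starts to eat into headroom.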

The technical reason for the gap is threading. Memcached runs a pool of worker threads (four by default, configurable with the -t flag) and spreads client connections across them, so the workload distributes across CPU cores naturally. Redis executes commands on a single main thread. Since Redis 6, optional I/O threads can take over socket reads, writes and protocol parsing, but command execution stays single-threaded. As a result, a single hot key has a hard throughput ceiling at single-thread speed. At 100k QPS spread across many keys, both engines have plenty of headroom; at 500k+ on a single hot key, the Redis ceiling appears first.

The translation to cost: at 100k QPS, the instance-size requirement for Redis might be one tier larger than for Memcached on the same workload (cache.r7g.xlarge for Redis versus cache.r7g.large for Memcached, doubling the per-hour cost). For pure-cache workloads this is a meaningful per-month difference. For workloads where Redis features are needed, you pay the throughput penalty in exchange for the features.

Pipelining is the great equaliser for Redis

The 800k-versus-1.0M number above is for single-command-per-round-trip workloads. When Redis is allowed to pipeline (batch multiple commands in one network round trip), the picture changes dramatically. The DevGenius benchmark shows Redis sustaining ~800k ops/sec with 10-command pipelines, while Memcached's single-call throughput tops out around 1M and pipelined Memcached sees less proportional benefit.

The reason is that pipelining is the canonical Redis optimisation pattern, and Redis is heavily optimised for it. Many client libraries and frameworks bundle commands into pipeline batches on your behalf (BullMQ, Sidekiq, and most queue libraries do this aggressively). Memcached pipelining exists but is less commonly used; most Memcached clients send one command at a time.
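The mechanism behind the pipelining gain can be sketched with a simple cost model. The model assumes that each round trip pays one fixed network RTT plus a small per-command server cost. With the hypothetical numbers below (200 µs RTT, 2 µs per command), a 10-command pipeline gives a little under 10x the per-connection throughput, because the RTT is amortised over ten commands. The model covers the round-trip saving only. It ignores the server-side CPU ceiling, which caps total throughput regardless of batch size.

```python
# Toy per-connection throughput model for a request/response cache.
# Each round trip costs one network RTT plus the server time for every
# command it carries. RTT and service time are hypothetical placeholders,
# not measurements.

def ops_per_second(batch_size: int, rtt_s: float, service_s: float) -> float:
    """Commands per second on one connection sending `batch_size` per round trip."""
    round_trip = rtt_s + batch_size * service_s
    return batch_size / round_trip


RTT_S = 200e-6      # hypothetical 200 µs client-to-cache round trip
SERVICE_S = 2e-6    # hypothetical 2 µs of server time per simple GET

single = ops_per_second(1, RTT_S, SERVICE_S)
pipelined = ops_per_second(10, RTT_S, SERVICE_S)
print(f"single calls:    {single:,.0f} ops/s per connection")
print(f"10-cmd pipeline: {pipelined:,.0f} ops/s per connection")
print(f"speed-up:        {pipelined / single:.1f}x")
```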

In production, the workloads that hit 100k+ QPS almost always have natural batching opportunities: per-request fetching of N cached fragments, periodic batch operations, queue drainage. If your application can express its cache access pattern as batches, Redis pipelining closes the throughput gap with Memcached and often overtakes it. Designing applications to be pipeline-friendly is a real lever; the access-pattern choice matters more than the engine choice.
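One way to make per-request fragment fetches pipeline-friendly is to collect the keys first and fetch them in fixed-size batches, rather than issuing one GET per fragment. The sketch below assumes a `batch_get` callable supplied by your client layer; `fetch_fragments` and `batch_get` are hypothetical names. redis-py's `mget` already returns values in key order. pymemcache's `get_many` returns a dict and would need a small adapter.

```python
from typing import Callable, Dict, Iterable, List, Optional

# `batch_get` fetches many keys in one round trip and returns values in the
# same order as the keys (missing keys -> None). Names are hypothetical.

def fetch_fragments(
    keys: Iterable[str],
    batch_get: Callable[[List[str]], List[Optional[bytes]]],
    batch_size: int = 100,
) -> Dict[str, Optional[bytes]]:
    """Fetch all `keys` using ceil(len(keys) / batch_size) round trips."""
    key_list = list(keys)
    results: Dict[str, Optional[bytes]] = {}
    for start in range(0, len(key_list), batch_size):
        chunk = key_list[start:start + batch_size]
        values = batch_get(chunk)          # one network round trip per chunk
        results.update(zip(chunk, values))
    return results
```

With 250 fragments and `batch_size=100`, a request costs three round trips instead of 250.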

Instance sizing and reservation

For a 100k QPS workload with HA on AWS ElastiCache (primary + 1 replica for Redis; two nodes for Memcached, which has no replication), a typical sizing is cache.r7g.xlarge for either engine. That costs roughly $200/month per node on-demand, or about $120/month with 1-year all-upfront reservations. Total two-node cost: $240-400/month depending on reservation. Equivalent node-based ElastiCache for Valkey clusters are roughly 20% cheaper; the 33% discount applies to Valkey on ElastiCache Serverless.

At this scale, reservations start to matter financially. A 3-year all-upfront reservation gives roughly 55% discount versus on-demand, taking a $400/month on-demand cluster to about $180/month. For predictable production workloads with a multi-year planning horizon, this is straightforwardly worth doing. For workloads with uncertain growth, the on-demand premium is the cost of optionality.
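The reservation arithmetic in the two paragraphs above can be reproduced in a few lines. The sketch assumes a two-node cluster and the section's approximate $200/month on-demand node price. It also uses the discounts quoted here: about 40% for 1-year all-upfront (implied by $200 to $120) and about 55% for 3-year all-upfront. These are not live AWS prices.

```python
# Monthly cost of a two-node ElastiCache cluster under different purchase
# options. Per-node price and discount rates are the approximate figures
# quoted in this section, not current AWS pricing.
ON_DEMAND_PER_NODE = 200.0   # USD/month, cache.r7g.xlarge (approximate)
DISCOUNTS = {
    "on-demand": 0.00,
    "1yr all-upfront": 0.40,   # ~$200 -> ~$120 per node
    "3yr all-upfront": 0.55,   # ~55% off on-demand
}


def monthly_cluster_cost(nodes: int, per_node: float, discount: float) -> float:
    """Effective monthly cost for `nodes` identical nodes after a discount."""
    return nodes * per_node * (1.0 - discount)


if __name__ == "__main__":
    for option, discount in DISCOUNTS.items():
        cost = monthly_cluster_cost(2, ON_DEMAND_PER_NODE, discount)
        print(f"{option:<16} ${cost:,.0f}/month for 2 nodes")
```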

The data-transfer cost surprises some teams. At 100k QPS with 1KB average values, you are pushing 100MB/s through the cache. Cross-AZ traffic for the replica replication stream alone can be $200-500/month at this rate. Multi-AZ deployments are expensive at scale; single-AZ saves the data-transfer cost but loses the cross-AZ failure domain isolation. The right answer depends on your availability requirements; do not assume Multi-AZ is free.
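The transfer volume behind that estimate is simple arithmetic. The sketch assumes the 730-hour month commonly used in AWS cost estimates. The replication stream carries writes, not reads, so its volume scales with your write share. The 10% write share and the $0.01/GB rate in the example are hypothetical inputs, not AWS's current prices.

```python
SECONDS_PER_MONTH = 730 * 3600  # 730-hour month, as in typical AWS estimates


def monthly_transfer_gb(qps: float, value_bytes: float, fraction: float = 1.0) -> float:
    """GB per month for `fraction` of the traffic (e.g. the write share
    that the replication stream carries)."""
    return qps * value_bytes * fraction * SECONDS_PER_MONTH / 1e9


def monthly_transfer_cost(gb: float, usd_per_gb: float) -> float:
    """Transfer cost at a caller-supplied per-GB rate."""
    return gb * usd_per_gb


if __name__ == "__main__":
    total = monthly_transfer_gb(100_000, 1_000)
    writes = monthly_transfer_gb(100_000, 1_000, fraction=0.10)  # hypothetical 10% writes
    print(f"all payload:              {total:,.0f} GB/month")
    print(f"replication (10% writes): {writes:,.0f} GB/month")
    print(f"at a hypothetical $0.01/GB: ${monthly_transfer_cost(writes, 0.01):,.2f}")
```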

The hot-key problem

At 100k QPS spread evenly across millions of keys, both engines are comfortable. The problem appears when traffic is skewed: a single very popular key (a viral piece of content, a global rate-limit counter, a leaderboard for one massively popular event) generates 50k+ QPS on its own. Both engines have hard per-key throughput ceilings.

For Memcached, any worker thread can serve any key. The per-key ceiling is therefore set by contention on that item's lock and by the capacity of the single node the key hashes to: roughly 200-300k QPS per key on modern hardware. Beyond that you need to shard the key across multiple Memcached nodes. That means changing application code to use suffixed key variants (key:0, key:1, ..., key:15) and aggregating reads.
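For a write-hot counter, such as a global rate-limit counter, the suffix scheme looks roughly like the sketch below. Each write goes to one randomly chosen shard, and reads sum all shards. A read-hot key uses the mirror image: write every copy, read one at random. The sketch assumes 16 shards and uses a dict in place of a real client; `shard_key`, `incr_sharded` and `read_sharded` are hypothetical helpers. With Memcached, the client's hashing must also place the suffixed keys on different nodes for the load to actually spread.

```python
import random
from typing import Dict

# Hot-key sharding sketch. `store` stands in for a cache client (a plain
# dict so the example runs without a server). Key names and SHARDS = 16
# are illustrative.
SHARDS = 16


def shard_key(base: str, i: int) -> str:
    return f"{base}:{i}"


def incr_sharded(store: Dict[str, int], base: str, amount: int = 1) -> None:
    """Spread writes for a hot counter across SHARDS suffixed keys."""
    key = shard_key(base, random.randrange(SHARDS))
    # With Redis this would be INCRBY; Memcached's incr needs the key to exist first.
    store[key] = store.get(key, 0) + amount


def read_sharded(store: Dict[str, int], base: str) -> int:
    """Aggregate the counter by reading every shard and summing."""
    return sum(store.get(shard_key(base, i), 0) for i in range(SHARDS))


if __name__ == "__main__":
    cache: Dict[str, int] = {}
    for _ in range(10_000):
        incr_sharded(cache, "ratelimit:global")
    print(read_sharded(cache, "ratelimit:global"))
```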

For Redis, the per-key ceiling is set by the single command-execution thread: roughly 100-200k QPS per key. The mitigation is the same: shard the key with random suffixes and merge in the read path. Redis Cluster's hash-tag mechanism can force related sharded keys into the same slot so that atomic operations work across them. However, a shared slot also means a shared node, so use hash tags only where atomicity matters more than spreading the load. Hot-key handling at 100k+ QPS is an application-level architectural concern, not an engine-choice concern.
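The sketch below shows how hash tags pin keys to a slot, following the slot calculation described in the Redis Cluster specification. The slot is the CRC16 (XMODEM variant) of the key modulo 16384. If the key contains a non-empty substring between the first `{` and the next `}`, only that substring is hashed. The function names are illustrative; cluster-aware clients and the `CLUSTER KEYSLOT` command do this for you.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hashed_part(key: bytes) -> bytes:
    """Return the hash-tag substring if present and non-empty, else the whole key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:          # at least one byte between the braces
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    return crc16_xmodem(hashed_part(key.encode())) % 16384


if __name__ == "__main__":
    for k in ("{ratelimit}:0", "{ratelimit}:7", "ratelimit:0", "ratelimit:7"):
        print(k, key_slot(k))
```

Every `{ratelimit}:N` key resolves to the same slot, which is what makes multi-key operations across them possible. The untagged `ratelimit:N` keys are hashed independently.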

FAQ

What instance size handles 100k QPS?

Roughly cache.r7g.large to cache.r7g.xlarge (2-4 vCPU, 13-26GB RAM) for either engine. Memcached can sustain 100k QPS on slightly smaller hardware due to multi-threading; Redis benefits from a few more cores for the I/O thread pool. In either case the instance cost is around $100-300/month depending on reserved-instance status.

Does pipelining matter?

Massively, especially for Redis. A single Redis connection can pipeline thousands of commands per second; a workload that issues 10-command pipelines from 100 concurrent clients can sustain over 800k ops/sec on a single primary (DevGenius March 2026). Without pipelining, the same workload tops out around 100-200k. Memcached pipelining is supported but less commonly used because Memcached's per-command latency is already low.

When do I need to shard?

Single-shard Redis or Memcached comfortably handles 100-200k QPS on cache.r7g.xlarge. Beyond that, you either scale up to a bigger instance or shard. Sharding adds complexity (client topology awareness for Redis Cluster, consistent hashing for Memcached) so scale-up is usually preferable until you hit the largest practical instance size.
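On the Memcached side, consistent hashing means the client maps each key onto a ring of hashed node points. Adding a node then remaps only the keys that land on its new points. The sketch below assumes MD5-derived ring positions and 100 virtual nodes per server. It is a generic ring, not the exact algorithm of any particular client (ketama-style clients differ in hashing details), and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (generic sketch)."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._ring: Dict[int, str] = {}
        self._points: List[int] = []
        self.vnodes = vnodes
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of MD5 as a 64-bit ring position.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        """Place `vnodes` points for `node` on the ring."""
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._ring[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        """Walk clockwise to the first point at or after the key's hash."""
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._ring[self._points[idx]]
```

Adding a fourth node to a three-node ring remaps roughly a quarter of the keys, all of them to the new node. A naive `hash(key) % n` scheme would remap most keys.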

What about read replicas for read-heavy workloads?

Redis read replicas let you scale reads horizontally without sharding: writes go to primary, reads route to up to 5 replicas. For read-heavy workloads at 100k QPS this can be more cost-effective than a bigger primary. Memcached has no equivalent because it has no replication.
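The read/write split can live in a thin routing layer in the application. The sketch below is minimal: it sends writes to the primary and rotates reads round-robin across replicas. Endpoints are hypothetical strings; in practice each would be a client connection, such as one redis-py `Redis` instance per node. Redis replication is asynchronous, so replica reads may briefly lag the primary.

```python
import itertools
from typing import Sequence


class ReplicaRouter:
    """Writes go to the primary; reads rotate across replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        # ElastiCache allows up to 5 read replicas per shard.
        if not 1 <= len(replicas) <= 5:
            raise ValueError("expected between 1 and 5 replicas")
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def for_write(self) -> str:
        """All writes go to the primary."""
        return self.primary

    def for_read(self) -> str:
        """Reads rotate round-robin across replicas."""
        return next(self._replicas)
```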

Network bandwidth as the bottleneck?

At 100k QPS with 1KB values, you are pushing 100MB/s of payload, or about 0.8Gbps before protocol overhead. Modern instances handle this easily; an older cache.m4 or cache.t2 generation might saturate the NIC before the engine does. Check the network spec of your instance type and confirm you have headroom.
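A quick headroom check for this answer is sketched below. It assumes you look up the NIC bandwidth for your instance type yourself. The 20% protocol overhead and the 50% utilisation ceiling are hypothetical rules of thumb, not AWS guidance.

```python
def payload_gbps(qps: float, value_bytes: float, overhead: float = 1.0) -> float:
    """Bandwidth in Gbit/s; `overhead` scales payload for protocol/TCP bytes."""
    return qps * value_bytes * overhead * 8 / 1e9


def has_headroom(required_gbps: float, nic_gbps: float, max_util: float = 0.5) -> bool:
    """True if the required bandwidth stays under `max_util` of the NIC."""
    return required_gbps <= nic_gbps * max_util


if __name__ == "__main__":
    need = payload_gbps(100_000, 1_000, overhead=1.2)  # hypothetical 20% overhead
    print(f"~{need:.2f} Gbps needed")
    for nic in (1.0, 10.0):   # hypothetical NIC bandwidths to compare against
        print(f"{nic:>4} Gbps NIC: headroom={has_headroom(need, nic)}")
```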
