Support sharding in RedisRateLimiter #3780


Closed
wants to merge 4 commits

Conversation

Fangzhou1217

Problem Statement:
The current implementation of RedisRateLimiter uses a single Redis key for rate limiting counters. This leads to a hot key problem in Redis clusters, where massive request volumes concentrate on a single Redis shard. The resulting excessive load on individual shards cannot be alleviated through horizontal cluster scaling, creating a performance bottleneck.

Proposed Solution:
This PR introduces a shards configuration parameter to address the hot key issue. Key enhancements include:

  • Sharded Key Generation: splits the rate limiting counter into multiple Redis keys using a configurable number of shards
  • Load Balancing: implements round-robin distribution to balance requests evenly across shards (see the sketch below)
  • Cluster Scalability: enables true horizontal scaling of Redis clusters by eliminating single-shard pressure
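A minimal sketch of the combined idea (hypothetical names, assuming a round-robin counter and a shard suffix appended to the resolved key; not the PR's exact code):

import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: round-robin selection of a virtual shard, encoded
// as a suffix on the rate limiting key. Names are illustrative only.
class ShardedKeySketch {

    private final AtomicInteger shardIndex = new AtomicInteger(0);

    String shardedKey(String id, int shards) {
        if (shards <= 0) {
            return id; // sharding disabled: original single-key behavior
        }
        // floorMod keeps the index non-negative even after integer overflow
        int shard = Math.floorMod(shardIndex.getAndIncrement(), shards);
        // a different suffix yields a different hash tag, hence a different
        // cluster slot, spreading the counters across Redis shards
        return id + "-" + shard;
    }
}

The PR also divides the replenish rate across the shards, as in this excerpt from the diff: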

int getShardedReplenishRate(Config config) {
    int replenishRate = config.getReplenishRate();
    if (config.getShards() > 0) {
        // each virtual shard replenishes an equal share of the overall rate
        replenishRate = replenishRate / config.getShards();
    }
    return replenishRate;
}

Contributor

If replenishRate is not evenly divisible by shards, the actual total rate will be less than replenishRate; the remainder should be accounted for.
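For example, a sketch (not part of the PR; shardId is the virtual shard index in [0, shards)) that preserves the configured total by handing the remainder to the first few shards:

int shardedReplenishRate(int replenishRate, int shards, int shardId) {
    int base = replenishRate / shards;
    int remainder = replenishRate % shards;
    // the first `remainder` shards each replenish one extra token per
    // period, so the per-shard rates sum exactly to replenishRate
    return shardId < remainder ? base + 1 : base;
}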

@jizhuozhi
Contributor

Very useful feature.

The purpose of rate limiting is to protect downstream services from overload. However, in scenarios such as e-commerce platform promotions, almost all requests concentrate on a hot resource. Since the same resource uses the same rate limiting key, requests continuously hit the same Redis shard, causing Redis overload. In order not to affect business functions when Redis is overloaded, we let the requests pass, so the expected rate limiting effect is not achieved. And because the same rate limiting key always hits the same Redis shard, we cannot improve system stability through distributed computing under the original implementation.

I have carefully read the implementation. By adding suffixes to the rate limiting key to obtain different hash tags, requests can be dispersed to different Redis shards, avoiding single-instance overload. Beyond that, since the Lua scripts can then execute on different Redis instances at the same time, this greatly improves the parallelism of rate limiting.

/**
* A Round-Robin like index to select a virtual shard.
*/
private final AtomicInteger shardIndex = new AtomicInteger(0);
Contributor

Why use Round-Robin?

Author

The fair distribution logic of Round-Robin ensures that all shards have an equal probability of being utilized.

Contributor

Maybe we could use a random start position to avoid the herd effect
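For example (a sketch, not the PR's code), seeding the index with a random offset so that multiple gateway instances do not all start on the same shard:

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// random, non-negative starting offset per gateway instance; combined with
// Math.floorMod(shardIndex.getAndIncrement(), shards), the rotation order
// differs between instances, avoiding a synchronized "herd" on shard 0
private final AtomicInteger shardIndex =
        new AtomicInteger(ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE));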

@jizhuozhi
Contributor

Hello, @spencergibb @ryanjbaxter PTAL, thanks

@@ -318,6 +355,9 @@ public static class Config {
@Min(1)
private int requestedTokens = 1;

@Min(0)
private int shards = 0;
Contributor

Is a shard something that is known based on how Redis is configured? (Sorry I am not that familiar with Redis)

Author

This is different from what we usually call sharding in storage systems; it is a concept of virtual sharding introduced at the application layer. As @jizhuozhi said, in the previous implementation all requests for a hot resource correspond to the same hot key. Because the same key always falls into the same hash slot (https://redis.io/docs/latest/operate/oss_and_stack/reference/cluster-spec/#key-distribution-model), requests always go to the same instance. What we need to do is split the rate-limiting resource at the application layer: by splitting it into different keys in advance, the keys can be placed on different instances.
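To illustrate the slot mechanics, here is a self-contained sketch of the slot computation described in the cluster spec (the request_rate_limiter.{id}.tokens layout follows the existing RedisRateLimiter keys; the -<shard> suffix is only an illustration of the virtual shards):

import java.nio.charset.StandardCharsets;

public class SlotDemo {

    // CRC16-CCITT (XModem), the checksum Redis Cluster uses for key slots
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? ((crc << 1) ^ 0x1021) & 0xFFFF : (crc << 1) & 0xFFFF;
            }
        }
        return crc;
    }

    // only the text inside the first non-empty {...} is hashed (the hash tag)
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // without sharding, every request hashes the same tag -> one slot
        System.out.println(slot("request_rate_limiter.{foobar}.tokens"));
        // with a shard suffix, the tags differ -> keys spread across slots
        for (int shard = 0; shard < 4; shard++) {
            System.out.println(slot("request_rate_limiter.{foobar-" + shard + "}.tokens"));
        }
    }
}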

@ryanjbaxter
Contributor

This seems like a useful enhancement. It would need documentation as well as testing.

Signed-off-by: liufangzhou.aaa <liufangzhou.aaa@bytedance.com>
@ryanjbaxter
Contributor

Looks like the tests you added are failing.

@Fangzhou1217
Author

Looks like the tests you added are failing.

Please accept my apologies for the typo. Due to network policy restrictions in my local environment, I was unable to validate the configuration through Docker testing, which inadvertently led to this oversight.

liufangzhou.aaa added 2 commits May 8, 2025 23:45
Signed-off-by: liufangzhou.aaa <liufangzhou.aaa@bytedance.com>
Signed-off-by: liufangzhou.aaa <liufangzhou.aaa@bytedance.com>
Signed-off-by: liufangzhou.aaa <liufangzhou.aaa@bytedance.com>
@ryanjbaxter
Contributor

@spencergibb pointed out to me that you could just use a Redis cluster and it will automatically do sharding and send the request to the right node based on the key. That seems like a much better solution than trying to do the same thing inside the Gateway... unless I am missing some other kind of benefit.

@Fangzhou1217
Author

@spencergibb pointed out to me that you could just use a Redis cluster and it will automatically do sharding and send the request to the right node based on the key. That seems like a much better solution than trying to do the same thing inside the Gateway... unless I am missing some other kind of benefit.

Our core goal is to avoid the situation where the same key always hits the same Redis shard, making horizontal scaling impossible.

The automatic sharding capability of the Redis cluster does not reduce the load on Redis. Because the same resource uses the same rate-limiting key, even if the Redis cluster has automatic sharding, requests will still hit the same Redis shard, causing Redis overload, as shown in Figure 1.

[Figure 1]

In this submission, by introducing the concept of virtual sharding at the application layer, hot resources are split into different keys in advance, which then hit different Redis shards, improving the parallelism of rate limiting and the stability of the system, as shown in Figure 2.

[Figure 2]

@spencergibb
Member

Our core goal is to avoid the situation where the same key always hits the same Redis shard, making horizontal scaling impossible.

I don't understand how this will work. Every request for a specific key could land on a different shard and therefore could have a different value set for tokens remaining. The same key has to retrieve the data from the same shard every time.

@jizhuozhi
Contributor

jizhuozhi commented May 9, 2025

The same key has to retrieve the data from the same shard every time.

Since I once had the same scenario as @Fangzhou1217, I can help him answer this question.

Conclusion: the effect we want to achieve is that the same rate limiting key (calculated by keyResolver) can hit different Redis shards instead of a single Redis shard.

I created an experiment to simulate the problem that this PR is going to solve:

  • A Redis cluster with 4 physical shards created on Aliyun (same as AWS or GCP) to make sure it's production-ready
  • A no-op httpbin service as the backend
  • A 4-instance Spring Cloud Gateway cluster with shards=0 (group A, using the released jar) and shards=8 (group B, using the snapshot jar); keyResolver always returns the same key foobar, like this:
    @Bean
    fun keyResolver(): KeyResolver {
        return KeyResolver { exchange -> Mono.just("foobar") }
    }
    `shards=0` and `shards=8` refer to the virtual shards configured in Spring Cloud Gateway
  • A 4-instance apache2 ab cluster with 256 concurrency (a carefully chosen value that just reaches 96% CPU per shard in my Redis cluster)

Both groups A and B have almost the same throughput and latency:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.6      0       9
Processing:     1   17   5.3     16     387
Waiting:        1   16   5.3     16     380
Total:          1   17   5.2     16     387
WARNING: The median and mean for the initial connection time are not within a normal deviation
        These results are probably not that reliable.

Percentage of the requests served within a certain time (ms)
  50%     16
  66%     18
  75%     19
  80%     20
  90%     23
  95%     27
  98%     31
  99%     35
 100%    387 (longest request)

However, the Redis cluster performance of the two experimental groups differs significantly:

  • In experimental group A, the average CPU utilization of the Redis cluster is only 24%, but shard 4 is already at 96% CPU, which means that for the same rate limiting key the Redis cluster cannot handle more requests. At this point the cluster processes 36k evalsha requests.


  • In experimental group B, the average CPU utilization of the Redis cluster is 34.8% (greater than 24%), and every shard sits at about 34% CPU, which means the load is almost balanced: neither the cluster nor any single shard is overloaded. At this point each shard processes 9k evalsha requests, 36k for the whole cluster.

@spencergibb
Member

I don't see how the key from keyresolver can produce correct results (nothing to do with CPU) when requesting the same key from different shards

@jizhuozhi
Contributor

I don't see how the key from keyresolver can produce correct results (nothing to do with CPU) when requesting the same key from different shards

It's from this PR (https://github.com/spring-cloud/spring-cloud-gateway/pull/3780/files#diff-dae08064f5cb79da1559dadb17abaab12d27407f498b79b611329c0cfa311c48R156): a shardId is appended to the key (taking effect as a shuffling suffix). Since the rate limiter is just a counter, we can divide it into different buckets to share the pressure.

The key problem in this scenario is that the Redis instance used as the counter may become a bottleneck for rate limiting.

@spencergibb
Member

How can the counter return correct results for a given key across multiple shards? I say it cannot.

@jizhuozhi
Contributor

But in most scenarios, do the developers who face the challenge of Redis becoming a bottleneck for rate limiting really care about the remaining value?

So, as an enhancement, how about adding a new response header X-RateLimit-Sharding: 1 and unsetting X-RateLimit-Remaining, or changing it to -1, when the developer sets shards > 0? cc @Fangzhou1217
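Something like this (a hypothetical sketch of the suggestion above; X-RateLimit-Sharding is the proposed header, not an existing API):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the header behavior suggested above (shards > 0).
Map<String, String> headers = new HashMap<>();
headers.put("X-RateLimit-Sharding", "1");
// Each virtual shard only knows its own bucket, so a global remaining count
// is unavailable; report -1 ("unknown") instead of a misleading number.
headers.put("X-RateLimit-Remaining", "-1");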

@spencergibb
Member

spencergibb commented May 12, 2025

If the calculations are not correct, the feature will never get added.

@jizhuozhi
Contributor

jizhuozhi commented May 12, 2025

So how about creating a new class named ShardingRedisRateLimiter, the same as RedisRateLimiter but with sharding? Then, as a new feature, it would never return a Remaining value and would add a new header Sharding: 1.

It is still the same problem mentioned before. If Redis becomes a bottleneck and causes rate limiting to be unavailable, the overload requests will not be rejected but will be passed directly to the business service. At this point, as a service developer, the primary concern is how the gateway can protect the backend service.

@spencergibb
Member

I'm sure there is a timeout setting on the Redis client. Currently, if Redis fails, it allows the request to go through. Wouldn't that be a better solution than another rate limiter implementation? So far, it's only the two of you acknowledging this problem, and I don't want to introduce incorrect behavior.

@jizhuozhi
Contributor

Currently, if Redis fails, it allows the request to go through

Yes, it allows the request to go through, but the Redis failure is caused by very large burst requests that far exceed the capacity of a single Redis shard (10x or 100x the backend service capacity). We want to make the rate limiter horizontally scalable, rather than overloading a single shard and passing requests straight to more fragile backend services (which read/write the database).

So far, it's only the two of you acknowledging this problem

I can't assume whether other people need it or not. Yes, this is indeed a scenario we encountered, and this PR solves the problem we ran into. And as far as I know, a lot of business developers have encountered this problem, but they bypass it with other methods (such as turning on random discard).

and I don't want to introduce incorrect behavior.

I'm not sure which incorrect behavior is introduced if we provide a new ShardingRedisRateLimiter with a different set of headers.

@spencergibb
Member

I'm not sure which incorrect behavior is introduced if we provide a new ShardingRedisRateLimiter with a different set of headers.

The incorrect behavior is the one I've mentioned in many of my comments: the number of tokens for a given key will be different across shards.

@spencergibb
Member

You can certainly make your own ShardingRedisRateLimiter, but I'm not ready to accept something like this for us to try and maintain, so I'm closing this for now.
