Support sharding in RedisRateLimiter #3780


Closed
wants to merge 4 commits

Conversation

Fangzhou1217

Problem Statement:
The current implementation of RedisRateLimiter uses a single Redis key for rate limiting counters. This leads to a hot key problem in Redis clusters, where massive request volumes concentrate on a single Redis shard. The resulting excessive load on individual shards cannot be alleviated through horizontal cluster scaling, creating a performance bottleneck.

Proposed Solution:
This PR introduces a shards configuration parameter to address the hot key issue. Key enhancements include:

  • Sharded Key Generation: splits the rate limiting counter into multiple Redis keys using a configurable number of shards
  • Load Balancing: implements round-robin distribution to balance requests evenly across shards (see the sketch below)
  • Cluster Scalability: enables true horizontal scaling of Redis clusters by eliminating single-shard pressure
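A minimal sketch of the combined idea (hypothetical names, assuming a round-robin counter and a shard suffix appended to the resolved key; not the PR's exact code):

import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: round-robin selection of a virtual shard, encoded
// as a suffix on the rate limiting key. Names are illustrative only.
class ShardedKeySketch {

    private final AtomicInteger shardIndex = new AtomicInteger(0);

    String shardedKey(String id, int shards) {
        if (shards <= 0) {
            return id; // sharding disabled: original single-key behavior
        }
        // floorMod keeps the index non-negative even after integer overflow
        int shard = Math.floorMod(shardIndex.getAndIncrement(), shards);
        // a different suffix yields a different hash tag, hence a different
        // cluster slot, spreading the counters across Redis shards
        return id + "-" + shard;
    }
}

The PR also divides the replenish rate across the shards, as in this excerpt from the diff: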

int getShardedReplenishRate(Config config) {
    int replenishRate = config.getReplenishRate();
    if (config.getShards() > 0) {
        // each virtual shard replenishes an equal share of the overall rate
        replenishRate = replenishRate / config.getShards();
    }
    return replenishRate;
}

Contributor

If replenishRate is not evenly divisible by shards, the actual total rate will be less than replenishRate; the remainder should be accounted for.
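For example, a sketch (not part of the PR; shardId is the virtual shard index in [0, shards)) that preserves the configured total by handing the remainder to the first few shards:

int shardedReplenishRate(int replenishRate, int shards, int shardId) {
    int base = replenishRate / shards;
    int remainder = replenishRate % shards;
    // the first `remainder` shards each replenish one extra token per
    // period, so the per-shard rates sum exactly to replenishRate
    return shardId < remainder ? base + 1 : base;
}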

@jizhuozhi
Contributor

Very useful feature.

The purpose of rate limiting is to protect downstream services from overload. However, in scenarios such as e-commerce platform promotions, almost all requests concentrate on a hot resource. Since the same resource uses the same rate limiting key, requests continuously hit the same Redis shard, causing Redis overload. In order not to affect business functions when Redis is overloaded, we let the requests pass, so the expected rate limiting effect is not achieved. And because the same rate limiting key always hits the same Redis shard, we cannot improve system stability through distributed computing under the original implementation.

I have carefully read the implementation. By adding suffixes to the rate limiting key to obtain different hash tags, requests can be dispersed to different Redis shards, avoiding single-instance overload. Beyond that, since the Lua scripts can then execute on different Redis instances at the same time, this greatly improves the parallelism of rate limiting.

/**
* A Round-Robin like index to select a virtual shard.
*/
private final AtomicInteger shardIndex = new AtomicInteger(0);
Contributor

Why use Round-Robin?

Author

The fair distribution logic of Round-Robin ensures that all shards have an equal probability of being utilized.

Contributor

Maybe we could use a random start position to avoid the herd effect
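For example (a sketch, not the PR's code), seeding the index with a random offset so that multiple gateway instances do not all start on the same shard:

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// random, non-negative starting offset per gateway instance; combined with
// Math.floorMod(shardIndex.getAndIncrement(), shards), the rotation order
// differs between instances, avoiding a synchronized "herd" on shard 0
private final AtomicInteger shardIndex =
        new AtomicInteger(ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE));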

@jizhuozhi
Contributor

Hello, @spencergibb @ryanjbaxter PTAL, thanks

@@ -318,6 +355,9 @@ public static class Config {
@Min(1)
private int requestedTokens = 1;

@Min(0)
private int shards = 0;
Contributor

Is a shard something that is known based on how Redis is configured? (Sorry I am not that familiar with Redis)

Author

This is different from what we usually call sharding in storage systems; it is a concept of virtual sharding introduced at the application layer. As @jizhuozhi said, in the previous implementation all requests for a hot resource correspond to the same hot key. Because the same key always falls into the same hash slot (https://redis.io/docs/latest/operate/oss_and_stack/reference/cluster-spec/#key-distribution-model), requests always go to the same instance. What we need to do is split the rate-limiting resource at the application layer: by splitting it into different keys in advance, the keys can be placed on different instances.
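To illustrate the slot mechanics, here is a self-contained sketch of the slot computation described in the cluster spec (the request_rate_limiter.{id}.tokens layout follows the existing RedisRateLimiter keys; the -<shard> suffix is only an illustration of the virtual shards):

import java.nio.charset.StandardCharsets;

public class SlotDemo {

    // CRC16-CCITT (XModem), the checksum Redis Cluster uses for key slots
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? ((crc << 1) ^ 0x1021) & 0xFFFF : (crc << 1) & 0xFFFF;
            }
        }
        return crc;
    }

    // only the text inside the first non-empty {...} is hashed (the hash tag)
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // without sharding, every request hashes the same tag -> one slot
        System.out.println(slot("request_rate_limiter.{foobar}.tokens"));
        // with a shard suffix, the tags differ -> keys spread across slots
        for (int shard = 0; shard < 4; shard++) {
            System.out.println(slot("request_rate_limiter.{foobar-" + shard + "}.tokens"));
        }
    }
}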

@ryanjbaxter
Contributor

This seems like a useful enhancement. It would need documentation as well as testing.

Signed-off-by: liufangzhou.aaa <liufangzhou.aaa@bytedance.com>
@ryanjbaxter
Contributor

Looks like the tests you added are failing.

@Fangzhou1217
Author

Looks like the tests you added are failing.

Please accept my apologies for the typo. Due to network policy restrictions in my local environment, I was unable to validate the configuration through Docker testing, which inadvertently led to this oversight.

liufangzhou.aaa added 2 commits May 8, 2025 23:45
Signed-off-by: liufangzhou.aaa <liufangzhou.aaa@bytedance.com>
Signed-off-by: liufangzhou.aaa <liufangzhou.aaa@bytedance.com>
Signed-off-by: liufangzhou.aaa <liufangzhou.aaa@bytedance.com>
@ryanjbaxter
Contributor

@spencergibb pointed out to me that you could just use a Redis cluster and it will automatically do sharding and send the request to the right node based on the key. That seems like a much better solution than trying to do the same thing inside the Gateway... unless I am missing some other kind of benefit.

@Fangzhou1217
Author

@spencergibb pointed out to me that you could just use a Redis cluster and it will automatically do sharding and send the request to the right node based on the key. That seems like a much better solution than trying to do the same thing inside the Gateway... unless I am missing some other kind of benefit.

Our core goal is to avoid the situation where the same key always hits the same Redis shard, making horizontal scaling impossible.

The automatic sharding capability of the Redis cluster does not reduce the load on Redis. Because the same resource uses the same rate-limiting key, even if the Redis cluster has automatic sharding, requests will still hit the same Redis shard, causing Redis overload, as shown in Figure 1.

[Figure 1]

In this submission, by introducing the concept of virtual sharding at the application layer, hot resources are split into different keys in advance, which then hit different Redis shards, improving the parallelism of rate limiting and the stability of the system, as shown in Figure 2.

[Figure 2]

@spencergibb
Member

Our core goal is to avoid the situation where the same key always hits the same Redis shard, making horizontal scaling impossible.

I don't understand how this will work. Every request for a specific key could land on a different shard and therefore could have a different value set for tokens remaining. The same key has to retrieve the data from the same shard every time.

@jizhuozhi
Contributor

jizhuozhi commented May 9, 2025

The same key has to retrieve the data from the same shard every time.

Since I once had the same scenario as @Fangzhou1217, I can help him answer this question.

Conclusion: the effect we want to achieve is that the same rate limiting key (calculated by keyResolver) can hit different Redis shards instead of a single Redis shard.

I created an experiment to simulate the problem that this PR is going to solve:

  • A Redis cluster with 4 physical shards created on Aliyun (same as AWS or GCP) to make sure it's production-ready
  • A no-op httpbin service as the backend
  • A 4-instance Spring Cloud Gateway cluster with shards=0 (group A, using the released jar) and shards=8 (group B, using the snapshot jar); keyResolver always returns the same key foobar, like this:
    @Bean
    fun keyResolver(): KeyResolver {
        return KeyResolver { exchange -> Mono.just("foobar") }
    }
    `shards=0` and `shards=8` refer to the virtual shards configured in Spring Cloud Gateway
  • A 4-instance apache2 ab cluster with 256 concurrency (a carefully chosen value that just reaches 96% CPU per shard in my Redis cluster)

Both groups A and B have almost the same throughput and latency:

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.6      0       9
Processing:     1   17   5.3     16     387
Waiting:        1   16   5.3     16     380
Total:          1   17   5.2     16     387
WARNING: The median and mean for the initial connection time are not within a normal deviation
        These results are probably not that reliable.

Percentage of the requests served within a certain time (ms)
  50%     16
  66%     18
  75%     19
  80%     20
  90%     23
  95%     27
  98%     31
  99%     35
 100%    387 (longest request)

However, the Redis cluster performance of the two experimental groups differs significantly:

  • In experimental group A, the average CPU utilization of the Redis cluster is only 24%, but shard 4 is already at 96% CPU, which means that for the same rate limiting key the Redis cluster cannot handle more requests. At this point the cluster processes 36k evalsha requests.


  • In experimental group B, the average CPU utilization of the Redis cluster is 34.8% (greater than 24%), and every shard sits at about 34% CPU, which means the load is almost balanced: neither the cluster nor any single shard is overloaded. At this point each shard processes 9k evalsha requests, 36k for the whole cluster.

@spencergibb
Member

I don't see how the key from keyresolver can produce correct results (nothing to do with CPU) when requesting the same key from different shards

@jizhuozhi
Contributor

I don't see how the key from keyresolver can produce correct results (nothing to do with CPU) when requesting the same key from different shards

It's from this PR (https://github.com/spring-cloud/spring-cloud-gateway/pull/3780/files#diff-dae08064f5cb79da1559dadb17abaab12d27407f498b79b611329c0cfa311c48R156): a shardId is appended to the key (taking effect as a shuffling suffix). Since the rate limiter is just a counter, we can divide it into different buckets to share the pressure.

The key problem in this scenario is that the Redis instance used as the counter may become a bottleneck for rate limiting.

@spencergibb
Member

How can the counter return correct results for a given key across multiple shards? I say it cannot.

@jizhuozhi
Contributor

But in most scenarios, do the developers who face the challenge of Redis becoming a bottleneck for rate limiting really care about the remaining value?

So, as an enhancement, how about adding a new response header X-RateLimit-Sharding: 1 and unsetting X-RateLimit-Remaining, or changing it to -1, when the developer sets shards > 0? cc @Fangzhou1217
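Something like this (a hypothetical sketch of the suggestion above; X-RateLimit-Sharding is the proposed header, not an existing API):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the header behavior suggested above (shards > 0).
Map<String, String> headers = new HashMap<>();
headers.put("X-RateLimit-Sharding", "1");
// Each virtual shard only knows its own bucket, so a global remaining count
// is unavailable; report -1 ("unknown") instead of a misleading number.
headers.put("X-RateLimit-Remaining", "-1");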

@spencergibb
Member

spencergibb commented May 12, 2025

If the calculations are not correct, the feature will never get added.

@jizhuozhi
Contributor

jizhuozhi commented May 12, 2025

So how about creating a new class named ShardingRedisRateLimiter, the same as RedisRateLimiter but with sharding? Then, as a new feature, it would never return a Remaining value and would add a new header Sharding: 1.

It is still the same problem mentioned before. If Redis becomes a bottleneck and causes rate limiting to be unavailable, the overload requests will not be rejected but will be passed directly to the business service. At this point, as a service developer, the primary concern is how the gateway can protect the backend service.

@spencergibb
Member

I'm sure there is a timeout setting on the Redis client. Currently, if Redis fails, it allows the request to go through. Wouldn't that be a better solution than another rate limiter implementation? So far, it's only the two of you acknowledging this problem, and I don't want to introduce incorrect behavior.

@jizhuozhi
Contributor

Currently, if Redis fails, it allows the request to go through

Yes, it allows the request to go through, but the Redis failure is caused by very large burst requests that far exceed the capacity of a single Redis shard (10x or 100x the backend service capacity). We want to make the rate limiter horizontally scalable, rather than overloading a single shard and passing requests straight to more fragile backend services (which read/write the database).

So far, it's only the two of you acknowledging this problem

I can't assume whether other people need it or not. Yes, this is indeed a scenario we encountered, and this PR solves the problem we ran into. And as far as I know, a lot of business developers have encountered this problem, but they bypass it with other methods (such as turning on random discard).

and I don't want to introduce incorrect behavior.

I'm not sure which incorrect behavior is introduced if we provide a new ShardingRedisRateLimiter with a different set of headers.

@spencergibb
Member

I'm not sure which incorrect behavior is introduced if we provide a new ShardingRedisRateLimiter with a different set of headers.

The incorrect behavior is the one I've mentioned in many of my comments: the number of tokens for a given key will be different across shards.

@spencergibb
Member

You can certainly make your own ShardingRedisRateLimiter, but I'm not ready to accept something like this for us to try and maintain, so I'm closing this for now.
