Skip to content

Adaptive rate control enhancement #1135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

seankao-az
Copy link
Collaborator

@seankao-az seankao-az commented Apr 30, 2025

Description

Built upon the existing rate limiter:

Adding the following enhancements:

  • Multi-signal feedback
    • Measure request latency and decrease rate limit if high latency
    • Decrease rate limit to minimum if request timeout
  • Rate limit unit change to byte/s from #doc/s
  • Update to stabilized rate increase (Stabilize adaptive rate limit by considering current rate #1027):
    • RequestRateMeter estimates current rate with a sliding window of size 10s (previously 3s). This is to account for burst permits allowed by guava rate limiter. A slightly larger window will result in less fluctuation in current rate estimation.
    • For example, if 10 permits were acquired in a burst, and response is received after 5 seconds, then:
      • With 3s window, when receiving the response, estimated current rate would be 0, because there wasn't any permits acquired in the past 3 seconds. This will block rate increase.
      • With 10s window, estimated current rate will be non-zero.
  • Stabilized decrease rate: Decrease rate has a cooldown. Usually when a failure or high latency happens, multiple requests will have the same signal and this results in rate limit quickly shrinking multiple times exponentially. The cooldown mechanism is to stabilize the rate limit decreasing. Bad signals are ignored when decrease is still in cooldown.

Some notes on code changes:

  • Move rate limiter related files to its own module
  • Decouple rate setting from the client that uses the rate limit. Now the client only reports feedback, and rate limiter adjusts itself based on the feedback.
  • Use java.time.Clock instead of System.currentTimeMillis() for tracking time for some classes, for stubbing clock in tests.

Infra:

  • Fix Not all tests are run in CI #1097 so that unit tests for this rate limiter change can be run in GitHub CI as well. This is done by changing sbt integtest/integration into sbt test integtest/integration in the workflow. This makes the CI runtime into 41 min (from 38 min)
  • Fix some other minor test failures exposed by above change
    • Fix RetryableHttpAsyncClientSuite
    • Ignore OpenSearchClientUtilsSuite — it passes when it's run by itself but always fail when run in sbt test. Will update in Not all tests are run in CI #1097 to track this

Related Issues

Check List

  • Updated documentation
  • Implemented unit tests
  • New added source code should include a copyright header
  • Commits are signed per the DCO using --signoff
  • Add backport 0.x label if it is a stable change which won't break existing feature

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Sean Kao <seankao@amazon.com>
default config is NOT updated for this change

Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
- remove testing guava rate limiter behavior
- add test for getting rate limiter instance
- stub clock for testing
- test feedback instead of rate adjustment in bulk wrapper test
- disambiguous request feedbacks

Signed-off-by: Sean Kao <seankao@amazon.com>
fix bug for increase rate with stabilization;
slightly modify estimate rate to allow for setting threshold as 0 to
disable stabilization

Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>

experiment better

Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
- remove requestSize from RequestFeedback
- decrease cooldown minor change for simpler test case
- fix test cases for 10 seconds sliding window for rate estimation

Signed-off-by: Sean Kao <seankao@amazon.com>
@seankao-az seankao-az changed the title Adaptive rate control enhance Adaptive rate control with multi-signal feedback Apr 30, 2025
@seankao-az seankao-az changed the title Adaptive rate control with multi-signal feedback Adaptive rate control enhancement Apr 30, 2025
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
Signed-off-by: Sean Kao <seankao@amazon.com>
@seankao-az seankao-az added the backport 0.x Backport to 0.x branch (stable branch) label May 7, 2025
@seankao-az seankao-az marked this pull request as ready for review May 7, 2025 04:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 0.x Backport to 0.x branch (stable branch)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Not all tests are run in CI
1 participant