Adaptive rate control enhancement #1135


Open

seankao-az wants to merge 23 commits into opensearch-project:main from seankao-az:adaptive-rate-control-enhance

Collaborator

seankao-az commented Apr 30, 2025 (edited)

Description

This PR builds on the existing rate limiter and adds the following enhancements:

Multi-signal feedback
- Measure request latency and decrease the rate limit when latency is high
- Decrease the rate limit to the minimum when a request times out
Rate limit unit changed from #doc/s to byte/s
Update to stabilized rate increase (Stabilize adaptive rate limit by considering current rate #1027):
- RequestRateMeter now estimates the current rate with a 10s sliding window (previously 3s). This accounts for the burst permits allowed by the Guava rate limiter; a slightly larger window gives less fluctuation in the estimated current rate.
- For example, if 10 permits are acquired in a burst and the response is received after 5 seconds:
  - With a 3s window, the estimated current rate when the response arrives is 0, because no permits were acquired in the past 3 seconds. This blocks the rate increase.
  - With a 10s window, the estimated current rate is non-zero.
Stabilized rate decrease: rate decreases now have a cooldown. When a failure or high latency occurs, multiple requests usually carry the same signal, which would shrink the rate limit exponentially several times in quick succession. The cooldown stabilizes the decrease: bad signals are ignored while a decrease is still in cooldown.
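
The 3s versus 10s window example above can be reproduced with a small sketch. This is only an illustration, not the PR's RequestRateMeter: the class name SlidingWindowRateMeter, its methods, and the use of explicit millisecond timestamps are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window meter; not the PR's RequestRateMeter. */
final class SlidingWindowRateMeter {
    /** One acquisition event: when it happened and how many permits it took. */
    private static final class Sample {
        final long timestampMillis;
        final long permits;

        Sample(long timestampMillis, long permits) {
            this.timestampMillis = timestampMillis;
            this.permits = permits;
        }
    }

    private final long windowMillis;
    private final Deque<Sample> samples = new ArrayDeque<>();
    private long permitsInWindow = 0;

    SlidingWindowRateMeter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records permits acquired at the given time. */
    void record(long nowMillis, long permits) {
        evictExpired(nowMillis);
        samples.addLast(new Sample(nowMillis, permits));
        permitsInWindow += permits;
    }

    /** Returns the average permits per second over the window ending at nowMillis. */
    double currentRate(long nowMillis) {
        evictExpired(nowMillis);
        return permitsInWindow * 1000.0 / windowMillis;
    }

    /** Drops samples that are at least one full window old. */
    private void evictExpired(long nowMillis) {
        while (!samples.isEmpty()
                && samples.peekFirst().timestampMillis <= nowMillis - windowMillis) {
            permitsInWindow -= samples.removeFirst().permits;
        }
    }
}
```

Recording a burst of 10 permits at t = 0 and querying at t = 5000 ms yields 0.0 with a 3000 ms window and 1.0 permits/s with a 10000 ms window, which matches the case described above where the short window blocks the rate increase.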

Some notes on code changes:

Move rate-limiter-related files into their own module
Decouple rate setting from the client that uses the rate limiter. The client now only reports feedback, and the rate limiter adjusts itself based on that feedback.
Use java.time.Clock instead of System.currentTimeMillis() to track time in some classes, so that the clock can be stubbed in tests.
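
As a rough sketch of how these pieces could fit together (client reports feedback, limiter adjusts itself, decreases honor a cooldown, time comes from an injected java.time.Clock), consider the following. Everything here is hypothetical and is not the PR's code: the names Feedback, FeedbackRateLimiter and FakeClock, the additive increase, the multiplicative decrease, and the choice to apply the cooldown to timeouts as well are all assumptions for illustration. The PR's stabilized increase based on the estimated current rate is deliberately left out.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Signals a client could report; the names are hypothetical. */
enum Feedback { SUCCESS, HIGH_LATENCY, FAILURE, TIMEOUT }

/** Hypothetical limiter that adjusts its own rate (bytes/s) from reported feedback. */
final class FeedbackRateLimiter {
    private final Clock clock;
    private final double minRate;
    private final double maxRate;
    private final double increaseStep;
    private final double decreaseRatio;
    private final long cooldownMillis;
    private double rate;
    // Bad signals that arrive before this instant are ignored.
    private long cooldownUntilMillis = Long.MIN_VALUE;

    FeedbackRateLimiter(Clock clock, double initialRate, double minRate, double maxRate,
                        double increaseStep, double decreaseRatio, Duration cooldown) {
        this.clock = clock;
        this.rate = initialRate;
        this.minRate = minRate;
        this.maxRate = maxRate;
        this.increaseStep = increaseStep;
        this.decreaseRatio = decreaseRatio;
        this.cooldownMillis = cooldown.toMillis();
    }

    synchronized double getRate() {
        return rate;
    }

    /** The client only reports what happened; the limiter decides how to react. */
    synchronized void report(Feedback feedback) {
        if (feedback == Feedback.SUCCESS) {
            // Assumption: increases are additive and not subject to the cooldown.
            rate = Math.min(maxRate, rate + increaseStep);
            return;
        }
        long now = clock.millis();
        if (now < cooldownUntilMillis) {
            return; // a decrease happened recently, so this bad signal is ignored
        }
        // Timeout drops straight to the minimum; other bad signals shrink the rate.
        rate = (feedback == Feedback.TIMEOUT)
                ? minRate
                : Math.max(minRate, rate * decreaseRatio);
        cooldownUntilMillis = now + cooldownMillis;
    }
}

/** Manually advanced clock so tests do not depend on wall-clock time. */
final class FakeClock extends Clock {
    private long nowMillis;

    FakeClock(long startMillis) {
        this.nowMillis = startMillis;
    }

    void advance(Duration duration) {
        nowMillis += duration.toMillis();
    }

    @Override public ZoneId getZone() { return ZoneOffset.UTC; }
    @Override public Clock withZone(ZoneId zone) { return this; }
    @Override public Instant instant() { return Instant.ofEpochMilli(nowMillis); }
    @Override public long millis() { return nowMillis; }
}
```

With a 10s cooldown and a decrease ratio of 0.5, three bad signals reported back to back starting from 64000 bytes/s leave the rate at 32000 rather than 8000, because the second and third are ignored. Injecting the clock is what makes this checkable in a unit test without sleeping.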

Infra:

Fix Not all tests are run in CI #1097 so that the unit tests for this rate limiter change also run in GitHub CI. This is done by changing sbt integtest/integration to sbt test integtest/integration in the workflow, which increases the CI runtime from 38 min to 41 min.
Fix some other minor test failures exposed by the above change
- Fix RetryableHttpAsyncClientSuite
- Ignore OpenSearchClientUtilsSuite: it passes when run by itself but always fails when run in sbt test. Will update Not all tests are run in CI #1097 to track this

Related Issues

Check List

Updated documentation
Implemented unit tests
Newly added source code should include a copyright header
Commits are signed per the DCO using --signoff
Add backport 0.x label if it is a stable change which won't break existing features

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

seankao-az added 16 commits

April 22, 2025 12:12


          disable rate stabilizer; fix test with new config

4d9c961

Signed-off-by: Sean Kao <seankao@amazon.com>


          change rate limit unit to bytes from #requests

af0daaa

default config is NOT updated for this change

Signed-off-by: Sean Kao <seankao@amazon.com>


          fix RetryableHttpAsyncClientSuite

c3434c2

Signed-off-by: Sean Kao <seankao@amazon.com>


          create ratelimit package

a872cc2

Signed-off-by: Sean Kao <seankao@amazon.com>


          encapsulate rate limiter to adapt to feedback

681d1cd

Signed-off-by: Sean Kao <seankao@amazon.com>


          edit comment

814ec00

Signed-off-by: Sean Kao <seankao@amazon.com>


          adapt rate with multi signal feedback

8e8e29a

Signed-off-by: Sean Kao <seankao@amazon.com>


          add custom spark config for testing

b11844a

Signed-off-by: Sean Kao <seankao@amazon.com>


          unit test for bulk wrapper reporting feedback

5f68db6

- remove testing guava rate limiter behavior
- add test for getting rate limiter instance
- stub clock for testing
- test feedback instead of rate adjustment in bulk wrapper test
- disambiguous request feedbacks

Signed-off-by: Sean Kao <seankao@amazon.com>


          test for rate limiter adapt to feedback; fix bug

08b4b3f

fix bug for increase rate with stabilization;
slightly modify estimate rate to allow for setting threshold as 0 to
disable stabilization

Signed-off-by: Sean Kao <seankao@amazon.com>


          wip tcp

bac2721

Signed-off-by: Sean Kao <seankao@amazon.com>


          experiments

723498c

Signed-off-by: Sean Kao <seankao@amazon.com>

experiment better

Signed-off-by: Sean Kao <seankao@amazon.com>


          cleanup random changes

1ae7d9f

Signed-off-by: Sean Kao <seankao@amazon.com>


          remove commented slow start code and cleanup TODO

46ee7fd

Signed-off-by: Sean Kao <seankao@amazon.com>


          remove new configs

6abbd95

Signed-off-by: Sean Kao <seankao@amazon.com>


          fix test; check complete log

c1bee0c

- remove requestSize from RequestFeedback
- decrease cooldown minor change for simpler test case
- fix test cases for 10 seconds sliding window for rate estimation

Signed-off-by: Sean Kao <seankao@amazon.com>

seankao-az changed the title ~~Adaptive rate control enhance~~ Adaptive rate control with multi-signal feedback

seankao-az changed the title ~~Adaptive rate control with multi-signal feedback~~ Adaptive rate control enhancement

seankao-az added 7 commits

April 30, 2025 14:35


          scala fmt

29d01bd

Signed-off-by: Sean Kao <seankao@amazon.com>


          fix default rate limit options

62760ca

Signed-off-by: Sean Kao <seankao@amazon.com>


          fix test cases

b409efe

Signed-off-by: Sean Kao <seankao@amazon.com>


          minor syntax / comment update

Signed-off-by: Sean Kao <seankao@amazon.com>


          run all tests in CI

8949dc8

Signed-off-by: Sean Kao <seankao@amazon.com>


          Merge branch 'main' into adaptive-rate-control-enhance

dc3cacd


          ignore buggy IT (it passes when running isolated though)

8f7daa6

Signed-off-by: Sean Kao <seankao@amazon.com>

seankao-az added the backport 0.x label

seankao-az marked this pull request as ready for review

May 7, 2025 04:34

seankao-az requested review from dai-chen, mengweieric and penghuo as code owners

May 7, 2025 04:34

seankao-az requested review from anirudha, kaituo, YANG-DB, noCharger, LantaoJin and ykmr1224 as code owners

May 7, 2025 04:34


Reviewers

dai-chen (code owner): awaiting requested review

mengweieric (code owner): awaiting requested review

penghuo (code owner): awaiting requested review

anirudha (code owner): awaiting requested review

kaituo (code owner): awaiting requested review

YANG-DB (code owner): awaiting requested review

noCharger (code owner): awaiting requested review

LantaoJin (code owner): awaiting requested review

ykmr1224 (code owner): awaiting requested review

At least 1 approving review is required to merge this pull request.

Labels