Conversation

li4wang (Contributor) commented Jul 21, 2025

Implemented PrometheusRejectedExecutionHandler, overriding the rejectedExecution() API, to fix the large memory footprint and long GC pauses caused by the RejectedExecutionException-throwing handler. No RejectedExecutionException is thrown anymore.

This enhancement increased read throughput by 140% according to the results of our performance comparison tests.
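
For readers who want to see the shape of such a handler, the sketch below is a minimal, hypothetical illustration of a RejectedExecutionHandler that drops the task and logs instead of throwing. The class name, logger, and log message are assumptions for illustration, not the PR's exact code.

    import java.util.concurrent.RejectedExecutionHandler;
    import java.util.concurrent.ThreadPoolExecutor;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Hypothetical sketch, not the PR's exact implementation: when the metrics
    // queue is full, drop the task and log a warning instead of letting the
    // default AbortPolicy allocate and throw a RejectedExecutionException.
    public class DropAndLogRejectedExecutionHandler implements RejectedExecutionHandler {
        private static final Logger LOG =
                LoggerFactory.getLogger(DropAndLogRejectedExecutionHandler.class);

        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            // No exception object, no message concatenation, no stack-trace
            // capture on the hot path; the rejected metrics task is dropped.
            LOG.warn("Prometheus metrics queue is full; dropping metrics task");
        }
    }

A handler like this is installed either through the ThreadPoolExecutor constructor or via setRejectedExecutionHandler().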

li4wang requested review from eolivelli and kezhuw, July 22, 2025 17:26
Review thread on the new rejectedExecution() override (diff snippet, truncated):

    public void rejectedExecution(final Runnable r, final ThreadPoolExecutor e) {
        if (!maxQueueSizeExceeded) {
            maxQueueSizeExceeded = true;
            LOG.warn("Prometheus metrics queue size exceeded the max");
            // ...
kezhuw (Member) commented:

Why drop the rate limit logger?

I would expect rejections to be periodic events, not a one-time, lifetime event.

li4wang (Contributor, Author) commented Jul 23, 2025

The rate limit logger was dropped based on the following two considerations:

  1. To avoid the unnecessary overhead of logging the same warning message repeatedly, even when it is rate limited.
  2. It's important to know whether rejection happens, but there isn't much benefit in knowing when the rejection stops and restarts.

I'm OK with either way, so let me know if there is a strong preference for having it back. @kezhuw

kezhuw (Member) commented:

Quoting li4wang: "It's important to know whether rejection happens, but there isn't much benefit in knowing when the rejection stops and restarts."

My concern is that people could miss the rejection if we log it only once, as it could be too far from the inspection time. ZooKeeper servers in production are supposed to run for a long time, and logs could be truncated or archived periodically. If we log it only once at the first occurrence, then it would be hard or even impossible to notice that the metrics backend is overloaded now.

Quoting li4wang: "To avoid the unnecessary overhead of logging the same warning message repeatedly, even when it is rate limited."

Is it possible to optimize this, since the log string is a constant?

I personally think the rejection policy deserves a rate-limited log, as rejections could happen briefly and periodically due to system load.

li4wang (Contributor, Author) commented:

Yeah, let me just add the rate limit logger.

li4wang (Contributor, Author) commented Aug 25, 2025

  • Enhanced RateLogger to reduce string concatenation overhead.
  • Added the rate logger back for visibility when the max queue size is exceeded (a sketch of the idea follows below).
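
As a rough illustration of the two bullets above, here is a minimal sketch using a hypothetical helper class (this is not ZooKeeper's actual RateLogger API): the warning text is a compile-time constant so nothing is concatenated per rejection, and a timestamp check keeps rejections visible in the log without flooding it.

    import java.util.concurrent.atomic.AtomicLong;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Hypothetical sketch of rate-limited logging with a constant message:
    // the warning string is never rebuilt, and at most one line is emitted
    // per interval, so repeated rejections add almost no overhead.
    public final class ConstantMessageRateLimitedLogger {
        private static final Logger LOG =
                LoggerFactory.getLogger(ConstantMessageRateLimitedLogger.class);
        private static final String MSG = "Prometheus metrics queue size exceeded the max";
        private static final long INTERVAL_MS = 60_000L; // assumed interval

        private final AtomicLong lastLoggedMs = new AtomicLong(0);

        public void warnRateLimited() {
            long now = System.currentTimeMillis();
            long last = lastLoggedMs.get();
            // Log only if the interval has elapsed and this thread wins the CAS;
            // all other callers return without allocating anything.
            if (now - last >= INTERVAL_MS && lastLoggedMs.compareAndSet(last, now)) {
                LOG.warn(MSG);
            }
        }
    }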

kezhuw (Member) commented Jul 23, 2025

Quoting the PR description: "This enhancement increased read throughput by 140% according to the results of our performance comparison tests."

Is it possible to add the perf code to tests so others can evaluate it? I saw three examples in our codebase:

  1. LOG.info(
         "Deserialized {} nodes in {} ms ({}us/node), depth={} width={} datalen={}",
         count,
         durationms,
         pernodeus,
         depth,
         width,
         len);

  2. @Test
     public void testMultiThreadPerf() throws InterruptedException {
         RequestPathMetricsCollector requestPathMetricsCollector = new RequestPathMetricsCollector();
         Random rand = new Random(System.currentTimeMillis());
         Long startTime = System.currentTimeMillis();
         ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newCachedThreadPool();
         // call 100K getData
         for (int i = 0; i < 100000; i++) {
             executor.submit(
                 () -> requestPathMetricsCollector.registerRequest(getData, "/path1/path2/path" + rand.nextInt(10)));
         }
         // 5K create
         for (int i = 0; i < 5000; i++) {
             executor.submit(
                 () -> requestPathMetricsCollector.registerRequest(create2, "/path1/path2/path" + rand.nextInt(10)));
         }
         // 5K delete
         for (int i = 0; i < 5000; i++) {
             executor.submit(
                 () -> requestPathMetricsCollector.registerRequest(delete, "/path1/path2/path" + rand.nextInt(10)));
         }
         // 40K getChildren
         for (int i = 0; i < 40000; i++) {
             executor.submit(
                 () -> requestPathMetricsCollector.registerRequest(getChildren, "/path1/path2/path" + rand.nextInt(10)));
         }
         executor.shutdown();
         // wait for at most 10 milliseconds
         executor.awaitTermination(10, TimeUnit.MILLISECONDS);
         assertTrue(executor.isTerminated());
         Long endTime = System.currentTimeMillis();
         // less than 3 seconds total time
         assertTrue(TimeUnit.MILLISECONDS.toSeconds(endTime - startTime) < 3);
     }

  3. System.out
         .println("Time taken for " + recordCount + " operations are:");
     System.out.println(doOperations.report());

li4wang (Contributor, Author) commented Jul 24, 2025

Quoting kezhuw: "Is it possible to add the perf code to tests so others can evaluate it?"

Do you mean to evaluate the performance improvement of this change?

Any test that performs some write operations can be used if we set the max queue size to a very small number (e.g. 1); however, the challenge is how to measure and compare the performance improvement. This is not something that can be done simply by printing out a duration, as the sample tests do.

If we analyze how AbortPolicy is implemented, it's quite obvious that this change can reduce GC overhead and memory footprint when the queue size exceeds the max (see the sketch after this comment). The performance improvement number is provided here for reference purposes. We've done quite comprehensive comparison tests with and without the change.

I would be very happy to provide any clarification on the performance improvement, but I personally don't think it's possible to write a meaningful perf test for it in the current code base, and I don't feel it's really necessary either.
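
Some context on the AbortPolicy point above: the JDK's default AbortPolicy handler throws roughly new RejectedExecutionException("Task " + r + " rejected from " + e), so every rejected submission pays for an exception allocation, a stack-trace capture, and string concatenation over two toString() calls. The demo below is hypothetical illustration code (not part of the PR or its tests) that floods a deliberately tiny pool while using a drop-silently handler instead.

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    // Hypothetical demo: most submissions are rejected because the queue holds
    // only one task, and the custom handler drops them without throwing.
    public class RejectionCostDemo {
        public static void main(String[] args) {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    1, 1, 0L, TimeUnit.MILLISECONDS,
                    new LinkedBlockingQueue<>(1),
                    // Drop-and-ignore: no exception, no stack trace, no
                    // message concatenation per rejected task.
                    (task, executor) -> { /* dropped */ });

            // With the default AbortPolicy each rejection would instead run
            // roughly: throw new RejectedExecutionException(
            //     "Task " + task + " rejected from " + executor);
            // i.e. an exception allocation plus toString() calls and string
            // concatenation for every rejected submission.
            Runnable slowTask = () -> {
                try {
                    Thread.sleep(1);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            };
            for (int i = 0; i < 1_000_000; i++) {
                pool.execute(slowTask); // excess tasks are silently dropped
            }
            pool.shutdownNow();
        }
    }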

kezhuw (Member) commented Jul 27, 2025

Quoting li4wang: "I would be very happy to provide any clarification on the performance improvement, but I personally don't think it's possible to write a meaningful perf test for it in the current code base, and I don't feel it's really necessary either."

I am OK with this, please go ahead!

li4wang force-pushed the ZOOKEEPER-4952 branch 3 times, most recently from 1d59100 to f5d0ea5, on August 25, 2025 22:25
…handling

Author: Li Wang <liwang@apple.com>
li4wang (Contributor, Author) commented Aug 25, 2025

Hi @kezhuw, the rate logger has been added back, with an optimized RateLogger that reduces string concatenation. Would you mind taking a look at it? Thanks.

kezhuw (Member) left a review:

LGTM

li4wang removed the request for review from eolivelli, August 27, 2025 21:40
li4wang merged commit 9e8296f into apache:master, Aug 27, 2025
16 checks passed
li4wang (Contributor, Author) commented Aug 27, 2025

Thanks @kezhuw, I will go ahead and merge it.

li4wang (Contributor, Author) commented Aug 27, 2025

@kezhuw Do you know if we need to open a separate JIRA and PR if we want to include this in the upcoming 3.10.0 release?

kezhuw (Member) commented Aug 28, 2025

Hi @li4wang, master will be 3.10.0.
