
[AUTOCUT] Gradle Check Flaky Test Report for ReactorNetty4StreamingStressIT #15840

Open
opensearch-ci-bot opened this issue Sep 6, 2024 · 8 comments · Fixed by #15859, #18008 or #18193
Labels
autocut flaky-test Random test failure that succeeds on second run >test-failure Test failure from CI, local build, etc.


@opensearch-ci-bot
Collaborator

opensearch-ci-bot commented Sep 6, 2024

Flaky Test Report for ReactorNetty4StreamingStressIT

Noticed that ReactorNetty4StreamingStressIT has some flaky tests that failed during post-merge actions.

Details

| Git Reference | Merged Pull Request | Build Details | Test Name |
|---|---|---|---|
| 087e473 | 17857 | 55995 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 0d1bb9b | 18035 | 56909 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 10fb852 | 17631 | 55237 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 115de22 | 17753 | 55759 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 18a3b75 | 17796 | 56046 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 26beb0f | 17996 | 56710 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 374ad77 | 17844 | 55886 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 5fb4e69 | 17447 | 56255 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 693c788 | 17921 | 56337 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| 6c0a95b | 17605 | 54623 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| cd8fa4f | 17887 | 56079 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| d29e95c | 17882 | 56063 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| e6ffc62 | 17609 | 54672 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| ebd743a | 17642 | 54804 | org.opensearch.rest.ReactorNetty4StreamingStressIT.classMethod, org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| d64baa6 | 15637 | 47004 | org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |
| fae1453 | 18116 | 57382 | org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest |

Other pull requests, besides those involved in post-merge actions, that contain failing tests in the ReactorNetty4StreamingStressIT class are:

For more details on the failed tests refer to OpenSearch Gradle Check Metrics dashboard.

@opensearch-ci-bot opensearch-ci-bot added >test-failure Test failure from CI, local build, etc. autocut flaky-test Random test failure that succeeds on second run untriaged labels Sep 6, 2024
@reta reta self-assigned this Sep 6, 2024
@reta
Contributor

reta commented Mar 18, 2025

Closing; the failure was a test suite timeout:

java.lang.Exception: Test abandoned because suite timeout was reached.
	at __randomizedtesting.SeedInfo.seed([49B56BB27F82E2AA]:0)

@andrross
Member

@reta Unfortunately, this failed again on PR #18060, which did contain the change from #18008:

REPRODUCE WITH: ./gradlew ':plugins:transport-reactor-netty4:javaRestTest' --tests "org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest" -Dtests.seed=D6B2DB4CE5B44876 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-US -Dtests.timezone=Etc/GMT-7 -Druntime.java=21

ReactorNetty4StreamingStressIT > testCloseClientStreamingRequest FAILED
    java.lang.AssertionError: VerifySubscriber timed out on reactor.core.publisher.FluxMap$MapSubscriber@64f123db
        at __randomizedtesting.SeedInfo.seed([D6B2DB4CE5B44876:4FA6F76359C1B064]:0)
        at reactor.test.MessageFormatter.assertionError(MessageFormatter.java:115)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.pollTaskEventOrComplete(DefaultStepVerifierBuilder.java:1728)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.verify(DefaultStepVerifierBuilder.java:1298)
        at reactor.test.DefaultStepVerifierBuilder$DefaultStepVerifier.verify(DefaultStepVerifierBuilder.java:832)
        at org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest(ReactorNetty4StreamingStressIT.java:80)

@reta
Contributor

reta commented Apr 24, 2025

Got it, thanks @andrross. I will take a look shortly; sorry about that.

x-INFiN1TY-x pushed a commit to x-INFiN1TY-x/OpenSearch_Local that referenced this issue Apr 27, 2025
@andrross
Member

I spent a few minutes looking into this, but couldn't figure out a fix. This is the relevant code:

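```java
StepVerifier.create(Flux.from(streamingResponse.getBody()).map(b -> new String(b.array(), StandardCharsets.UTF_8)))
    // first streamed chunk must be the "created" result for document 1
    .expectNextMatches(s -> s.contains("\"result\":\"created\"") && s.contains("\"_id\":\"1\""))
    // close the client while the response is still streaming
    .then(() -> {
        try {
            client().close();
        } catch (final IOException ex) {
            throw new UncheckedIOException(ex);
        }
    })
    // advance the test scheduler by the configured delay
    .then(() -> scheduler.advanceTimeBy(delay))
    // the stream should now terminate with a connection-related error
    .expectErrorMatches(t -> t instanceof InterruptedIOException || t instanceof ConnectionClosedException)
    .verify(Duration.ofSeconds(10));
```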

When it fails, it seems to satisfy the first `onNext` expectation, then close the client (which takes 5 seconds due to the graceful shutdown of the backing executor) and advance the time on the scheduler, but it never receives the expected error. The verifier then times out after 10 seconds and fails the test.
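To make the failing sequence concrete, here is a stdlib-only analogue of the `StepVerifier` steps that uses `java.util.concurrent.SubmissionPublisher` in place of Reactor's `Flux`. It is a hypothetical sketch, not the project's test: `CloseWhileStreamingSketch` and the `closePropagatesError` flag are invented names, and the flag simply models whether closing the client is surfaced as an error on the response stream. When it is not, the final wait times out, which mirrors the "VerifySubscriber timed out" failure.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical analogue of the test's StepVerifier sequence (not OpenSearch code).
class CloseWhileStreamingSketch {

    // Returns "error:<SimpleName>" if an error signal arrives after the close step,
    // "complete" on normal completion, "timeout" if no terminal signal arrives in time,
    // "no-first-item" if the first expectation fails, or "interrupted".
    static String run(boolean closePropagatesError, long timeoutMs) {
        CompletableFuture<String> firstItem = new CompletableFuture<>();
        CompletableFuture<Throwable> terminal = new CompletableFuture<>();

        try (SubmissionPublisher<String> body = new SubmissionPublisher<>()) {
            body.subscribe(new Flow.Subscriber<String>() {
                @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
                @Override public void onNext(String chunk) { firstItem.complete(chunk); }
                @Override public void onError(Throwable t) { terminal.complete(t); }
                @Override public void onComplete() { terminal.complete(null); }
            });

            // Step 1: one chunk is streamed back (cf. expectNextMatches).
            body.submit("{\"_id\":\"1\",\"result\":\"created\"}");
            try {
                String chunk = firstItem.get(timeoutMs, TimeUnit.MILLISECONDS);
                if (!chunk.contains("\"result\":\"created\"")) {
                    return "no-first-item";
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            } catch (ExecutionException | TimeoutException e) {
                return "no-first-item";
            }

            // Step 2: "close the client"; the flag models whether that becomes an error signal.
            if (closePropagatesError) {
                body.closeExceptionally(new InterruptedIOException("client closed"));
            }

            // Step 3: wait for the terminal signal (cf. expectErrorMatches + verify(timeout)).
            try {
                Throwable t = terminal.get(timeoutMs, TimeUnit.MILLISECONDS);
                return t == null ? "complete" : "error:" + t.getClass().getSimpleName();
            } catch (TimeoutException e) {
                return "timeout"; // analogous to "VerifySubscriber timed out"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            } catch (ExecutionException e) {
                return "interrupted";
            }
        }
    }
}
```

For example, `run(true, 2000)` should yield `error:InterruptedIOException`, while `run(false, 200)` should yield `timeout`.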

@andrross andrross mentioned this issue Apr 29, 2025
@andrross andrross added the disabled-test Issues that are used by an AwaitsFix annotation to temporarily disable a broken test label Apr 29, 2025
@reta
Contributor

reta commented Apr 30, 2025

Yeah, the logic seems to be sound, but the test is still not stable. I will keep looking; sorry it is taking a bit longer.

@andrross
Member

@reta I had trouble reproducing this, but I can get it to fail in the same way by changing the client close call to `client.close(CloseMode.IMMEDIATE)` so that it skips the graceful shutdown. Not sure if that's helpful.
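The difference between a graceful and an immediate close can be illustrated with a plain `ExecutorService`. This is an analogy under stated assumptions, not the HTTP client's internals: `ShutdownModesSketch` and its parameters are hypothetical, `shutdown()` plus `awaitTermination` stands in for a graceful close, and `shutdownNow()` stands in for `CloseMode.IMMEDIATE`. The graceful path waits out the grace period while a task is still in flight; the immediate path interrupts it right away.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of graceful vs. immediate shutdown timing.
class ShutdownModesSketch {

    // Starts a long-running task, shuts the pool down, and returns elapsed milliseconds.
    static long shutdownMillis(boolean immediate, long graceMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.submit(() -> {
            try {
                Thread.sleep(10_000); // stands in for an in-flight streaming exchange
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit promptly when interrupted
            }
        });

        long start = System.nanoTime();
        if (immediate) {
            pool.shutdownNow(); // interrupts the running task
        } else {
            pool.shutdown();    // lets the running task keep going
        }
        try {
            pool.awaitTermination(graceMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        pool.shutdownNow(); // make sure the worker thread is released in both modes
        return elapsed;
    }
}
```

With a 500 ms grace period, the graceful path should take roughly the full 500 ms, while the immediate path should return almost at once.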

@andrross
Member

New failure here: https://build.ci.opensearch.org/job/gradle-check/58016/

REPRODUCE WITH: ./gradlew ':plugins:transport-reactor-netty4:javaRestTest' --tests "org.opensearch.rest.ReactorNetty4StreamingStressIT.testCloseClientStreamingRequest" -Dtests.seed=9151F2D61088B7C9 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=de-LU -Dtests.timezone=US/Pacific -Druntime.java=21

ReactorNetty4StreamingStressIT > testCloseClientStreamingRequest FAILED
    java.lang.AssertionError: expectation "expectNextMatches" failed (expected: onNext(); actual: onError(java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 5000ms in 'flatMapMany' (and no fallback has been configured)))
        at __randomizedtesting.SeedInfo.seed([9151F2D61088B7C9:845DEF9ACFD4FDB]:0)
        at reactor.test.MessageFormatter.assertionError(MessageFormatter.java:115)
        at reactor.test.MessageFormatter.failPrefix(MessageFormatter.java:104)
        at reactor.test.MessageFormatter.fail(MessageFormatter.java:73)
        at reactor.test.MessageFormatter.failOptional(MessageFormatter.java:88)
        at reactor.test.DefaultStepVerifierBuilder.lambda$expectNextMatches$11(DefaultStepVerifierBuilder.java:556)
        at reactor.test.DefaultStepVerifierBuilder$SignalEvent.test(DefaultStepVerifierBuilder.java:2289)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.onSignal(DefaultStepVerifierBuilder.java:1529)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.onExpectation(DefaultStepVerifierBuilder.java:1477)
        at reactor.test.DefaultStepVerifierBuilder$DefaultVerifySubscriber.onError(DefaultStepVerifierBuilder.java:1129)
        at reactor.core.publisher.FluxMap$MapSubscriber.onError(FluxMap.java:134)
        at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
        at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:296)
        at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:281)
        at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:420)
        at reactor.core.publisher.FluxOnErrorReturn$ReturnSubscriber.onNext(FluxOnErrorReturn.java:162)
        at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:270)
        at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:285)
        at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
        at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
        at java.****/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.****/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.****/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.****/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.****/java.lang.Thread.run(Thread.java:1583)
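This failure mode differs from the earlier one. The `TimeoutException` reports that no item or terminal signal arrived within 5000ms in `flatMapMany` (the trace goes through `FluxTimeout`), so the first `expectNextMatches` received `onError` instead of `onNext`. Below is a rough stdlib analogue that uses `CompletableFuture.orTimeout` (Java 9+) in place of Reactor's timeout operator. `FirstItemTimeoutSketch` and its parameters are hypothetical, and the short delays stand in for the real 5-second window.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;

// Hypothetical analogue of a per-response timeout firing before the first chunk arrives.
class FirstItemTimeoutSketch {

    // Returns "onNext:<item>" if the first item arrives within timeoutMs,
    // otherwise "onError:<SimpleName>" of the failure (TimeoutException on timeout).
    static String awaitFirst(long itemDelayMs, long timeoutMs) {
        CompletableFuture<String> first = CompletableFuture.supplyAsync(
            () -> "{\"result\":\"created\"}",
            CompletableFuture.delayedExecutor(itemDelayMs, TimeUnit.MILLISECONDS));
        try {
            return "onNext:" + first.orTimeout(timeoutMs, TimeUnit.MILLISECONDS).join();
        } catch (CompletionException e) {
            return "onError:" + e.getCause().getClass().getSimpleName();
        }
    }
}
```

A fast first item, as in `awaitFirst(10, 2000)`, yields `onNext`; a first item slower than the timeout, as in `awaitFirst(2000, 100)`, yields `onError:TimeoutException`, which matches the shape of the failure above.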

@andrross andrross reopened this May 13, 2025
@reta
Contributor

reta commented May 14, 2025

Thanks @andrross, looking into it.

@andrross andrross removed untriaged disabled-test Issues that are used by an AwaitsFix annotation to temporarily disable a broken test labels May 16, 2025