[BUG] org.opensearch.remotemigration.RemotePrimaryRelocationIT.testMixedModeRelocation_FailInFinalize is flaky #17525

Closed
finnegancarroll opened this issue Mar 5, 2025 · 1 comment
Labels
bug Something isn't working Storage:Remote untriaged

Comments

finnegancarroll (Contributor) commented Mar 5, 2025

Describe the bug

org.opensearch.remotemigration.RemotePrimaryRelocationIT.testMixedModeRelocation_FailInFinalize is flaky.

It fails with the following assertion error:

java.lang.AssertionError: timed out waiting for relocation
Expected: <false>
     but: was <true>
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.opensearch.remotemigration.MigrationBaseTestCase.waitForRelocation(MigrationBaseTestCase.java:251)
	at org.opensearch.remotemigration.RemotePrimaryRelocationIT.testMixedModeRelocation_FailInFinalize(RemotePrimaryRelocationIT.java:275)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at org.opensearch.test.OpenSearchTestClusterRule$1.evaluate(OpenSearchTestClusterRule.java:369)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1575)
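
For context on the message above: "Expected: <false> but: was <true>" is what an equality assertion prints when a "timed out" flag turns out to be true. The following is a minimal sketch of a poll-until-deadline helper that produces such a flag. The names `WaitSketch` and `timedOutWaiting` are hypothetical and are not the code in MigrationBaseTestCase; the real test waits on cluster state rather than on a local supplier.

```java
import java.util.function.BooleanSupplier;

public class WaitSketch {

    /**
     * Polls {@code done} until it returns true or the timeout elapses.
     * Returns true if the wait timed out, false if the condition was met.
     * Hypothetical helper for illustration only.
     */
    public static boolean timedOutWaiting(BooleanSupplier done, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!done.getAsBoolean()) {
            // Compare via subtraction so the check stays correct if nanoTime wraps.
            if (System.nanoTime() - deadline >= 0) {
                return true;
            }
            Thread.sleep(pollMillis);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Condition already satisfied: the wait does not time out.
        boolean finished = timedOutWaiting(() -> true, 1_000, 10);
        System.out.println("timed out (condition met): " + finished);

        // Condition never satisfied: the wait times out, which is the state
        // an assertion like "timed out waiting for relocation" would reject.
        boolean stuck = timedOutWaiting(() -> false, 100, 10);
        System.out.println("timed out (condition never met): " + stuck);
    }
}
```

A flag-returning helper keeps the timeout decision separate from the assertion, so the test can attach a descriptive message to the failure.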

The test log also contains this error and stack trace:

[2025-03-05T15:06:12,681][INFO ][o.o.i.IndexService       ] [node_t1] [test] DocRep shard [test][0] is migrating to remote
[2025-03-05T15:06:13,023][ERROR][o.o.i.s.RemoteStoreRefreshListener] [node_t1] [test][0] Exception while initialising RemoteSegmentStoreDirectory
java.io.IOException: java.nio.file.NoSuchFileException: /var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.remotemigration.RemotePrimaryRelocationIT_B75F2E51C577FA1F-001/tempDir-002/repos/AeCPLrmpIa/hnQqCkKXTS614qWEa5c4Ug/0/segments/metadata
	at org.opensearch.index.store.RemoteDirectory.listFilesByPrefixInLexicographicOrder(RemoteDirectory.java:144) ~[main/:?]
	at org.opensearch.index.store.RemoteSegmentStoreDirectory.readLatestMetadataFile(RemoteSegmentStoreDirectory.java:237) ~[main/:?]
	at org.opensearch.index.store.RemoteSegmentStoreDirectory.init(RemoteSegmentStoreDirectory.java:154) ~[main/:?]
	at org.opensearch.index.shard.RemoteStoreRefreshListener.<init>(RemoteStoreRefreshListener.java:113) [main/:?]
	at org.opensearch.index.shard.IndexShard.newEngineConfig(IndexShard.java:4035) [main/:?]
	at org.opensearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:2584) [main/:?]
	at org.opensearch.index.shard.IndexShard.openEngineAndSkipTranslogRecovery(IndexShard.java:2568) [main/:?]
	at org.opensearch.index.shard.IndexShard.openEngineAndSkipTranslogRecovery(IndexShard.java:2553) [main/:?]
	at org.opensearch.indices.recovery.RecoveryTarget.lambda$prepareForTranslogOperations$2(RecoveryTarget.java:224) [main/:?]
	at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:344) [opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.indices.recovery.RecoveryTarget.prepareForTranslogOperations(RecoveryTarget.java:213) [main/:?]
	at org.opensearch.indices.recovery.PeerRecoveryTargetService$PrepareForTranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:405) [main/:?]
	at org.opensearch.indices.recovery.PeerRecoveryTargetService$PrepareForTranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:394) [main/:?]
	at org.opensearch.wlm.WorkloadManagementTransportInterceptor$RequestHandler.messageReceived(WorkloadManagementTransportInterceptor.java:63) [main/:?]
	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:108) [main/:?]
	at org.opensearch.transport.NativeMessageHandler$RequestHandler.doRun(NativeMessageHandler.java:487) [main/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:994) [main/:?]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [main/:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:1575) [?:?]
Caused by: java.nio.file.NoSuchFileException: /var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.remotemigration.RemotePrimaryRelocationIT_B75F2E51C577FA1F-001/tempDir-002/repos/AeCPLrmpIa/hnQqCkKXTS614qWEa5c4Ug/0/segments/metadata
	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
	at java.base/sun.nio.fs.UnixException.asIOException(UnixException.java:115) ~[?:?]
	at java.base/sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:502) ~[?:?]
	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newDirectoryStream(FilterFileSystemProvider.java:236) ~[lucene-test-framework-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newDirectoryStream(FilterFileSystemProvider.java:236) ~[lucene-test-framework-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
	at org.apache.lucene.tests.mockfile.ShuffleFS.newDirectoryStream(ShuffleFS.java:48) ~[lucene-test-framework-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
	at org.apache.lucene.tests.mockfile.HandleTrackingFS.newDirectoryStream(HandleTrackingFS.java:299) ~[lucene-test-framework-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
	at org.apache.lucene.tests.mockfile.HandleTrackingFS.newDirectoryStream(HandleTrackingFS.java:299) ~[lucene-test-framework-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
	at java.base/java.nio.file.Files.newDirectoryStream(Files.java:550) ~[?:?]
	at org.opensearch.common.blobstore.fs.FsBlobContainer.listBlobsByPrefix(FsBlobContainer.java:116) ~[main/:?]
	at org.opensearch.common.blobstore.BlobContainer.listBlobsByPrefixInSortedOrder(BlobContainer.java:323) ~[main/:?]
	at org.opensearch.common.blobstore.BlobContainer.listBlobsByPrefixInSortedOrder(BlobContainer.java:312) ~[main/:?]
	at org.opensearch.index.store.RemoteDirectory.listFilesByPrefixInLexicographicOrder(RemoteDirectory.java:133) ~[main/:?]
	... 20 more
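
The root cause in the trace above is `Files.newDirectoryStream` being called on a `segments/metadata` directory that does not exist at that moment. The sketch below reproduces that mechanism on a temporary directory and shows one defensive option, treating a missing directory as an empty listing. `MissingDirListing` and `listByPrefixOrEmpty` are hypothetical names; this is not the OpenSearch implementation and not a proposed fix, and whether an empty result is the right behaviour for a remote store is an assumption that would need checking.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MissingDirListing {

    /**
     * Lists file names in {@code dir} that start with {@code prefix}, sorted
     * lexicographically. Hypothetical helper: a missing directory is treated
     * as "no files yet" instead of propagating NoSuchFileException.
     */
    public static List<String> listByPrefixOrEmpty(Path dir, String prefix) throws IOException {
        List<String> names = new ArrayList<>();
        DirectoryStream.Filter<Path> byPrefix =
            p -> p.getFileName().toString().startsWith(prefix);
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, byPrefix)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        } catch (NoSuchFileException e) {
            return Collections.emptyList();
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("repo");
        Path metadataDir = root.resolve("segments").resolve("metadata");

        // Unguarded listing of a directory that was never created.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(metadataDir)) {
            System.out.println("directory unexpectedly exists");
        } catch (NoSuchFileException e) {
            System.out.println("NoSuchFileException: " + e.getMessage());
        }

        // Guarded listing: missing directory yields an empty list.
        System.out.println(listByPrefixOrEmpty(metadataDir, "metadata"));

        // Once the directory and files exist, only matching names are returned.
        Files.createDirectories(metadataDir);
        Files.createFile(metadataDir.resolve("metadata__2"));
        Files.createFile(metadataDir.resolve("metadata__1"));
        Files.createFile(metadataDir.resolve("other"));
        System.out.println(listByPrefixOrEmpty(metadataDir, "metadata"));
    }
}
```

Swallowing the exception hides the difference between "nothing uploaded yet" and "wrong path", so a caller that needs that distinction would be better served by an explicit existence check or a retry.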

Related component

Storage:Remote

To Reproduce

Command to reproduce the failure above:

./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotemigration.RemotePrimaryRelocationIT.testMixedModeRelocation_FailInFinalize" -Dtests.seed=B75F2E51C577FA1F -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=gl-ES -Dtests.timezone=America/Lima -Druntime.java=23

Expected behavior

The test passes on every run.

andrross (Member) commented Mar 6, 2025

An autocut issue already exists for this test: #17364

I'm going to close this. @finnegancarroll please reopen if this issue is needed in addition to the autocut issue.

@andrross andrross closed this as completed Mar 6, 2025
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Storage Project Board Mar 6, 2025