[BUG] org.opensearch.remotestore.RemoteIndexRecoveryIT.testSnapshotRecovery is flaky #10758

Closed
reta opened this issue Oct 19, 2023 · 6 comments
Labels: bug, flaky-test, Storage:Remote

Comments

@reta
Collaborator

reta commented Oct 19, 2023

Describe the bug
The test case org.opensearch.remotestore.RemoteIndexRecoveryIT.testSnapshotRecovery is flaky:

java.lang.AssertionError: max seq. no. [141] does not match [0]
	at __randomizedtesting.SeedInfo.seed([C35CC260A48CEB4C]:0)
	at org.opensearch.index.engine.ReadOnlyEngine.assertMaxSeqNoEqualsToGlobalCheckpoint(ReadOnlyEngine.java:202)
	at org.opensearch.index.engine.ReadOnlyEngine.ensureMaxSeqNoEqualsToGlobalCheckpoint(ReadOnlyEngine.java:189)
	at org.opensearch.index.engine.ReadOnlyEngine.<init>(ReadOnlyEngine.java:140)
	at org.opensearch.index.engine.NoOpEngine.<init>(NoOpEngine.java:76)
	at org.opensearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:2412)
	at org.opensearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:2354)
	at org.opensearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:2323)
	at org.opensearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:640)
	at org.opensearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:121)
	at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:344)
	at org.opensearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:119)
	at org.opensearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:2697)
	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1623)

To Reproduce

./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotestore.RemoteIndexRecoveryIT.testSnapshotRecovery" -Dtests.seed=C35CC260A48CEB4C
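
The value passed to `-Dtests.seed` is the one reported in the `SeedInfo.seed([...])` frame of the stack trace above. As a minimal sketch, the seed can be pulled out of a CI log and turned back into this command. The helper names `extract_seed` and `repro_command` are hypothetical and not part of the OpenSearch build.

```python
import re
import shlex

# Matches the frame that the randomized-testing runner adds to a failing
# trace, e.g. "at __randomizedtesting.SeedInfo.seed([C35CC260A48CEB4C]:0)".
SEED_FRAME = re.compile(r"SeedInfo\.seed\(\[([0-9A-Fa-f:]+)\]")

TEST = "org.opensearch.remotestore.RemoteIndexRecoveryIT.testSnapshotRecovery"


def extract_seed(log_text):
    """Return the first seed found in log_text, or None if there is none."""
    match = SEED_FRAME.search(log_text)
    return match.group(1) if match else None


def repro_command(seed, test=TEST):
    """Build the Gradle invocation from this issue as an argument list."""
    return [
        "./gradlew",
        ":server:internalClusterTest",
        "--tests",
        test,
        f"-Dtests.seed={seed}",
    ]


if __name__ == "__main__":
    frame = "at __randomizedtesting.SeedInfo.seed([C35CC260A48CEB4C]:0)"
    print(shlex.join(repro_command(extract_seed(frame))))
```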

Expected behavior
The test must always pass.

Plugins
Standard

Host/Environment (please complete the following information):

  • CI

@peternied
Member

Responsible for failure of #11048

@peternied
Member

Another failure, impacted #10869

@gbbafna
Collaborator

gbbafna commented Jan 9, 2024

@BhumikaSaini-Amazon, can you please take this up?

@BhumikaSaini-Amazon
Contributor

56 actionable tasks: 1 executed, 55 up-to-date
===================================================================
2667
===================================================================
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 8.6
  OS Info               : <redacted>
  JDK Version           : 11 (Amazon Corretto JDK)
  JAVA_HOME             : /usr/lib/jvm/java-11-amazon-corretto
  Random Testing Seed   : BA81AC90D7D02EEC
  In FIPS 140 mode      : false
=======================================

Unable to repro this even after 2k+ unique master seeds.
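
For anyone who wants to repeat this kind of run, here is a minimal sketch of a multi-seed loop. It is an illustration only, not the script used for the numbers in this comment. The names `random_seed`, `run_once`, and `hunt` are hypothetical, and the sketch assumes a seed is 16 hex digits, as in the seeds shown in this issue.

```python
import secrets
import subprocess

TEST = "org.opensearch.remotestore.RemoteIndexRecoveryIT.testSnapshotRecovery"


def random_seed():
    """Return a random 64-bit seed as 16 upper-case hex digits."""
    return f"{secrets.randbits(64):016X}"


def run_once(seed):
    """Run the test with the given seed; True means Gradle exited with 0."""
    cmd = [
        "./gradlew",
        ":server:internalClusterTest",
        "--tests",
        TEST,
        f"-Dtests.seed={seed}",
    ]
    return subprocess.run(cmd).returncode == 0


def hunt(iterations, run=run_once, make_seed=random_seed):
    """Try up to `iterations` seeds; return the first failing seed or None."""
    for _ in range(iterations):
        seed = make_seed()
        if not run(seed):
            return seed
    return None


if __name__ == "__main__":
    failing = hunt(2000)
    print(f"failing seed: {failing}" if failing else "no failure reproduced")
```

Stopping at the first failing seed keeps that seed available for a targeted rerun with `-Dtests.seed`.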

@BhumikaSaini-Amazon
Contributor

...

WARNING: System::setSecurityManager will be removed in a future release

BUILD SUCCESSFUL in 41s
56 actionable tasks: 1 executed, 55 up-to-date
===================================================================
5965
===================================================================
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 8.7
  OS Info               : <redacted>
  JDK Version           : 11 (Amazon Corretto JDK)
  JAVA_HOME             : /usr/lib/jvm/java-11-amazon-corretto
  Random Testing Seed   : 8F0BB948C3066A53
  In FIPS 140 mode      : false
=======================================

> Task :server:internalClusterTest
Apr 01, 2024 10:43:56 AM sun.util.locale.provider.LocaleProviderAdapter <clinit>
WARNING: COMPAT locale provider will be removed in a future release

WARNING: A terminally deprecated method in java.lang.System has b

...

Unable to repro this on the latest main branch even after 5k+ unique master seeds. I don't see any recent reports for this issue either.

I will check if this could be somehow related to concurrent runs in Jenkins.

@BhumikaSaini-Amazon
Contributor

WARNING: System::setSecurityManager will be removed in a future release

BUILD SUCCESSFUL in 1m 1s
56 actionable tasks: 1 executed, 55 up-to-date
===================================================================
8602
===================================================================
=======================================
OpenSearch Build Hamster says Hello!

No repro even after 8k+ iterations. I don't see any concerning traces in the Jenkins logs.
Resolving this for now.

@BhumikaSaini-Amazon closed this as not planned on Apr 5, 2024
@github-project-automation bot moved this from 🏗 In progress to ✅ Done in Storage Project Board on Apr 5, 2024