[BUG] SegmentReplicationRelocationIT.testPrimaryRelocation flaky test failure #8909

Closed
dreamer-89 opened this issue Jul 27, 2023 · 2 comments
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run Indexing:Replication Issues and PRs related to core replication framework eg segrep

Comments

@dreamer-89
Member

Coming from meta tracking on flaky segment replication test failures #8279 (comment)

Gradle check
39 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation (19014,19021,19118,19135,19142,19330,19418,19539,19713,19720,19854,19859,19887,19887,19891,19935,19941,19967,20007,20071,20085,20145,20179,20240,20330,20399,20484,20525,20525,20557,20685,20720,20731,20783,20798,20812,20813,20858,20866)
28 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocationWithSegRepFailure (19061,19289,19323,19343,19358,19636,19759,19764,19919,19919,19919,19935,20023,20058,20071,20089,20089,20261,20283,20399,20531,20671,20671,20675,20681,20748,20770,20787)

@dreamer-89 dreamer-89 added bug Something isn't working untriaged labels Jul 27, 2023
@anasalkouz anasalkouz added Indexing:Replication Issues and PRs related to core replication framework eg segrep and removed untriaged labels Aug 8, 2023
@tlfeng
Collaborator

tlfeng commented Aug 8, 2023

From the console output of build No. 20866 (https://build.ci.opensearch.org/job/gradle-check/20866/):

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation" -Dtests.seed=9601B1CBF34C7C9B -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=lv-LV -Dtests.timezone=Atlantic/Canary -Druntime.java=20

org.opensearch.indices.replication.SegmentReplicationRelocationIT > testPrimaryRelocation FAILED
    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([9601B1CBF34C7C9B:2E098619D0252F75]:0)
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.opensearch.indices.replication.SegmentReplicationRelocationIT.lambda$testPrimaryRelocation$0(SegmentReplicationRelocationIT.java:122)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1083)
        at org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation(SegmentReplicationRelocationIT.java:120)

And in build No. 19014:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation" -Dtests.seed=D8A53D47B5F23713 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=fr-BE -Dtests.timezone=Pacific/Galapagos -Druntime.java=17

org.opensearch.indices.replication.SegmentReplicationRelocationIT > testPrimaryRelocation FAILED
    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([D8A53D47B5F23713:60AD0A95969B64FD]:0)
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.opensearch.indices.replication.SegmentReplicationRelocationIT.lambda$testPrimaryRelocation$0(SegmentReplicationRelocationIT.java:118)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1082)
        at org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation(SegmentReplicationRelocationIT.java:116)

The cause of the failure is consistent across both builds: in each case the assertTrue inside the assertBusy() lambda in testPrimaryRelocation fails.

The failed assertion is located at https://github.com/opensearch-project/OpenSearch/blob/2.9.0/server/src/internalClusterTest/java/org/opensearch/indices/replication/SegmentReplicationRelocationIT.java#L116

The definition of assertBusy() is at https://github.com/opensearch-project/OpenSearch/blob/2.9.0/test/framework/src/main/java/org/opensearch/test/OpenSearchTestCase.java#L1063, which was introduced in commit 10030a6
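
For anyone unfamiliar with the pattern, here is a minimal sketch of what a busy-assert helper does: it re-runs a check until it passes or a time budget runs out, and only then lets the last AssertionError escape. That matches the shape of the stack traces above, where the AssertionError from the lambda propagates out through assertBusy. This is not the OpenSearch implementation; the class name `BusyAssertSketch`, the `Check` interface, and the doubling sleep interval are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BusyAssertSketch {

    /** Hypothetical stand-in for a check that throws AssertionError while the condition is not met. */
    @FunctionalInterface
    public interface Check {
        void run() throws Exception;
    }

    /**
     * Re-runs the check until it passes or maxWaitMillis has elapsed.
     * The sleep between attempts doubles each time (an assumed backoff policy).
     */
    public static void assertBusy(Check check, long maxWaitMillis) throws Exception {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // condition met, stop polling
            } catch (AssertionError e) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw e; // budget exhausted, surface the last failure
                }
                Thread.sleep(Math.min(sleepMillis, remainingMillis));
                sleepMillis *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // The condition becomes true on the third attempt.
        assertBusy(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new AssertionError("not ready yet");
            }
        }, 1000);
        System.out.println("passed after " + calls.get() + " attempts");
    }
}
```

With a helper like this, a test is flaky whenever the awaited condition sometimes needs longer than the wait budget, or never becomes true under a particular interleaving.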

@tlfeng tlfeng added the flaky-test Random test failure that succeeds on second run label Aug 8, 2023
@tlfeng
Collaborator

tlfeng commented Aug 22, 2023

Duplicate of issue #8858, which was closed 12 hours before this issue was created.
I found this while looking at the change history of the test.

Verified that the test is no longer flaky by running it locally 5k times; no failure occurred.

for i in $(seq 0 5000) ; do echo "Iteration: $i" && ./gradlew ':server:internalClusterTest' --tests "org.opensearch.indices.replication.SegmentReplicationRelocationIT.testPrimaryRelocation" >> ~/test-output-8909-before0726.txt 2>&1 ; done
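
To check the collected output, something like the following sketch could tally the failed runs. It assumes a failed run prints a line of the form `... > testPrimaryRelocation FAILED`, as in the console output quoted above; the class name `FlakyRunTally` and its method are hypothetical, and the log path is passed as an argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FlakyRunTally {

    /** Counts log lines that report the given test as failed, e.g. "... > testPrimaryRelocation FAILED". */
    public static long countFailures(List<String> lines, String testName) {
        // The trailing " FAILED" keeps longer test names with the same prefix from matching.
        String marker = "> " + testName + " FAILED";
        return lines.stream().filter(line -> line.contains(marker)).count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FlakyRunTally <path-to-test-output>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println("failed runs: " + countFailures(lines, "testPrimaryRelocation"));
    }
}
```

A count of zero over the whole file is what "no failure occurred" corresponds to here.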

The latest occurrence in Jenkins CI was build No. 20866, which was 1 day before the fix was merged.

@tlfeng tlfeng closed this as completed Aug 22, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in Segment Replication Aug 22, 2023