[BUG] org.opensearch.index.shard.RemoteIndexShardTests.testNoFailuresOnFileReads #10945

Closed
shwetathareja opened this issue Oct 26, 2023 · 7 comments · Fixed by #11824
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run Storage:Remote

Comments

@shwetathareja
Member

Describe the bug

The test report says the test passed here - https://build.ci.opensearch.org/job/gradle-check/29071/testReport/junit/org.opensearch.index.shard/RemoteIndexShardTests/testNoFailuresOnFileReads/

But it shows up as failed in the gradle check:
https://build.ci.opensearch.org/job/gradle-check/29071/

The following exception appears in the standard output:

[2023-10-26T08:50:15,190][ERROR][o.o.i.s.IndexShard       ] [testNoFailuresOnFileReads] [test][0] skip local recovery as no index commit found
[2023-10-26T08:50:15,503][INFO ][o.o.i.r.RecoverySourceHandler] [org.opensearch.index.shard.RemoteIndexShardTests] [test][0][recover to s1] finalizing recovery took [222.3ms]
[2023-10-26T08:50:22,157][INFO ][o.o.i.s.RemoteIndexShardTests] [testNoFailuresOnFileReads] --> Corrupting file _0.cfe
[2023-10-26T08:50:22,159][INFO ][o.o.t.CorruptionUtils    ] [testNoFailuresOnFileReads] Corrupting file --  flipping at position 397 from 0 to 1 file: _0.cfe
[2023-10-26T08:50:28,469][WARN ][o.o.i.r.SegmentReplicationTarget] [org.opensearch.index.shard.RemoteIndexShardTests] [test][0] Error reading name [_0.cfe], length [405], checksum [6e7t5f], writtenBy [9.8.0]
org.apache.lucene.index.CorruptIndexException: Illegal CRC-32 checksum: 72057594424603987 (resource=MockIndexInputWrapper(NIOFSIndexInput(path="/var/jenkins/workspace/gradle-check/search/server/build/testrun/test/temp/org.opensearch.index.shard.RemoteIndexShardTests_A71542B40F44C18D-002/tempDir-006/indices/uuid/0/index/_0.cfe")))
	at org.apache.lucene.codecs.CodecUtil.readCRC(CodecUtil.java:631) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:535) ~[lucene-core-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.opensearch.indices.replication.SegmentReplicationTarget.validateLocalChecksum(SegmentReplicationTarget.java:238) [main/:?]
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:178) [?:?]
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179) [?:?]
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) [?:?]
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) [?:?]
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) [?:?]
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) [?:?]
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) [?:?]
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) [?:?]
	at org.opensearch.indices.replication.SegmentReplicationTarget.getFiles(SegmentReplicationTarget.java:200) [main/:?]
	at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$startReplication$2(SegmentReplicationTarget.java:170) [main/:?]
	at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0-SNAPSHOT.jar:2.12.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126) [main/:?]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [main/:?]
	at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:343) [main/:?]
	at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) [main/:?]
	at org.opensearch.common.util.concurrent.ListenableFuture.addListener(ListenableFuture.java:82) [main/:?]
	at org.opensearch.action.StepListener.whenComplete(StepListener.java:95) [main/:?]
	at org.opensearch.indices.replication.SegmentReplicationTarget.startReplication(SegmentReplicationTarget.java:169) [main/:?]
	at org.opensearch.indices.replication.SegmentReplicationTargetService.start(SegmentReplicationTargetService.java:515) [main/:?]
	at org.opensearch.indices.replication.SegmentReplicationTargetService$ReplicationRunner.doRun(SegmentReplicationTargetService.java:501) [main/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [main/:?]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [main/:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:833) [?:?]
[2023-10-26T08:50:28,834][INFO ][o.o.i.s.RemoteIndexShardTests] [testNoFailuresOnFileReads] after test
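
For anyone reading the trace above, the reported checksum value can be decoded by hand. This is a minimal sketch, assuming the standard 16-byte Lucene codec footer (4-byte magic, 4-byte algorithm ID, 8-byte big-endian checksum). The class and method names (`CrcFooterCheck`, `isLegalCrc32`, `checksumOffset`) are hypothetical and are not part of OpenSearch or Lucene. It only illustrates why flipping the byte at position 397 of a 405-byte file produces an "Illegal CRC-32 checksum" error rather than a checksum mismatch.

```java
public class CrcFooterCheck {
    // Assumption: Lucene's codec footer is 16 bytes (magic + algorithm ID + 8-byte checksum).
    static final int FOOTER_LENGTH = 16;

    // A CRC-32 fits in the low 32 bits of the stored long,
    // so any bit set in the upper 32 bits marks the value as illegal.
    static boolean isLegalCrc32(long value) {
        return (value & 0xFFFFFFFF00000000L) == 0;
    }

    // Offset of the first (most significant) byte of the stored checksum.
    static long checksumOffset(long fileLength) {
        return fileLength - FOOTER_LENGTH + 8;
    }

    public static void main(String[] args) {
        long reported = 72057594424603987L; // value from the CorruptIndexException message
        long fileLength = 405L;             // length of _0.cfe from the WARN line

        System.out.println("checksum starts at byte " + checksumOffset(fileLength));
        System.out.println("top byte of stored value = " + (reported >>> 56));
        System.out.println("low 32 bits = " + (reported & 0xFFFFFFFFL));
        System.out.println("legal CRC-32? " + isLegalCrc32(reported));
    }
}
```

Under these assumptions the checksum field starts at byte 397, which is exactly the position CorruptionUtils flipped from 0 to 1, so the stored value gains a non-zero top byte and fails the range check before any comparison against the computed checksum.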
@shwetathareja shwetathareja added bug Something isn't working untriaged labels Oct 26, 2023
@shwetathareja
Member Author

@dreamer-89: Can you please take a look at this as a priority? You recently added this test and it is failing - #10933

@shwetathareja
Member Author

#10944 (comment)

@dreamer-89
Member

Looking into it

@dreamer-89
Member

dreamer-89 commented Jan 8, 2024

The test fails while asserting that at least one file is corrupted here.

java.lang.AssertionError
	at __randomizedtesting.SeedInfo.seed([A71542B40F44C18D:825D5DA2CC1735C3]:0)
	at org.junit.Assert.fail(Assert.java:87)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.junit.Assert.assertNotNull(Assert.java:713)
	at org.junit.Assert.assertNotNull(Assert.java:723)
	at org.opensearch.index.shard.RemoteIndexShardTests.testNoFailuresOnFileReads(RemoteIndexShardTests.java:512)
...

@harishbhakuni
Contributor

Still seeing this test fail as part of this backport PR: #12193 (org.opensearch.index.shard.RemoteIndexShardTests.testNoFailuresOnFileReads)

@andrross
Member

[Triage - attendees 1 2 3]
The backport PR with the fix for this issue is stalled and has not been merged: #11806

@Bukhtawar
Collaborator

Closing in favour of #11806

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Storage Project Board May 16, 2024
5 participants