Fix Shallow copy snapshot failures on closed index #16868

Open
wants to merge 3 commits into
base: main
from

Conversation

astute-decipher
Contributor

@astute-decipher astute-decipher commented Dec 17, 2024

Description

We observed that for a remote-store-backed index, if the index gets closed, the shallow snapshot fails for the shard with the error:
java.nio.file.NoSuchFileException: Metadata file is not present for given primary term <X> and generation <Y> .
On root-causing the issue, we found:

  • The last segment generation differs between the local node directory and the remote store directory: the remote store lags by one generation.
  • The shallow_v1 snapshot tries to find the node's latest segment generation on the remote store, which fails because that generation was never uploaded.
  • The last segments_N file written while closing the index did get uploaded to the remote store.
  • After a successful close, however, we open a read-only engine for the index, which performs recovery and creates a new segments_N file. Because no refresh listener is available for that engine, the new file is never uploaded to the remote store.

Approach:

  • Take the snapshot using the last successfully uploaded segment generation.
  • Fetch the latest metadata file from the remote directory and take a lock on that commit generation.
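
The selection step in this approach can be illustrated with a minimal, standalone sketch. It is not the code in this PR and does not use the OpenSearch API: `ShallowSnapshotSketch`, `RemoteMetadata`, and `latestUploaded` are hypothetical names, and ordering metadata files by primary term and then by generation is an assumption based on the error message above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: pick the commit generation to lock from what the
// remote store actually holds, instead of from the local directory.
final class ShallowSnapshotSketch {

    // Hypothetical stand-in for one metadata file uploaded to the remote store.
    static final class RemoteMetadata {
        final long primaryTerm;
        final long generation;

        RemoteMetadata(long primaryTerm, long generation) {
            this.primaryTerm = primaryTerm;
            this.generation = generation;
        }
    }

    // Returns the most recent uploaded metadata entry, ordered by primary term
    // and then generation (assumed ordering). Empty if nothing was uploaded.
    static Optional<RemoteMetadata> latestUploaded(List<RemoteMetadata> uploaded) {
        return uploaded.stream()
            .max(Comparator
                .comparingLong((RemoteMetadata m) -> m.primaryTerm)
                .thenComparingLong(m -> m.generation));
    }

    public static void main(String[] args) {
        // After close, the read-only engine wrote generation 8 locally,
        // but only generations 6 and 7 ever reached the remote store.
        long localGeneration = 8;
        List<RemoteMetadata> uploaded = List.of(
            new RemoteMetadata(3, 6),
            new RemoteMetadata(3, 7));

        latestUploaded(uploaded).ifPresent(m ->
            System.out.println("lock generation " + m.generation
                + " (local generation is " + localGeneration + ")"));
    }
}
```

In this sketch the local generation is deliberately ignored when choosing what to lock, which mirrors the idea of snapshotting the last successfully uploaded commit rather than the one created by the read-only engine.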

Related Issues

Resolves #13805

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

❌ Gradle check result for 88c3280: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@astute-decipher astute-decipher changed the title Fix shallow v1 snapshot failures on closed index Fix Shallow copy snapshot failures on closed index Dec 17, 2024
@astute-decipher astute-decipher self-assigned this Dec 17, 2024
Contributor

❕ Gradle check result for e4fd52b: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 59.61538% with 21 lines in your changes missing coverage. Please review.

Project coverage is 72.07%. Comparing base (b5f651f) to head (b0ea945).

Files with missing lines Patch % Lines
...rg/opensearch/snapshots/SnapshotShardsService.java 55.17% 13 Missing ⚠️
...ch/repositories/blobstore/BlobStoreRepository.java 53.84% 4 Missing and 2 partials ⚠️
...in/java/org/opensearch/index/shard/IndexShard.java 80.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16868      +/-   ##
============================================
- Coverage     72.21%   72.07%   -0.14%     
+ Complexity    65335    65222     -113     
============================================
  Files          5318     5318              
  Lines        304081   304110      +29     
  Branches      43995    44001       +6     
============================================
- Hits         219578   219179     -399     
- Misses        66541    66973     +432     
+ Partials      17962    17958       -4     


Shubh Sahu added 3 commits December 19, 2024 14:28
Signed-off-by: Shubh Sahu <[email protected]>
Signed-off-by: Shubh Sahu <[email protected]>
Contributor

✅ Gradle check result for b0ea945: SUCCESS
