[Remote Store] Add extra buffer before deleting older generations of translog #10817

Merged
merged 3 commits into from
Oct 23, 2023

Conversation

gbbafna
Collaborator

@gbbafna gbbafna commented Oct 21, 2023

Description

Even after #9191, we are still seeing recovery failures on primary relocations.

This happens because the older primary keeps uploading to and deleting from the remote translog while the relocation is in progress. As a result, even with retries, the newer primary may be unable to finish downloading all the files before some of them are deleted.

This PR adds a buffer before files are deleted from the remote translog.
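
To illustrate the idea, here is a minimal sketch of generation-based trimming with a retention buffer. It is not the code from this PR: the class `GenerationRetentionSketch`, the field `extraGenerationsToKeep`, and the methods `upload`, `trim`, and `remaining` are hypothetical names chosen for the example, and the buffer is assumed to be a fixed count of generations. The point is only that a reader who listed the remote files slightly before a trim still finds the most recent "old" generations in place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of buffered deletion; not the OpenSearch implementation. */
class GenerationRetentionSketch {
    // Assumption: the buffer is expressed as a number of extra generations to retain.
    private final long extraGenerationsToKeep;
    // Generations currently present in the (simulated) remote translog, sorted ascending.
    private final TreeSet<Long> remoteGenerations = new TreeSet<>();

    GenerationRetentionSketch(long extraGenerationsToKeep) {
        if (extraGenerationsToKeep < 0) {
            throw new IllegalArgumentException("extraGenerationsToKeep must be >= 0");
        }
        this.extraGenerationsToKeep = extraGenerationsToKeep;
    }

    /** Records that a generation has been uploaded to the remote store. */
    void upload(long generation) {
        remoteGenerations.add(generation);
    }

    /**
     * Deletes generations strictly older than (minRequiredGeneration - buffer)
     * and returns the deleted generations in ascending order.
     */
    List<Long> trim(long minRequiredGeneration) {
        long deleteBelow = minRequiredGeneration - extraGenerationsToKeep;
        // headSet returns the elements strictly less than deleteBelow; copy before removing.
        List<Long> deleted = new ArrayList<>(remoteGenerations.headSet(deleteBelow));
        remoteGenerations.removeAll(deleted);
        return deleted;
    }

    /** Returns the generations still present, in ascending order. */
    List<Long> remaining() {
        return new ArrayList<>(remoteGenerations);
    }

    public static void main(String[] args) {
        GenerationRetentionSketch sketch = new GenerationRetentionSketch(3);
        for (long generation = 1; generation <= 10; generation++) {
            sketch.upload(generation);
        }
        // Generation 8 is the oldest one still required; 5, 6 and 7 survive as buffer.
        System.out.println("deleted:   " + sketch.trim(8));
        System.out.println("remaining: " + sketch.remaining());
    }
}
```

With a buffer of 3 and a minimum required generation of 8, only generations 1 to 4 are deleted. A buffer of 0 reproduces the unbuffered behavior, where everything below generation 8 is removed immediately.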

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Oct 21, 2023

Compatibility status:

Checks if related components are compatible with change c411a25

Incompatible components

Incompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

@gbbafna gbbafna force-pushed the extra-gen branch 2 times, most recently from dd548fb to 8d45cd2 Compare October 21, 2023 16:44
@codecov

codecov bot commented Oct 21, 2023

Codecov Report

Merging #10817 (c411a25) into main (51626d0) will decrease coverage by 0.05%.
Report is 9 commits behind head on main.
The diff coverage is 90.05%.

@@             Coverage Diff              @@
##               main   #10817      +/-   ##
============================================
- Coverage     71.31%   71.27%   -0.05%     
- Complexity    58671    58706      +35     
============================================
  Files          4860     4869       +9     
  Lines        276335   276450     +115     
  Branches      40198    40198              
============================================
- Hits         197068   197032      -36     
- Misses        62803    62991     +188     
+ Partials      16464    16427      -37     
Files Coverage Δ
...upport/replication/TransportReplicationAction.java 77.99% <100.00%> (-2.69%) ⬇️
...ava/org/opensearch/cluster/node/DiscoveryNode.java 91.62% <100.00%> (+0.17%) ⬆️
...a/org/opensearch/common/network/NetworkModule.java 92.20% <100.00%> (+0.20%) ⬆️
...rg/opensearch/common/settings/ClusterSettings.java 92.85% <ø> (ø)
...pensearch/common/settings/IndexScopedSettings.java 100.00% <ø> (ø)
.../java/org/opensearch/gateway/GatewayMetaState.java 69.25% <100.00%> (+0.73%) ⬆️
...earch/index/remote/RemoteStorePressureService.java 100.00% <ø> (ø)
...rg/opensearch/index/translog/RemoteFsTranslog.java 75.00% <100.00%> (+0.50%) ⬆️
server/src/main/java/org/opensearch/node/Node.java 85.31% <100.00%> (+0.09%) ⬆️
...ting/admissioncontrol/AdmissionControlService.java 100.00% <100.00%> (ø)
... and 16 more

... and 461 files with indirect coverage changes

@gbbafna gbbafna changed the title Adding extra buffer before deleting older generations of translog [Remote Store] Adding extra buffer before deleting older generations of translog Oct 22, 2023
@ashking94
Member

Let's create an issue, or tag an existing one, for an approach that prevents deletion of translog files from the remote store only in the peer recovery case.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT.test {yaml=pit/10_basic/Delete all}

@gbbafna gbbafna added the backport 2.x Backport to 2.x branch label Oct 22, 2023
Member

@ashking94 ashking94 left a comment

lgtm

@sachinpkale sachinpkale changed the title [Remote Store] Adding extra buffer before deleting older generations of translog [Remote Store] Add extra buffer before deleting older generations of translog Oct 23, 2023
Signed-off-by: Gaurav Bafna <[email protected]>
Signed-off-by: Gaurav Bafna <[email protected]>
@ashking94
Member

@gbbafna @sachinpkale I have rebased and force-pushed to this branch so that the build succeeds.

@sachinpkale sachinpkale merged commit 218a2ef into opensearch-project:main Oct 23, 2023
16 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 23, 2023
…translog (#10817)

---------

Signed-off-by: Gaurav Bafna <[email protected]>
(cherry picked from commit 218a2ef)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
sachinpkale pushed a commit that referenced this pull request Oct 23, 2023
…translog (#10817) (#10850)

---------


(cherry picked from commit 218a2ef)

Signed-off-by: Gaurav Bafna <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…translog (opensearch-project#10817)

---------

Signed-off-by: Gaurav Bafna <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Labels
backport 2.x Backport to 2.x branch skip-changelog

3 participants