[Feature Request] Shard Level Snapshot Restore #12254

Open
linuxpi opened this issue Feb 8, 2024 · 0 comments
Labels
enhancement Enhancement or improvement to existing feature or request Storage:Snapshots Storage Issues and PRs relating to data and metadata storage

Comments

@linuxpi
Collaborator

linuxpi commented Feb 8, 2024

Is your feature request related to a problem? Please describe

  • During a snapshot restore, individual shards can fail to recover, which leaves the index red.
  • Even though the index is red, the primaries that restored successfully can still accept writes and move ahead of the snapshot's point in time.
  • The shard that failed recovery stays UNASSIGNED and rejects all writes.
  • Today, the only way to recover from this state is to DELETE the index and restore it from the snapshot again.
  • This causes data loss, because the shards that reached STARTED have already accepted traffic that the snapshot does not contain.
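As a rough illustration of the state described above, the sketch below picks out the UNASSIGNED primaries from rows shaped like the JSON output of `GET _cat/shards?format=json` (fields `index`, `shard`, `prirep`, `state`). The helper name `unassigned_primaries`, the index name `logs`, and the sample rows are all made up for this example; they are not taken from the affected domain.

```python
from typing import Dict, List


def unassigned_primaries(cat_shards_rows: List[Dict[str, str]], index: str) -> List[int]:
    """Return the shard numbers of the primaries of `index` that are UNASSIGNED.

    `cat_shards_rows` is assumed to look like the JSON form of the
    _cat/shards output: one dict per shard copy, with string values.
    """
    return sorted(
        int(row["shard"])
        for row in cat_shards_rows
        if row["index"] == index
        and row["prirep"] == "p"          # primaries only, ignore replicas
        and row["state"] == "UNASSIGNED"
    )


# Hypothetical sample input: 5 primaries, one of which failed recovery.
sample_rows = [
    {"index": "logs", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "logs", "shard": "1", "prirep": "p", "state": "STARTED"},
    {"index": "logs", "shard": "2", "prirep": "p", "state": "STARTED"},
    {"index": "logs", "shard": "3", "prirep": "p", "state": "UNASSIGNED"},
    {"index": "logs", "shard": "4", "prirep": "p", "state": "STARTED"},
]

failed = unassigned_primaries(sample_rows, "logs")
# Any unassigned primary means the index is reported as red.
index_is_red = len(failed) > 0
```

With this input, shard 3 is the only one that would need to be restored again, while shards 0, 1, 2 and 4 may already hold writes newer than the snapshot.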

Describe the solution you'd like

  • If only some shards fail during a snapshot restore, we should allow restoring individual shards.
  • The user could then trigger a snapshot restore on the same index again, and only the UNASSIGNED (failed) shards would restart recovery from scratch.
  • This prevents data loss when the successfully recovered shards have already accepted writes, and it reduces the time and effort needed to recover.
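The selection rule proposed above could be sketched roughly as follows. This is only a model of the requested behavior, not existing OpenSearch code: `RestorePlan` and `plan_shard_level_restore` are hypothetical names, and the input is assumed to be a plain mapping from shard number to primary state.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RestorePlan:
    restore: List[int]  # shards to recover from the snapshot again
    keep: List[int]     # shards left untouched, including any newer writes


def plan_shard_level_restore(primary_states: Dict[int, str]) -> RestorePlan:
    """Split shards into those to re-restore and those to leave alone.

    Assumption for this sketch: only UNASSIGNED primaries are restored;
    every other state (for example STARTED) is kept as-is.
    """
    restore = sorted(s for s, state in primary_states.items() if state == "UNASSIGNED")
    keep = sorted(s for s in primary_states if s not in restore)
    return RestorePlan(restore=restore, keep=keep)


# Hypothetical 5-shard index where shard 3 failed recovery.
plan = plan_shard_level_restore(
    {0: "STARTED", 1: "STARTED", 2: "STARTED", 3: "UNASSIGNED", 4: "STARTED"}
)
```

In contrast to deleting and restoring the whole index, a plan like this never touches the shards in `keep`, so writes they accepted after the snapshot point in time would be preserved.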

Related component

Storage:Snapshots

Describe alternatives you've considered

No response

Additional context

We recently saw this issue on a Remote Store enabled domain: during snapshot recovery, uploads to the remote store started failing for a single shard, which led to 1 out of 5 shards failing recovery.

@linuxpi linuxpi added enhancement Enhancement or improvement to existing feature or request untriaged labels Feb 8, 2024
@linuxpi linuxpi added Storage Issues and PRs relating to data and metadata storage untriaged and removed untriaged labels Feb 8, 2024
@rramachand21 rramachand21 moved this from 🆕 New to Ready To Be Picked in Storage Project Board Mar 8, 2024
Projects
Status: Ready To Be Picked
Development

No branches or pull requests

2 participants