[META] In-place Shard Splitting #13254

Open
2 of 9 tasks
vikasvb90 opened this issue Apr 17, 2024 · 1 comment
Labels: Meta (meta issue, not directly linked to a PR), Roadmap:Cost/Performance/Scale (project-wide roadmap label), Scale, ShardManagement:Routing, ShardManagement:Sizing

Comments

@vikasvb90
Contributor

vikasvb90 commented Apr 17, 2024

Please describe the end goal of this project

In RFC #12918, we proposed building shard-level splitting that does not require stopping write traffic on the cluster. This feature aims to solve the hot shard or large shard problem, which arises mostly in search workloads where data is never rolled over and is expected to be available at all times in the same index. Large shards in such workloads are typically caused by users' custom doc ids, which lead either to uneven shard sizes or to evenly sized but large shards on nodes. In such cases the shards of the index continue to grow and eventually become too hot or too large to be hosted on the same node. This leads to scaling bottlenecks. This meta issue therefore defines an overall project plan that breaks the solution down into high-level tasks.

Supporting References

RFC : In-place Shard Splitting

Issues

The following is the high-level breakdown of the project's tasks. I will keep linking GitHub issues and PRs as they are published.

  • [Design Proposal] Design of In-place shard splitting
  • [Design Proposal] Doc routing algorithm to handle routing docs to respective child shards - With new child shards, the routing algorithm will change so that docs are routed to the right child shards.
  • Shard routing changes and new allocation decider - Involves cluster state changes to create new shard routings for child shards, a new allocation decider to support them, new REST and transport actions, request validations, validation based on existing allocation deciders, etc.
  • Building online recovery of in-place shard split - A new recovery flow consisting of creating child shards, replaying translog operations, filtered replication of docs, handoff from the parent primary to the child shards, etc.
  • Disk, max shards per node/index, awareness validation changes
  • Build cancellation of the ongoing split recovery flow
  • Allocation explain, shard level info, shard stats and recovery info changes
  • Ensuring split request failure in mixed cluster and other blocking operations - Block actions such as changing the replica count on the source shard, re-triggering an online split, blocks, offline resize, etc. while an online split is in progress. Block adding an external retention lease during an in-progress split. The split request should fail while any of these operations is in progress.
  • Online recovery of replicas of child primary shards after split - This may be pushed to phase 2; phase 1 may involve just splitting the source primary, failing the parent replicas, and letting them recover after the child shards are started.
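
To make the doc routing item above more concrete, here is a minimal sketch of one way routing could be extended to child shards. It is an illustration only, not the design from the RFC or any OpenSearch code: the function names (`doc_hash`, `parent_shard`, `child_shard`, `route`), the `split_shards` mapping, and the use of CRC32 as a stand-in hash are all assumptions made for this example. The idea shown is that the parent shard is still chosen exactly as before, and only docs that land on a split parent are routed one step further to a child, so docs on unsplit shards are unaffected.

```python
import zlib


def doc_hash(routing: str) -> int:
    """Stand-in hash (CRC32), chosen only because it is in the standard library."""
    return zlib.crc32(routing.encode("utf-8"))


def parent_shard(routing: str, num_shards: int) -> int:
    """Pick the parent shard with a plain modulo over the hash."""
    return doc_hash(routing) % num_shards


def child_shard(routing: str, num_shards: int, num_children: int) -> int:
    """Pick a child of the parent using the part of the hash the modulo did not consume."""
    return (doc_hash(routing) // num_shards) % num_children


def route(routing: str, num_shards: int, split_shards: dict) -> tuple:
    """Return (parent, child); child is None when the parent has not been split.

    split_shards is a hypothetical map of parent shard id -> number of child shards.
    """
    parent = parent_shard(routing, num_shards)
    num_children = split_shards.get(parent)
    if num_children is None:
        return (parent, None)
    return (parent, child_shard(routing, num_shards, num_children))
```

For example, with five shards and `split_shards = {2: 4}`, `route("doc-1", 5, split_shards)` returns the same parent as before the split, plus a child index in `range(4)` only when that parent is shard 2. The same predicate could, under these assumptions, serve as the filter that decides which docs of the parent are copied to which child during recovery.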

Related component

Indexing

@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7]
@vikasvb90 Thanks for creating this RFC

@github-project-automation github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024
@andrross andrross added the Roadmap:Cost/Performance/Scale Project-wide roadmap label label May 31, 2024
@prudhvigodithi prudhvigodithi added Meta Meta issue, not directly linked to a PR ShardManagement:Routing ShardManagement:Sizing Scale and removed Meta Meta issue, not directly linked to a PR ShardManagement:Routing ShardManagement:Sizing Scale labels Jun 12, 2024