[BUG] improper shard distribution in search nodes after scaling. #14747
Comments
Adding new nodes will not necessarily move shards from the older nodes to the new nodes to rebalance. This is because moving shards is essentially a peer recovery activity and consumes resources that could be critical to serving traffic. Lucene shards tend to be sticky in nature and will stay on a node as long as there are no violations. When users add nodes, it is often because the cluster already has a high shard count or high disk usage. In the latter scenario, adding new nodes will cause any old nodes exceeding the threshold to relocate shards to nodes with more disk space (in this case, the new nodes).

A simple way to achieve a balanced cluster after adding new nodes would be to explicitly set the relevant cluster rebalance setting. There has been a discussion of making this automatic, but the resources that shard migration competes for are why this has not been automated. A monitoring system, however, could watch the shard distribution across nodes and act on it.

(Closing this as it is working as designed)
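As an illustration of the kind of monitoring check mentioned above, here is a minimal sketch that polls the `_cat/allocation` API and flags uneven shard counts. The cluster URL and the 20% imbalance threshold are assumptions for the example, not OpenSearch defaults.

```python
# Hypothetical monitoring sketch: flag uneven shard counts across nodes.
# Assumes a cluster reachable at http://localhost:9200 and the `requests` library.
import requests

CLUSTER = "http://localhost:9200"  # placeholder endpoint

def shard_counts_per_node():
    # _cat/allocation reports per-node shard counts and disk usage.
    rows = requests.get(f"{CLUSTER}/_cat/allocation", params={"format": "json"}).json()
    return {
        row["node"]: int(row["shards"])
        for row in rows
        if row.get("node") != "UNASSIGNED"
    }

def is_imbalanced(counts, tolerance=0.2):
    # Flag the cluster when any node deviates from the mean shard count
    # by more than the tolerance (20% here, an arbitrary choice).
    if not counts:
        return False
    mean = sum(counts.values()) / len(counts)
    return any(abs(count - mean) > tolerance * mean for count in counts.values())

if __name__ == "__main__":
    counts = shard_counts_per_node()
    print(counts)
    if is_imbalanced(counts):
        print("Shard distribution looks uneven; an operator could adjust "
              "allocation settings or move shards explicitly via _cluster/reroute.")
```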
Also, for Searchable Snapshots specifically, we balance only by average primary shard count (see lines 232 to 267 in fcc231d).
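To illustrate what "balance by average primary shard count" means, here is a simplified conceptual sketch. It is not the OpenSearch code referenced above; the node names and counts are made up.

```python
# Conceptual sketch only -- NOT the referenced OpenSearch implementation.
# Each search node's target is roughly total_primaries / number_of_search_nodes,
# and nodes above that average are candidates to give shards away.

def overloaded_nodes(primaries_per_node: dict[str, int]) -> list[str]:
    target = sum(primaries_per_node.values()) / len(primaries_per_node)
    return [node for node, count in primaries_per_node.items() if count > target]

# Hypothetical example: two existing search nodes and one freshly added node.
# The new node sits far below the average, but shards only move toward it
# when the allocator actually relocates them.
print(overloaded_nodes({"search-0": 12, "search-1": 12, "search-2": 0}))
# ['search-0', 'search-1']
```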
Describe the bug
When adding search nodes to an existing cluster, the ideal expectation is that all search nodes hold an equal number of shards, but this is not happening. We had to delete all the indices and restore them all at once in order to get an even distribution, which impacts the availability of the overall service.
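For context, the "delete and restore everything at once" workaround could look roughly like the sketch below, using the snapshot restore API with the searchable-snapshot storage type. The cluster URL, repository, and snapshot names are placeholders.

```python
# Hypothetical sketch of the workaround described above.
import requests

CLUSTER = "http://localhost:9200"
REPO = "my-repo"          # placeholder repository name
SNAPSHOT = "my-snapshot"  # placeholder snapshot name

# (Indices are assumed to have been deleted beforehand, as described above.)
response = requests.post(
    f"{CLUSTER}/_snapshot/{REPO}/{SNAPSHOT}/_restore",
    json={
        "indices": "*",
        # remote_snapshot restores the indices as searchable snapshots.
        "storage_type": "remote_snapshot",
    },
)
print(response.status_code, response.json())
```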
Related component
Search:Searchable Snapshots
To Reproduce
Expected behavior
After adding extra search nodes, all search nodes should have an equal number of shards.
Additional Details
OpenSearch version: 2.13