Reduce created segments when there is low traffic on a shard #618
Comments
Can you more clearly define the problem that is solved by reducing the number of segments that are created on low traffic shards? What is the use case that optimization is targeting? |
Closing it since we didn't receive a response for a while. @itiyamas feel free to reopen once you have more details about it. |
@anasalkouz I do not have permissions to re-open the thread. Can you please open this? |
@dreamer-89 Can you help re-open this? |
Issue re-opened. |
Fixed by apache/lucene#921 |
@msfroh The issue that you linked solves the problem of searching through more segments but utilizes more resources to do merging. The proposal here is to create fewer segments in the first place so that you do not spend resources on merging later. |
Ahh... okay. I guess we can reopen it. It feels like a pretty low priority, though, since writing and then merging small segments will (or at least should?) have negligible impact on overall performance. If/when we move to predominantly pull-based indexing, we can allocate indexing threads per node based on the number of pending documents. (If each shard writes with one thread or less, then each shard will only ever write one segment per flush.) |
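To make the pull-based idea in the comment above concrete, here is a minimal sketch of sizing indexing threads from the number of pending documents. It is only an illustration: `allocate_indexing_threads`, `docs_per_thread`, and `max_threads` are hypothetical names, not OpenSearch or Lucene APIs, and the model assumes one segment per writing thread per flush.

```python
import math


def allocate_indexing_threads(pending_docs, docs_per_thread, max_threads):
    """Toy allocation: give each shard threads in proportion to its backlog.

    pending_docs    -- dict of shard name -> number of pending documents
    docs_per_thread -- hypothetical number of documents one thread handles per flush
    max_threads     -- hypothetical per-node budget of indexing threads
    """
    # A shard with any backlog gets at least one thread; an idle shard gets none.
    alloc = {
        shard: math.ceil(count / docs_per_thread) if count > 0 else 0
        for shard, count in pending_docs.items()
    }
    # If the node budget is exceeded, take threads away from the busiest shard first.
    while sum(alloc.values()) > max_threads:
        busiest = max(alloc, key=alloc.get)
        if alloc[busiest] <= 1:
            break  # every active shard is already at one thread; they would have to share
        alloc[busiest] -= 1
    return alloc


if __name__ == "__main__":
    pending = {"hot": 10000, "warm": 900, "cold": 5, "idle": 0}
    print(allocate_indexing_threads(pending, docs_per_thread=1000, max_threads=8))
```

Under this model the low-traffic shards end up with a single thread each, so each of them would write at most one segment per flush, while the busy shard keeps its concurrency.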
Lucene creates one segment per active concurrent thread per shard. The number of active concurrent threads per shard is determined by the bulk thread pool in ES, whose workers consume the queue holding sub-bulk requests. Each sub-bulk request ends up being picked up by a different thread, hence resulting in multiple segments during refresh, resulting in better performance during reads. When there is low traffic on a particular shard on a node, we can potentially reduce the number of created segments by coalescing bulk requests.
When there are a lot of shards on a single node for different indices, this problem may get worse.
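The effect described above can be sketched with a toy model. This is not how ES or Lucene is implemented: it simply assumes that every worker thread that writes to a shard between two refreshes produces one segment, and that all sub-bulk requests in that window run concurrently. The function names and the `min_requests_per_worker` threshold are hypothetical.

```python
from collections import defaultdict


def segments_per_refresh(assignments):
    """Count segments per shard under the toy model.

    assignments -- iterable of (shard, worker) pairs, one per sub-bulk request.
    Each distinct worker that wrote to a shard is assumed to yield one segment.
    """
    writers = defaultdict(set)
    for shard, worker in assignments:
        writers[shard].add(worker)
    return {shard: len(workers) for shard, workers in writers.items()}


def coalesce_low_traffic(assignments, min_requests_per_worker):
    """Re-route sub-bulk requests so a shard uses only as many workers as its traffic justifies."""
    by_shard = defaultdict(list)
    for shard, worker in assignments:
        by_shard[shard].append(worker)

    routed = []
    for shard, workers in by_shard.items():
        distinct = sorted(set(workers))
        # Allow one worker per `min_requests_per_worker` requests, but at least one
        # and never more than the shard was already using.
        allowed = max(1, min(len(distinct), len(workers) // min_requests_per_worker))
        chosen = distinct[:allowed]
        for i in range(len(workers)):
            routed.append((shard, chosen[i % allowed]))
    return routed


if __name__ == "__main__":
    # Two low-traffic shards whose few sub-bulk requests landed on different workers.
    requests = [("s0", "t1"), ("s0", "t2"), ("s0", "t3"), ("s1", "t1"), ("s1", "t2")]
    print(segments_per_refresh(requests))
    print(segments_per_refresh(coalesce_low_traffic(requests, min_requests_per_worker=4)))
```

With many low-traffic shards on a node, the uncoalesced count grows with both the number of shards and the number of workers, which is why the problem gets worse as shards accumulate.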
The proposal is to optimize this entire process via the following tasks: