[RFC] Explore auto upgrade of indices (segments) across major versions as alternative to reindexing #13291
Labels: enhancement, feature, Indexing, RFC, Roadmap:Stability/Availability/Resiliency
Is your feature request related to a problem? Please describe
OpenSearch indices are backward compatible with the previous major version only, and the same applies to Lucene. This means any index created in OpenSearch 1.x remains readable after the cluster is upgraded to OpenSearch 2.x. However, once OpenSearch 3.0 is released, every index created in OpenSearch 1.x will have to be reindexed before the cluster can be migrated to OpenSearch 3.0.
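To make the "previous major version only" rule concrete, here is a minimal sketch. It assumes readability depends only on the major version that created the index; `is_readable` is a hypothetical helper for illustration, not an OpenSearch or Lucene API.

```python
def is_readable(index_created_major: int, cluster_major: int) -> bool:
    """Hypothetical model of N-1 compatibility: an index is readable only if it
    was created by the cluster's current major version or the one before it."""
    return cluster_major - 1 <= index_created_major <= cluster_major


# An index created in 1.x on a 2.x cluster is readable.
print(is_readable(1, 2))  # True
# The same index on a 3.x cluster is not, so it would have to be reindexed first.
print(is_readable(1, 3))  # False
```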
Reindexing is generally known to be a slow process, as it first runs a search and then a bulk indexing operation. Every document is parsed again, replicated again, and goes through segment merges again, so the overall operation is very resource intensive, especially for large clusters. Another overhead is that reindexing requires the _source field to be enabled on the index, which increases index storage significantly.
Describe the solution you'd like
The proposal is to explore automatically upgrading Lucene segments in the background via a pseudo merge that rewrites them using the next major version's Lucene IndexWriter. These merges can be driven by a new merge policy dedicated to upgrades. When a user upgrades a cluster from OpenSearch 1.x to OpenSearch 2.x, all indices would then be upgraded automatically by the configured merge policy. As a result, users would not need to reindex (a major blocker for upgrades) and could continue upgrading to the next major version without any unnecessary one-time activity.
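The segment selection such an upgrade merge policy might perform can be sketched as follows. This is a minimal model, assuming each segment records the Lucene major version that wrote it; `Segment`, `find_upgrade_merges`, and `max_segments_per_merge` are hypothetical names, and the smallest-first ordering is an arbitrary illustrative choice. Lucene's existing `UpgradeIndexMergePolicy` follows a related idea, but this sketch neither uses nor reproduces its API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    name: str
    lucene_major: int  # major version of the codec that wrote this segment
    doc_count: int


def find_upgrade_merges(segments: List[Segment], current_major: int,
                        max_segments_per_merge: int = 10) -> List[List[Segment]]:
    """Group segments written by an older Lucene major into merge batches.

    Each batch would be rewritten by the current-version IndexWriter into one
    segment in the current format. Segments already in the current format are
    left untouched. A batch of size 1 is a "pseudo merge": it only rewrites.
    """
    stale = [s for s in segments if s.lucene_major < current_major]
    # Smallest segments first so cheap rewrites finish early (hypothetical choice).
    stale.sort(key=lambda s: s.doc_count)
    return [stale[i:i + max_segments_per_merge]
            for i in range(0, len(stale), max_segments_per_merge)]


segments = [Segment("_0", 8, 1000), Segment("_1", 8, 50), Segment("_2", 9, 400)]
merges = find_upgrade_merges(segments, current_major=9, max_segments_per_merge=1)
print([[s.name for s in m] for m in merges])  # [['_1'], ['_0']]
```

Rewriting one segment at a time keeps each background merge small, at the cost of not reducing the segment count; larger batches trade more I/O per merge for fewer resulting segments.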
This idea was briefly discussed in #12667. Thanks also to @itiyamas; it came up in one of my discussions with her.
I will spend some time creating a POC and share more details on it.
Looking for feedback and more ideas on this!
Related component
Indexing
Describe alternatives you've considered
No response
Additional context
No response