[RFC] Explore auto upgrade of indices (segments) across major versions as alternative to reindexing #13291

Open
shwetathareja opened this issue Apr 18, 2024 · 0 comments
Labels
enhancement Enhancement or improvement to existing feature or request feature New feature or request Indexing Indexing, Bulk Indexing and anything related to indexing RFC Issues requesting major changes Roadmap:Stability/Availability/Resiliency Project-wide roadmap label

Comments

shwetathareja commented Apr 18, 2024

Is your feature request related to a problem? Please describe

OpenSearch indices are backward compatible with the last major version only, and the same applies to Lucene. This means any index created in OpenSearch 1.x remains compatible when the cluster is upgraded to OpenSearch 2.x. But when OpenSearch 3.0 is launched, all OpenSearch 1.x indices will first have to be reindexed before they can migrate to OpenSearch 3.0.
Reindexing is known to be a slow process, as it first runs a search and then runs bulk indexing operations. It ends up parsing each document again, replicating it, and going through segment merges, so overall it is a very resource-intensive operation, even more so for large clusters. Another overhead is that it requires the _source field to be enabled in the index, which increases index storage significantly.
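
To make the compatibility window concrete, here is a minimal sketch in Java. `IndexCompatibility` and `isReadable` are hypothetical names used only for illustration, not OpenSearch APIs; the rule encoded is just the one stated above (an index is readable on the major version that created it and on the next major version).

```java
// Hypothetical helper for illustration only; not part of OpenSearch.
final class IndexCompatibility {
    private IndexCompatibility() {}

    /**
     * Returns true if an index created on major version createdMajor can be
     * opened by a cluster running major version clusterMajor, assuming the
     * "current major or previous major only" rule described in this issue.
     */
    static boolean isReadable(int createdMajor, int clusterMajor) {
        return createdMajor == clusterMajor || createdMajor == clusterMajor - 1;
    }
}
```

Under this rule, `isReadable(1, 2)` holds while `isReadable(1, 3)` does not, which is why 1.x indices would need reindexing before a move to 3.0.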

Describe the solution you'd like

The proposal is to explore auto upgrade of the Lucene segments via a pseudo merge in the background, using the next major version's Lucene IndexWriter. These merges can be configured using a new merge policy for upgrades. This ensures that when a user upgrades the cluster from OpenSearch 1.x to OpenSearch 2.x, all the indices are auto upgraded via the configured merge policy. Users then need not perform any reindexing (a major blocker for upgrades) and can continue upgrading to the next major version without any unnecessary one-time activity.
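
A rough sketch of how such an upgrade merge policy might choose its work, in plain Java with no Lucene dependency. Everything here is an assumption for illustration: `UpgradeMergePlanner`, `Segment`, and the `maxSegmentsPerMerge` knob are hypothetical, and a real policy would plug into Lucene's merge machinery rather than return plain lists. It only shows the selection idea: pick segments written by an older major version and group them into bounded background merges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and structure are assumptions, not Lucene or OpenSearch APIs.
final class UpgradeMergePlanner {
    private UpgradeMergePlanner() {}

    /** Minimal stand-in for a segment: its name and the major version that wrote it. */
    static final class Segment {
        final String name;
        final int writtenByMajor;

        Segment(String name, int writtenByMajor) {
            this.name = name;
            this.writtenByMajor = writtenByMajor;
        }
    }

    /**
     * Groups every segment written by a major version older than currentMajor
     * into batches of at most maxSegmentsPerMerge. Each batch represents one
     * background "upgrade merge" that would rewrite its segments in the
     * current format. Segments already on currentMajor are left alone.
     */
    static List<List<Segment>> planUpgradeMerges(
            List<Segment> segments, int currentMajor, int maxSegmentsPerMerge) {
        if (maxSegmentsPerMerge < 1) {
            throw new IllegalArgumentException("maxSegmentsPerMerge must be >= 1");
        }
        List<List<Segment>> merges = new ArrayList<>();
        List<Segment> batch = new ArrayList<>();
        for (Segment segment : segments) {
            if (segment.writtenByMajor >= currentMajor) {
                continue; // already in the current format, nothing to upgrade
            }
            batch.add(segment);
            if (batch.size() == maxSegmentsPerMerge) {
                merges.add(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            merges.add(batch); // trailing partial batch
        }
        return merges;
    }
}
```

Bounding the batch size is one possible way to keep the upgrade work incremental so it does not compete too heavily with regular indexing and merging; the actual throttling strategy would be part of the POC.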

This idea was briefly discussed in #12667. Also, thanks to @itiyamas; it came up in one of the discussions with her.

I will spend some time creating a POC and share more details on it.

Looking for feedback and more ideas on this!

Related component

Indexing

Describe alternatives you've considered

No response

Additional context

No response

@shwetathareja shwetathareja added enhancement Enhancement or improvement to existing feature or request untriaged RFC Issues requesting major changes Indexing Indexing, Bulk Indexing and anything related to indexing and removed untriaged labels Apr 18, 2024
@shwetathareja shwetathareja added feature New feature or request and removed untriaged labels Apr 18, 2024
@andrross andrross added the Roadmap:Stability/Availability/Resiliency Project-wide roadmap label label May 14, 2024
@github-project-automation github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024