
[RFC] Reindex-from-Snapshot (RFS) #1

Closed
chelma opened this issue Mar 8, 2024 · 1 comment
chelma commented Mar 8, 2024

What/Why

In this RFC we propose to develop new, standardized tooling that will enable OpenSearch users to more easily move data between platforms and versions. The phrase we’ve been using to describe this approach so far is Reindex-from-Snapshot (RFS). At a high level, the idea is to take a snapshot of the source cluster, pull the source docs directly from the snapshot, and re-ingest them on the target using its REST API. This would add an additional option to the mix of traditional snapshot/restore and using the reindex REST API on the source cluster. It is hoped that this approach will combine many of the best aspects of the two existing methods of data movement while avoiding some of their drawbacks.

What users have asked for this feature?

Many users need to maintain a full copy of historical data in their ES/OS cluster (e.g. the “Search” use-case). For these users, moving to a new cluster entails moving that data, ideally with as little effort and as little impact on the source cluster as possible. Improving the experience for this process will enable users to upgrade their cluster versions more easily and move to new platforms more easily.

What problem are you trying to solve?

When moving historical data from a source cluster to a target cluster, there are two broad aspects that need to be considered: (1) moving the data to the target efficiently and (2) ensuring the moved data is usable on the target. (2) is especially important when the target cluster’s ES/OS major version does not match the source cluster. The two existing data movement solutions both have tradeoffs with regards to these aspects, which we will briefly explore below.

The goal of this proposal is to have a data movement solution that is both efficient and able to ensure the data is usable on the target, even if the target is beyond the Lucene backwards compatibility limit.

Existing Solution: Snapshot/Restore

Snapshot/Restore makes a copy of the source cluster at the filesystem level and packs the copy into a format that can be more easily stored where the user desires (on disk, in network storage, in the cloud, etc.). When the user wants to restore the copy, the target cluster has its nodes retrieve the portions of the snapshot relevant to them, unpack them locally, and load them via Lucene. The data in the snapshot is not re-ingested by ES/OS.

Snapshot/Restore is better at moving the data efficiently. It reduces strain on the source cluster by avoiding the cost of pulling all historical data through the REST API and enables the target cluster to stand up without having to re-ingest the historical data through its REST API. As a side benefit, it also enables users to rehydrate a cluster from a backup in cold storage. However, it is not an option if the major version of the target is more than a single increment above the source (a limitation driven by Lucene backwards compatibility, see [1]). Additionally, once data has been moved to a newer major version using Snapshot/Restore (e.g. from v1.X to v2.X), it must be reindexed before it can be moved to an even newer major version (e.g. from v2.X to v3.X); this is also due to Lucene backwards compatibility.
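The "single increment" rule above can be stated as a one-line check. This is a minimal sketch that treats versions as bare major numbers; it deliberately ignores the ES-to-OS fork mapping (e.g. OpenSearch 1.x reading ES 7.x indices), which a real implementation would have to handle.

```python
def snapshot_restorable(source_major: int, target_major: int) -> bool:
    """A snapshot is restorable on the same major version or the next
    one up; beyond that, Lucene cannot read the older segment files."""
    return 0 <= target_major - source_major <= 1
```

So a v1.X snapshot restores on a v2.X cluster, but not on v3.X, and never on an older major version.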

Existing Solution: Reindex REST API

The Reindex REST API allows users to set up an operation on the source cluster that will cause it to send all documents in the specified indices to a target cluster for re-ingestion. Some versions of ES/OS support the option of parallelizing this process within a given index using sliced scrolls [2]. This process operates at the application layer on both the source and target clusters.

The Reindex REST API on the source cluster is useful when the user needs to move to a target cluster beyond the Lucene backwards compatibility limit, or when snapshot/restore otherwise isn’t an option. Re-ingesting the source documents on the target using the reindex API bypasses the backwards compatibility issue by creating new Lucene indices on the target instead of trying to read the source Lucene indices in the target cluster. However, the faster you perform the data movement (such as by using sliced scrolls), the greater the impact on the ability of the source cluster to serve production traffic. Additionally, having to operate at the application layer means that the overhead of the distributed system comes into play rather than being able to transfer data at the filesystem level. Finally, the Reindex REST API is only usable for an index if the source cluster has retained an original copy of every document in that index via the _source feature [3].

[1] https://opensearch.org/docs/2.12/install-and-configure/upgrade-opensearch/index/#compatibility
[2] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/docs-reindex.html#docs-reindex-slice
[3] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/docs-reindex.html
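For concreteness, a reindex operation like the one described above is driven by a small JSON body; the sliced-scroll parallelism is requested via the query string (e.g. `POST /_reindex?slices=auto&wait_for_completion=false` in 7.x-era clusters). The index names below are placeholders.

```python
import json

# Hypothetical source/dest index names; the body shape follows the
# 7.x-era _reindex API referenced in [2]/[3] above.
reindex_body = {
    "source": {"index": "logs-2023"},
    "dest": {"index": "logs-2023-copy"},
}

# Serialized payload that would be POSTed to /_reindex
payload = json.dumps(reindex_body)
```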

How would the proposal solve that problem?

The core idea of RFS is to write a set of tooling to perform the following workflow:

  • Take a snapshot of the source cluster, using the standard API/process.
  • Run the tool. It takes in the location of the snapshot (local disk, S3, etc) and the ES/OS Indices to be restored, then parses the snapshot to find the relevant metadata files and snapshot blob-files in the snapshot.
  • Then for each shard of each ES/OS Index to be restored, the tool would:
    • Copy the snapshot blob-files to disk (if they are not already local)
    • Unpack them to their original Lucene file format
    • Use Lucene to read the shard’s files as a Lucene Index and pull the original source documents from the _source field [1]
    • PUT the source documents to the target cluster, allowing the target to re-ingest the document and create new Lucene Indices. The target cluster can be scaled up to facilitate re-ingestions, then scaled down to meet expected production demand.
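The final step above would, in practice, batch the extracted documents through the standard `_bulk` API rather than one PUT per document. A minimal sketch of that packing step, with a hypothetical helper name (`to_bulk_ndjson` is not part of any existing tooling):

```python
import json

def to_bulk_ndjson(index: str, docs: list[tuple[str, dict]]) -> str:
    """Pack (_id, _source) pairs into a _bulk request body. POSTing the
    result to the target's /_bulk endpoint re-ingests the documents,
    letting the target build fresh Lucene segments for them."""
    lines = []
    for doc_id, source in docs:
        # Action line, then the raw _source document, per the bulk format
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"
```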

This approach appears to combine the benefits of both snapshot/restore and using the reindex API on the source cluster. Most importantly, it removes the strain on the source cluster during data movement while also bypassing the Lucene backwards compatibility limit by re-ingesting the data on the target cluster. Additionally, it allows users to rehydrate a historical cluster’s dataset from a snapshot, potentially reducing the cost of a data movement by removing the need to have the source cluster running. Finally, it opens up the possibility of skipping the ES/OS snapshot entirely and operating directly from disk images (such as an EBS Volume snapshot), which some users have already had success with (see [2]).

[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/mapping-source-field.html
[2] https://blogs.oracle.com/cloud-infrastructure/post/behind-the-scenes-opensearch-with-oci-vertical-scaling

What could the user experience be like?

The user experience could be as follows:

  1. The user creates a new target cluster of the desired ES/OS version on the desired platform. They would likely want to initially set the replica count to 0 to speed up ingestion by avoiding the need to replicate data from each shard’s primary. They may also wish to temporarily spin up a set of dedicated ingest nodes [1] to further speed up writes.
  2. The user registers a new, empty snapshot repository on their source cluster. Making a new repository simplifies the process of parsing a snapshot’s details from the repository and increases the speed of creating the snapshot, as there is no need to consider or perform incremental snapshots. This repository would be in a network-accessible location so that the RFS tooling could easily access its contents.
  3. The user makes a snapshot of the source cluster into the new, empty repository.
  4. The user passes the RFS tooling the location/connection details/access permissions for both the snapshot and the target cluster, and tells it to “go”.
  5. RFS will perform the steps required to parse the snapshot, retrieve the portions relevant at a given time, extract the _source documents from retrieved snapshot portions, and reindex them against the target cluster (as explained in more detail above).
  6. The user reconfigures the target cluster to their “production” settings (probably add replicas, probably remove the additional ingest nodes)

[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/ingest.html
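Steps 1 and 6 above amount to flipping a pair of index settings. A sketch of the two `PUT /<index>/_settings` bodies; the `refresh_interval` tweak is an additional assumption (a common ingestion-speed lever), not something the steps above require.

```python
# Hypothetical settings bodies: replicas (and refreshes) off while RFS
# re-ingests, then production values restored afterwards.
ingestion_settings = {
    "index": {"number_of_replicas": 0, "refresh_interval": "-1"}
}
production_settings = {
    "index": {"number_of_replicas": 1, "refresh_interval": "1s"}
}
```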

What are the limitations/drawbacks of the proposal?

Community feedback is greatly welcomed on this point, but the following occur to the author:

  • _source Flag: The approach relies on having the original documents stored in the Lucene index via the _source field. If the user has disabled this feature on their index, it will not be possible to extract the original documents from the snapshot using Lucene, and therefore not be possible to re-play them against the target cluster. Without the original document, it becomes harder (impossible?) to move the data across multiple major versions at full fidelity. However, this is not a problem unique to the proposed solution; using the Reindex REST API has the same limitation.
  • Version Compatibility: The tooling may need to support a path from each version to each other version to reach its full potential. This could be initially simplified to just supporting one minor version of each major version as a source/target, and having users first upgrade to that minor version on their source. It is currently unclear whether this simplification would be sufficient long term, or how much work would be involved if that simplification is not sufficient. As a data point, the author was able to add support for ES 6.8 to his simplified prototype scripts about a day after first having support for ES 7.10.
  • Plugin Compatibility: Parsing snapshots requires custom classes to understand the metadata files and (potentially) the blobbed Lucene files in the snapshot. These classes are loaded into the Elasticsearch/OpenSearch process when the node stands up (see here [1]). We may need access to the user’s plugins in order to parse their snapshots.

[1] https://github.com/elastic/elasticsearch/blob/6.8/server/src/main/java/org/elasticsearch/node/Node.java#L425
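The `_source` precondition in the first bullet can at least be checked up front. A sketch, assuming the argument is the inner `mappings` object from a `GET /<index>/_mapping` response (the function name and exact dict shape are illustrative):

```python
def source_enabled(index_mappings: dict) -> bool:
    """True unless the mapping explicitly disables the _source field;
    without it, neither RFS nor the Reindex API can recover the
    original documents for re-ingestion."""
    return bool(index_mappings.get("_source", {}).get("enabled", True))
```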

Have you tested this approach?

Yes. The author has some proof-of-concept scripts working that will take an Elasticsearch 6.8.23 snapshot [1] and move its contents to an OpenSearch 2.11 target cluster. The author also has another version of the scripts [2] that unpacks an Elasticsearch 7.10.2 snapshot and performs replay against an Elasticsearch 7.10.2 target. As a PoC, there is obviously more development to be done to make them production-ready.

[1] https://github.com/chelma/reindex-from-snapshot/tree/6.8
[2] https://github.com/chelma/reindex-from-snapshot/tree/7.10

@chelma chelma changed the title [RFC] [DRAFT] Reindex-from-Snapshot (RFS) [RFC] Reindex-from-Snapshot (RFS) Mar 13, 2024

chelma commented Mar 14, 2024

Posted in the OpenSearch repo - opensearch-project/OpenSearch#12667

@chelma chelma closed this as completed Mar 14, 2024