[Remote Store] Design - Dual Mode Replication during Remote Store migration #12413

shourya035 · 2024-02-21T06:18:07Z

Introduction:

To support migration to RemoteStore backed nodes, shards would be moved from DocRep backed nodes to RemoteStore backed, SegRep enabled nodes. The migration would proceed as follows:

Primaries are moved to RemoteStore based nodes first
Replicas follow after the primary relocation is completed
Remote store based index settings are applied

More details on the migration process are here: #12246

During this phase, there would be a period in which some shard copies in a replication group reside on DocRep engine based nodes while the primary has already moved to a RemoteStore enabled node. We would need to support a mixed replication mode to uphold the tenet that index and search traffic is not impacted during the migration process.

Tenets:

No impact to search and indexing functionality
Data consistency should be intact
Snapshot creation/restore process would continue as is

Dual mode replication for _shrink and _split API invocations during the migration process would be handled separately and is not part of this enhancement story.

Proposed solution:

Today, we depend on the index metadata to determine whether an index is RemoteStore/SegRep enabled or DocRep enabled. Since the index metadata update will take place only after all shard copies have moved over to the remote enabled nodes, the source of truth during migration will move from index metadata to node attributes.

With the new MIXED compatibility mode introduced through #11986, node attributes would be used to determine the replication mode and whether remote upload/download is enabled, whenever the compatibility mode is set to MIXED and the migration direction is set.
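The switch in source of truth described above could be sketched roughly as follows. This is a minimal illustration, not the actual OpenSearch implementation: the class, the enums, the method names, and the `remote_store` attribute prefix used to recognise a remote store backed node are all assumptions made for the example.

```java
import java.util.Map;

/** Illustrative sketch only; these are hypothetical types, not OpenSearch classes. */
final class ReplicationSourceOfTruthSketch {

    enum CompatibilityMode { STRICT, MIXED }

    enum MigrationDirection { NONE, REMOTE_STORE, DOCREP }

    // Assumption: a node is treated as remote store backed when it carries
    // at least one attribute whose key starts with this prefix.
    static final String REMOTE_STORE_ATTRIBUTE_PREFIX = "remote_store";

    static boolean isRemoteStoreNode(Map<String, String> nodeAttributes) {
        return nodeAttributes.keySet().stream()
            .anyMatch(key -> key.startsWith(REMOTE_STORE_ATTRIBUTE_PREFIX));
    }

    // Migration is considered ongoing when the mode is MIXED and a direction is set.
    static boolean isMigrating(CompatibilityMode mode, MigrationDirection direction) {
        return mode == CompatibilityMode.MIXED && direction != MigrationDirection.NONE;
    }

    /**
     * During migration the node hosting the shard copy decides; otherwise the
     * index metadata remains the source of truth.
     */
    static boolean shardCopyUsesRemoteStore(
        CompatibilityMode mode,
        MigrationDirection direction,
        boolean indexMetadataSaysRemote,
        Map<String, String> hostingNodeAttributes
    ) {
        if (isMigrating(mode, direction)) {
            return isRemoteStoreNode(hostingNodeAttributes);
        }
        return indexMetadataSaysRemote;
    }
}
```

In this sketch, a shard copy on a migrated node is treated as remote store enabled even though the index metadata still says DocRep, which is the situation the mixed mode is meant to cover.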

To ensure data consistency on failovers during this migration process, Peer Recovery Retention Lease (PRRL) publication would be kept unblocked during this time. This is done to ensure that we do not lose out on any sequence number based recovery when a DocRep enabled replica shard copy in the replication group is promoted to a primary. Checks would be introduced to ensure that there are no missing sequence numbers during this failover process.
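As a rough illustration of the kind of check mentioned above, the sketch below lists the sequence numbers a shard copy has not processed up to a given maximum. The class and method names are hypothetical, and the set-based bookkeeping is an assumption chosen for clarity rather than how the engine tracks checkpoints. Sequence numbers are taken to start at 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; not the actual failover check. */
final class SequenceNumberGapCheckSketch {

    /** Returns, in ascending order, every sequence number in [0, maxSeqNo] absent from processed. */
    static List<Long> missingSequenceNumbers(Set<Long> processed, long maxSeqNo) {
        List<Long> missing = new ArrayList<>();
        for (long seqNo = 0; seqNo <= maxSeqNo; seqNo++) {
            if (!processed.contains(seqNo)) {
                missing.add(seqNo);
            }
        }
        return missing;
    }

    /** True when the copy has processed every operation up to and including maxSeqNo. */
    static boolean hasNoGaps(Set<Long> processed, long maxSeqNo) {
        return missingSequenceNumbers(processed, maxSeqNo).isEmpty();
    }
}
```

A DocRep replica that has processed operations 0, 1 and 3 with a maximum sequence number of 4 would report 2 and 4 as missing, so it would need to recover those operations before serving as a consistent primary.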

The following diagram explains the flow for a write request in this stage:

The entire dual mode replication change set would be divided into the following four charters:

Handle replication action changes on primary in the write path
Handle replication action changes on replica in the write path
Handle GlobalCheckpointSyncAction and PublishCheckpointAction replication actions
Handle PRRLs during the migration process
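For the write path charters, one possible shape of the per-copy decision on the primary is sketched below. The issue text does not spell out this logic, so the rule set here is an assumption: copies still on DocRep nodes receive the full operation, copies on remote store nodes only need a lightweight primary term check because their data arrives through segment replication, and the primary never replicates to itself. The enum and method names are hypothetical.

```java
/** Illustrative sketch only; an assumed decision rule, not the merged implementation. */
final class WritePathReplicationSketch {

    enum Mode { FULL_REPLICATION, PRIMARY_TERM_VALIDATION, NO_REPLICATION }

    static Mode modeFor(boolean targetIsPrimaryItself, boolean primaryOnRemoteNode, boolean targetOnRemoteNode) {
        if (targetIsPrimaryItself) {
            // The primary has already applied the operation locally.
            return Mode.NO_REPLICATION;
        }
        if (primaryOnRemoteNode && targetOnRemoteNode) {
            // Both sides are remote store backed: no document level replay is needed.
            return Mode.PRIMARY_TERM_VALIDATION;
        }
        // Any DocRep copy in the group still needs the operation replayed in full.
        return Mode.FULL_REPLICATION;
    }
}
```

Under these assumptions, a replication group with a migrated primary, one migrated replica and one DocRep replica would use a different mode for each of the two replicas for the same write request.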


peternied · 2024-02-21T16:12:20Z

[Triage - attendees 1 2 3 4 5]
@shourya035 Thanks for creating this issue, looking forward to seeing where this lands

shourya035 added the enhancement and untriaged labels Feb 21, 2024

github-actions bot added the Storage:Remote label Feb 21, 2024

github-project-automation bot added this to Storage Project Board Feb 21, 2024

github-project-automation bot moved this to 🆕 New in Storage Project Board Feb 21, 2024

linuxpi assigned shourya035 Feb 21, 2024

shourya035 changed the title ~~[Remote Store] Dual Mode Replication during Remote Store migration~~ [Remote Store] Design - Dual Mode Replication during Remote Store migration Feb 21, 2024

peternied added the RFC and Storage:Durability labels and removed the untriaged label Feb 21, 2024

shourya035 mentioned this issue Mar 21, 2024

[Remote Store] Primary/Replica side changes to support Dual Replication #12821

Merged

8 tasks

gbbafna closed this as completed in #12821 Apr 2, 2024

github-project-automation bot moved this from 🆕 New to ✅ Done in Storage Project Board Apr 2, 2024

shourya035 mentioned this issue Apr 17, 2024

[Remote Store] Update index with Remote Store based settings once all shard copies have moved over to remote store enabled nodes #13252

Closed
