Update cluster-spec with Valkey 8.0 cluster improvements (#167)
Update cluster specification document to clarify shard migration process
changes in Valkey 8.0 and include detailed descriptions of shard
replication and failover mechanism improvements.

Fix #76

---------

Signed-off-by: Ping Xie <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
3 people authored Sep 12, 2024
1 parent 25fe317 commit 1c9edd0
Showing 2 changed files with 52 additions and 1 deletion.
11 changes: 11 additions & 0 deletions commands/cluster-setslot.md
@@ -79,3 +79,14 @@ Notes:
If the source node is informed before the destination node and the destination node crashes before it is set as new slot owner, the slot is left with no owner, even after a successful failover.
* Step 6, sending `SETSLOT` to the nodes not involved in the resharding, is not technically necessary since the configuration will eventually propagate itself.
However, it is a good idea to do so in order to stop nodes from pointing to the wrong node for the moved hash slot as soon as possible, resulting in fewer redirections before clients find the right node.
* Starting from Valkey 8.0, `CLUSTER SETSLOT` is synchronously replicated to all healthy replicas
running Valkey version 8.0+. By default, this synchronous replication must complete within 2 seconds.
If the replication fails, the primary does not execute the command, and the client receives a
`NOREPLICAS Not enough good replicas to write` error. Operators can retry the command or customize the
timeout using the `TIMEOUT` parameter to further increase the reliability of live reconfiguration:

```
CLUSTER SETSLOT slot [MIGRATING|IMPORTING|NODE] node-id [TIMEOUT timeout]
```

Here, `timeout` is measured in seconds, with 0 meaning to wait indefinitely.
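
  For example (the slot number and node ID below are placeholders), an operator who hit the `NOREPLICAS` error could retry the finalization step with a larger replication timeout:

  ```
  CLUSTER SETSLOT 1000 NODE <destination-node-id> TIMEOUT 5
  ```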
42 changes: 41 additions & 1 deletion topics/cluster-spec.md
Expand Up @@ -403,7 +403,7 @@ without redirections, proxies or other single point of failure entities.
A client **must also be able to handle -ASK redirections** that are described
later in this document; otherwise it is not a complete Valkey Cluster client.

### Live reconfiguration
### Live resharding

Valkey Cluster supports the ability to add and remove nodes while the cluster
is running. Adding or removing a node is abstracted into the same
@@ -513,6 +513,36 @@ set the slots to their normal state again. The same command is usually
sent to all other nodes to avoid waiting for the natural
propagation of the new configuration across the cluster.

#### Replication of `CLUSTER SETSLOT`

Starting from Valkey 8.0, the `CLUSTER SETSLOT` command is replicated if the replicas are running Valkey version 8.0+.
The primary node waits up to 2 seconds, by default, for all healthy replicas to acknowledge the replication.
If not all healthy replicas acknowledge the replication within this time frame, the primary aborts the command,
and the client receives a `NOREPLICAS Not enough good replicas to write` error.
Operators can retry the command or customize the timeout using the `TIMEOUT` parameter to further increase the
reliability of live resharding:

CLUSTER SETSLOT slot [MIGRATING|IMPORTING|NODE] node-id [TIMEOUT timeout]

The `timeout` is specified in seconds, where a value of 0 indicates an indefinite wait time.
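
For illustration only (slot number, node ID, and addresses are placeholders), finalizing the migration of slot 1000 with a longer replication timeout might look like this, issued first to the destination node and then to the source node:

```
valkey-cli -h <destination-ip> -p <port> CLUSTER SETSLOT 1000 NODE <destination-node-id> TIMEOUT 5
valkey-cli -h <source-ip> -p <port> CLUSTER SETSLOT 1000 NODE <destination-node-id> TIMEOUT 5
```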

Replicating the slot information and ensuring acknowledgement from healthy replicas significantly reduces
the likelihood of losing the updated slot state if the primary fails after executing the command.
For example, consider a scenario where the target primary node `B` is finalizing a slot migration.
Before the `SETSLOT` command is replicated to its replica node `B'`, `B` might send a cluster `PONG`
message to the source primary node `A`, prompting `A` to relinquish its ownership of the slot in question.
If `B` crashes right after this point, the replica node `B'`, which could be elected as the new primary,
would not be aware of the slot ownership transfer without the successful replication of `SETSLOT`.
This would leave the slot without an owner, leading to potential data loss and cluster topology inconsistency.

#### Election in empty shards

Starting from Valkey 8.0, Valkey clusters introduce the ability to elect a primary in empty shards.
This behavior ensures that even when a shard is in the process of receiving its first slot,
a primary can be elected. This prevents scenarios where there would be no primary available in the
empty shard to handle redirected requests from the official slot owner,
thereby maintaining availability during live resharding.

### ASK redirection

In the previous section, we briefly talked about ASK redirection. Why can't
@@ -550,6 +580,16 @@ Slots migration is explained in similar terms but with different wording
(for the sake of redundancy in the documentation) in the `CLUSTER SETSLOT`
command documentation.

Starting from Valkey 8.0, when the primary in either the source or target shard fails during live resharding,
the primary in the other shard will automatically attempt to update its migrating/importing state to correctly pair
with the newly elected primary. If this update is successful, the ASK redirection will continue functioning without
requiring administrator intervention. In the event that slot migration fails, administrators can manually resume
the interrupted slot migration by running the command `valkey-cli --cluster fix <ip:port>`.
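
For example (the address is a placeholder), an operator could first inspect the cluster for slots left in a `MIGRATING`/`IMPORTING` state and then let the tool repair the half-finished migration:

```
valkey-cli --cluster check 10.0.0.1:6379
valkey-cli --cluster fix 10.0.0.1:6379
```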

Additionally, since Valkey 8.0, replicas are now able to return `ASK` redirects during slot migrations.
This capability was previously unavailable, as replicas were not aware of ongoing slot migrations in earlier versions.
See the [READONLY](../commands/readonly.md) command.
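
As an illustrative exchange (addresses, slot, and key are made up), a client reading from a replica of the source shard during a migration may now receive an `ASK` redirect and follow it with `ASKING`, just as it would for a redirect from a primary:

```
(connected to a replica of the source shard)
> READONLY
OK
> GET foo
(error) ASK 3999 10.0.0.2:6379

(connected to the node importing slot 3999)
> ASKING
OK
> GET foo
"bar"
```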

### Client connections and redirection handling

To be efficient, Valkey Cluster clients maintain a map of the current slot
