Handoffs
jrwest edited this page Feb 15, 2013 · 4 revisions
- Handoff starts when riak core decides that a vnode (and the data it is responsible for) is on a node where it shouldn't be. There are two types of handoff: ownership transfer and hinted handoff. Ownership transfer occurs when a vnode is determined to no longer belong to the physical node it is running on (e.g. when a new node joins the cluster) and the vnode needs to be moved. Hinted handoff occurs when a "secondary" vnode took responsibility for a "primary" vnode, but the primary vnode is reachable again. In this case, riak core determines that the secondary vnode should hand off its data to the primary. These two cases are opaque to a riak core application.
- There is a third type of transfer called "repair". This is only worth noting because it complicates the handoff implementation. The rest of these notes ignore repair. More about repair can be found in this commit: https://github.com/basho/riak_core/commit/036e409eb83903315dd43a37c7a93c9256863807
- When riak core decides a vnode should perform handoff, it calls `Mod:handoff_starting` and `Mod:is_empty`. This can occur even if a handoff is already in progress on that vnode.
  - If a handoff is already in progress and a call to start a new one would allow a second handoff to start, riak core prevents it: https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_manager.erl#L415-427
- When handoff starts, if `is_empty` returns true, the handoff is never added to the handoff manager. Instead, the handoff is finished immediately (calling `Mod:handoff_finished` and `Mod:delete`).
  - Therefore, for two handoffs to start (resulting in the scenario described above, where riak core cancels the second), riak core must detect that a vnode needs to be moved while a handoff is already in progress for that vnode, and the vnode must return true for `handoff_starting` and false for `is_empty` in response to the second handoff message.
- If a handoff starts, `handoff_starting` returns true, and `is_empty` returns false, the vnode "adds an outbound" to the `riak_core_handoff_manager`: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L598-606
- This in turn starts a `riak_core_handoff_sender` if it is determined that handoff can start (no other handoff running, max handoff concurrency not reached): https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_manager.erl#L143-152 & https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_manager.erl#L453-458
- When the sender starts, it sends the `?FOLD_REQ` (a request that represents how to fold over the vnode's data to transfer during handoff) synchronously to the vnode. It seems, in the case of riak_kv for example, that the vnode doesn't necessarily have to respond (so as not to block the vnode), but in our case we will not block the vnode and will respond, so this is somewhat inconsequential. https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_manager.erl#L453-458
- The vnode receives `?FOLD_REQ`, which allows it to fold over the data it holds to transfer (see this for more info: https://github.com/rzezeski/try-try-try/tree/master/2011/riak-core-the-vnode#handle_handoff_commandrequest-sender-state---result)
- Each iteration of the fold calls `Mod:encode_handoff_item`, which allows riak core to shuttle the binary representation of the data over to the target vnode: https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_sender.erl#L264
- After the synchronous `?FOLD_REQ` command replies, the sender sends the vnode a `handoff_complete` event: https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_sender.erl#L191
- When the vnode receives the `handoff_complete` event, it sends the `handoff_complete` event to the `riak_core_vnode_manager`: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L295-297
- When the `riak_core_vnode_manager` receives the `handoff_complete` event, it sends the `finish_handoff` event to the vnode: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode_manager.erl#L410-413
- When the vnode receives `finish_handoff`, it calls `Mod:handoff_finished`: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L415-424
- To complete the handoff after receiving `finish_handoff`, the vnode calls `Mod:delete` and then unregisters itself so that it can no longer receive commands: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L391
- If the vnode has already been deleted when it receives `finish_handoff`, it's basically a no-op: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L411-414
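The source-side sequence above can be condensed into a small model. This is a sketch in Python, not riak core's Erlang API: the `ToyVnode` class and the `run_source_handoff` function are hypothetical names, the callbacks are simplified to take no state argument, and it assumes `handoff_starting` is consulted before `is_empty` (the order in which the notes list them). The handoff manager, sender, and vnode manager processes are collapsed into plain function calls.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVnode:
    """Hypothetical stand-in for a vnode module (Mod); not riak core's API."""
    data: dict
    accept: bool = True                         # what handoff_starting returns
    calls: list = field(default_factory=list)   # callback names, in call order

    def handoff_starting(self):
        self.calls.append("handoff_starting")
        return self.accept

    def is_empty(self):
        self.calls.append("is_empty")
        return not self.data

    def encode_handoff_item(self, key, value):
        self.calls.append("encode_handoff_item")
        return f"{key}={value}".encode()

    def handoff_finished(self):
        self.calls.append("handoff_finished")

    def delete(self):
        self.calls.append("delete")
        self.data.clear()


def run_source_handoff(vnode):
    """Walk the source-side steps; returns (outcome, encoded items sent)."""
    if not vnode.handoff_starting():
        # Assumption: a false return simply means handoff does not start.
        return "not_started", []
    if vnode.is_empty():
        # Never added to the handoff manager: finished immediately.
        vnode.handoff_finished()
        vnode.delete()
        return "finished_immediately", []
    # The sender folds over the vnode's data (the ?FOLD_REQ step).
    sent = [vnode.encode_handoff_item(k, v) for k, v in vnode.data.items()]
    # handoff_complete -> vnode manager -> finish_handoff -> callbacks below.
    vnode.handoff_finished()
    vnode.delete()
    return "transferred", sent
```

For example, `run_source_handoff(ToyVnode({"a": 1}))` reports the outcome `"transferred"` with one encoded item, whereas an empty `ToyVnode({})` skips the fold and goes straight to `handoff_finished` and `delete`.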
Handoff Error:
- The vnode handles the error in `handoff_cancelled`. It is not made aware of what the error was.
- How a handoff error propagates:
  - The `riak_core_handoff_sender` catches an error: https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_sender.erl#L200-218
    - In some cases, it sends the vnode a `handoff_error` event, which is subsequently forwarded on to the `riak_core_vnode_manager`: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L299 & https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L652
    - In the other cases, the sender dies with a non-`normal` reason. The sender is monitored by the `riak_core_handoff_manager`, which takes the reason the sender died and passes it on to the vnode (https://github.com/basho/riak_core/blob/master/src/riak_core_handoff_manager.erl#L255-282). The vnode in turn passes that error on to the `riak_core_vnode_manager`, as in the other case.
  - Once the `riak_core_vnode_manager` receives the `handoff_error` event, it sends the `cancel_handoff` event to the vnode: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode_manager.erl#L414-417
  - When the vnode receives the `cancel_handoff` event, it calls `Mod:handoff_cancelled` and then returns to a non-handoff state: https://github.com/basho/riak_core/blob/master/src/riak_core_vnode.erl#L426-437
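The two propagation paths can be traced with a small Python model. The function name `error_hops` and the tuple layout are hypothetical, and the `"exit reason"` label on the hops that carry the sender's exit reason is descriptive rather than an actual riak core message name.

```python
def error_hops(sender_sends_event):
    """Return (from, to, message) hops for a failed handoff.

    sender_sends_event=True models the sender sending handoff_error itself;
    False models the sender dying with a non-normal reason.
    """
    if sender_sends_event:
        hops = [("riak_core_handoff_sender", "vnode", "handoff_error")]
    else:
        hops = [
            # The handoff manager monitors the sender and sees it die.
            ("riak_core_handoff_sender", "riak_core_handoff_manager", "exit reason"),
            ("riak_core_handoff_manager", "vnode", "exit reason"),
        ]
    # From here on both paths are identical.
    hops += [
        ("vnode", "riak_core_vnode_manager", "handoff_error"),
        ("riak_core_vnode_manager", "vnode", "cancel_handoff"),
        # The callback is invoked without the error reason.
        ("vnode", "Mod", "handoff_cancelled"),
    ]
    return hops
```

Both `error_hops(True)` and `error_hops(False)` finish with the same three hops, ending in `handoff_cancelled`.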
- Basically everything operates normally for this vnode (the target of the handoff), as if nothing is going on.
- The only difference is that `Mod:handle_handoff_data` is periodically called with data being handed off from the source vnode. The target vnode is expected to do something with this data.
- There are a bunch of mechanics that make this go, but they don't seem worth noting here right now. For more info, start with `riak_core_handoff_receiver`.
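A rough Python model of the receiving side: the source encodes each item with its `encode_handoff_item`, and the target's `handle_handoff_data` decodes the binary and stores it. The JSON encoding, the function signatures, and the in-memory `store` dict are assumptions chosen for illustration; an actual vnode module is written in Erlang and picks its own encoding and storage.

```python
import json


def encode_handoff_item(key, value):
    """Source side: turn one key/value pair into bytes for the wire."""
    return json.dumps([key, value]).encode("utf-8")


def handle_handoff_data(binary, store):
    """Target side: decode one handed-off item and write it to local state."""
    key, value = json.loads(binary.decode("utf-8"))
    store[key] = value
    return store
```

Calling `handle_handoff_data(encode_handoff_item("k", 1), {})` leaves the store holding the key `"k"` with value `1`. The point being illustrated is that the two callbacks have to agree on the encoding.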