Elixir library to run multiple Raft consensus groups in a cluster of ErlangVMs
- Easy hosting of multiple "cluster-wide state"s
- Flexible data model (defined by rafted_value)
- Decentralized architecture and fault tolerance
- Reasonably scalable placement of processes for multiple Raft consensus groups
    - Consensus member processes are distributed to ErlangVMs in a data center-aware manner using rendezvous hashing
    - Automatic rebalancing on adding/removing nodes
- Location transparency
    - Each consensus group leader is accessible using the name (an atom) of the consensus group
    - Actual pids of consensus leader processes are cached in a local ETS table for fast access
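The placement strategy can be pictured with a minimal, hypothetical sketch of rendezvous hashing; this is not raft_fleet's actual implementation and it ignores the data center awareness mentioned above. Each consensus group ranks all nodes by a hash of the `{group, node}` pair and picks the highest-scoring ones:

```elixir
# Hypothetical sketch of rendezvous (highest-random-weight) hashing,
# NOT raft_fleet's actual placement code; zones/data centers are ignored.
defmodule RendezvousSketch do
  # Rank every node by a hash of the {group, node} pair and take the
  # `n_members` highest-scoring nodes for that consensus group.
  def select_nodes(group_name, nodes, n_members) do
    nodes
    |> Enum.sort_by(fn node -> :erlang.phash2({group_name, node}) end, :desc)
    |> Enum.take(n_members)
  end
end

# RendezvousSketch.select_nodes(:consensus1, [Node.self() | Node.list()], 3)
```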
- Users of `<= 0.6.0` should upgrade to `0.6.1` before upgrading to `0.7.x` due to a change in internal data structure. While `<= 0.6.0` and `0.7.x` are not compatible, `0.6.1` should be able to interact with both `<= 0.6.0` and `0.7.x`.
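For example, this upgrade path can be expressed as two successive changes to the dependency entry in `mix.exs` (a sketch; it assumes raft_fleet is installed from Hex, and the `~> 0.7.0` constraint for the final step is illustrative):

```elixir
defp deps do
  [
    {:raft_fleet, "0.6.1"}      # step 1: deploy this to every node first
    # {:raft_fleet, "~> 0.7.0"} # step 2: switch to 0.7.x once all nodes run 0.6.1
  ]
end
```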
Suppose we have a cluster of 4 Erlang nodes:
```
$ iex --sname 1 -S mix
iex(1@skirino-Manjaro)>

$ iex --sname 2 -S mix
iex(2@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")

$ iex --sname 3 -S mix
iex(3@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")

$ iex --sname 4 -S mix
iex(4@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")
```
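With the default Erlang distribution settings, connections are transitive, so connecting nodes 2, 3, and 4 to node 1 is enough to form a full mesh; from any node, `Node.list/0` should return something like:

```
iex(1@skirino-Manjaro)> Node.list()
[:"2@skirino-Manjaro", :"3@skirino-Manjaro", :"4@skirino-Manjaro"]
```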
Load the following module, which implements the `RaftedValue.Data` behaviour, on all nodes in the cluster.
```elixir
defmodule JustAnInt do
  @behaviour RaftedValue.Data
  def new(), do: 0
  def command(i, {:set, j}), do: {i, j}
  def command(i, :inc), do: {i, i + 1}
  def query(i, :get), do: i
end
```
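The replicated data is not limited to a single integer; the same three callbacks (`new/0`, `command/2`, `query/2`) can back richer state. Here is a sketch of a replicated key-value map (not part of raft_fleet itself):

```elixir
defmodule KVStore do
  @behaviour RaftedValue.Data

  def new(), do: %{}

  # command/2 returns {value_returned_to_the_caller, new_state},
  # following the same convention as JustAnInt above.
  def command(map, {:put, key, value}), do: {Map.get(map, key), Map.put(map, key, value)}
  def command(map, {:delete, key}), do: {Map.get(map, key), Map.delete(map, key)}

  # query/2 returns a value to the caller without changing the state.
  def query(map, {:get, key}), do: Map.get(map, key)
  def query(map, :keys), do: Map.keys(map)
end
```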
Call `RaftFleet.activate/1` on all nodes.
```
iex(1@skirino-Manjaro)> RaftFleet.activate("zone1")
iex(2@skirino-Manjaro)> RaftFleet.activate("zone2")
iex(3@skirino-Manjaro)> RaftFleet.activate("zone1")
iex(4@skirino-Manjaro)> RaftFleet.activate("zone2")
```
Create 5 consensus groups, each of which replicates an integer and has 3 consensus members.
```
iex(1@skirino-Manjaro)> rv_config = RaftedValue.make_config(JustAnInt)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus1, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus2, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus3, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus4, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus5, 3, rv_config)
```
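These are ordinary function calls, so the same groups can also be created in a loop; the following sketch assumes `add_consensus_group/3` returns `:ok` on success and an `{:error, reason}` tuple otherwise:

```elixir
rv_config = RaftedValue.make_config(JustAnInt)

for i <- 1..5 do
  name = :"consensus#{i}"
  case RaftFleet.add_consensus_group(name, 3, rv_config) do
    :ok              -> :ok
    # e.g. the group already exists; just report it in this sketch
    {:error, reason} -> IO.puts("failed to add #{name}: #{inspect(reason)}")
  end
end
```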
Now we can run query/command from any node in the cluster:
```
iex(1@skirino-Manjaro)> RaftFleet.query(:consensus1, :get)
{:ok, 0}
iex(2@skirino-Manjaro)> RaftFleet.command(:consensus1, :inc)
{:ok, 0}
iex(3@skirino-Manjaro)> RaftFleet.query(:consensus1, :get)
{:ok, 1}
```
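In application code these calls are usually wrapped in a thin client module. Here is a sketch relying only on the return values shown above (`{:ok, result}` for queries and, with `JustAnInt`, `{:ok, value_before_the_command}` for commands); errors such as an unreachable leader are not handled:

```elixir
defmodule Counter do
  @group :consensus1

  # Read the replicated integer.
  def get do
    {:ok, value} = RaftFleet.query(@group, :get)
    value
  end

  # Increment and return the value *before* the increment,
  # following JustAnInt.command/2's return convention.
  def increment do
    {:ok, previous} = RaftFleet.command(@group, :inc)
    previous
  end

  # Overwrite the value, returning the previous one.
  def set(new_value) do
    {:ok, previous} = RaftFleet.command(@group, {:set, new_value})
    previous
  end
end
```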
Activating/deactivating a node in the cluster triggers rebalancing of consensus member processes.
To run `raft_fleet` within an ErlangVM cluster, we generally recommend the following:
- A cluster should consist of at least 3 nodes to tolerate 1 node failure. Similarly, cluster nodes should span 3 (or more) data centers so that the system keeps functioning in the face of 1 data center failure.
- When you add new ErlangVM nodes, each node should run the following initialization steps (see the sketch after this list):
    1. establish connections to the other running nodes,
    2. call `RaftFleet.activate/1`.

  These steps are typically done within `start/2` of the main OTP application. Information about the other running nodes should be available from e.g. an IaaS API.
- When terminating a node, you should proceed as follows (although `raft_fleet` tolerates failures as long as quorums are maintained, it's much better to tell `raft_fleet` to make preparations beforehand):
    1. call `RaftFleet.deactivate/0` within the node-to-be-terminated,
    2. wait for a while (say, 10 minutes) so that existing consensus group members are migrated to the other nodes, then
    3. finally shut down the node.
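A minimal sketch of the initialization steps inside an OTP application callback module (`MyApp`, `discover_nodes/0`, and the `ZONE` environment variable are hypothetical placeholders; node discovery would typically be backed by an IaaS API):

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # 1. establish connections to the other running nodes
    #    (discover_nodes/0 is a hypothetical helper, e.g. backed by an IaaS API)
    Enum.each(discover_nodes(), &Node.connect/1)

    # 2. declare this node as a host for consensus member processes
    RaftFleet.activate(System.get_env("ZONE") || "zone1")

    Supervisor.start_link([], strategy: :one_for_one, name: MyApp.Supervisor)
  end

  defp discover_nodes() do
    # Placeholder: return a list of node name atoms obtained from your infrastructure.
    []
  end
end
```

`RaftFleet.deactivate/0` is best invoked as a separate operational step (e.g. from a remote console) well before the VM is stopped, as described in the list above.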
- Raft official website
- The original paper and the thesis about the Raft protocol
- rafted_value: Elixir implementation of the Raft consensus protocol
- skirino's slides to introduce rafted_value and raft_fleet