
Cassandra requires you to place replicas in different racks to achieve fault tolerance in the case of a rack outage. To achieve an even distribution of data you may need to alternate the nodes in the cluster across racks. To understand how Cassandra decides where to place replicas, see NetworkTopologyStrategy.

Priam is a peer-to-peer co-process that tries not to hold any state itself. To store membership information, Priam uses an external system, which can be SimpleDB or even another Cassandra instance. When Priam starts for the first time, it tries to acquire a lock on a token; once it has the lock, it sets the token in cassandra.yaml and starts Cassandra in bootstrapping mode. When a node goes down and a replacement node comes online, the new node takes over the old node's token by setting -Dcassandra.replace_token=. By default we provide SimpleDB support for storing the metadata and use the AWS APIs to query membership.
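The decision between a fresh bootstrap and a node replacement can be pictured with a small sketch. The class and method names below (TokenStore, findTokenOfDeadNode, acquireLockOnFreeToken) are hypothetical illustrations of the flow described above, not Priam's actual API:

```java
/**
 * Illustrative sketch of the startup decision; names are hypothetical,
 * not Priam's real classes.
 */
public class TokenBootstrapSketch {

    /** External membership/token store, e.g. SimpleDB or another Cassandra cluster. */
    interface TokenStore {
        String findTokenOfDeadNode();     // token of an instance that went away, or null
        String acquireLockOnFreeToken();  // locks and returns an unclaimed token slot
    }

    private final TokenStore store;

    TokenBootstrapSketch(TokenStore store) {
        this.store = store;
    }

    /** Returns the extra JVM argument to start Cassandra with, if any. */
    String decideStartup() {
        String deadToken = store.findTokenOfDeadNode();
        if (deadToken != null) {
            // Replacing a dead node: reuse its token via the replace_token system property.
            return "-Dcassandra.replace_token=" + deadToken;
        }
        // Fresh node: lock a free token, write it to cassandra.yaml (initial_token),
        // then start Cassandra so it bootstraps into that ring position.
        String token = store.acquireLockOnFreeToken();
        writeInitialToken(token);
        return "";
    }

    void writeInitialToken(String token) {
        // Stand-in for editing cassandra.yaml on disk.
        System.out.println("initial_token: " + token);
    }
}
```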

Logic: To achieve even distribution, Priam calculates the number of possible nodes in the cluster and splits the ring into equal pieces. Once the number of splits is calculated, Priam tries to acquire a lock on the token that corresponds to its location. Priam avoids placing nodes from the same rack next to each other on the ring.
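A minimal sketch of the even-split calculation, assuming the RandomPartitioner's 0..2^127 token range; the class and method names are illustrative, not Priam's implementation:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/**
 * Splits the token ring into equal pieces, one token per possible node.
 * Assumes the RandomPartitioner token space of 2^127.
 */
public class TokenSplitSketch {

    private static final BigInteger RING_SIZE = BigInteger.valueOf(2).pow(127);

    static List<BigInteger> splitRing(int totalNodes) {
        BigInteger slice = RING_SIZE.divide(BigInteger.valueOf(totalNodes));
        List<BigInteger> tokens = new ArrayList<BigInteger>();
        for (int slot = 0; slot < totalNodes; slot++) {
            tokens.add(slice.multiply(BigInteger.valueOf(slot)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Each node would then try to lock the token for its own slot, with slots
        // assigned so that adjacent positions fall in different racks/zones.
        for (BigInteger token : splitRing(12)) {
            System.out.println(token);
        }
    }
}
```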

AWS: We need to create one autoscaler per zone. In AWS, an autoscaler spanning multiple zones will try to spin up an instance in one zone, and if it is not able to create the instance there it will create it in another zone. Once the original zone is available again, the autoscaler will try to rebalance the zones by randomly terminating instances. Hence the workaround is to create one autoscaler per zone. Note: Priam requires at least 2 zones in the current implementation to create a comprehensive seed list.
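One way to set this up programmatically is shown below, using the AWS SDK for Java; the group and launch configuration names, zone list, and sizes are placeholders, and your region/zones will differ:

```java
import com.amazonaws.services.autoscaling.AmazonAutoScalingClient;
import com.amazonaws.services.autoscaling.model.CreateAutoScalingGroupRequest;

/**
 * Sketch of the one-autoscaler-per-zone workaround. Names and sizes are
 * placeholders, not values Priam requires.
 */
public class AsgPerZoneSketch {

    public static void main(String[] args) {
        AmazonAutoScalingClient asg = new AmazonAutoScalingClient();
        String[] zones = {"us-east-1a", "us-east-1b", "us-east-1c"};

        // One autoscaling group per availability zone, so AWS never rebalances
        // instances across zones by terminating them at random.
        for (String zone : zones) {
            asg.createAutoScalingGroup(new CreateAutoScalingGroupRequest()
                    .withAutoScalingGroupName("my_cluster-" + zone)
                    .withLaunchConfigurationName("my_cluster-launchconfig")
                    .withAvailabilityZones(zone)
                    .withMinSize(2)
                    .withMaxSize(2)
                    .withDesiredCapacity(2));
        }
    }
}
```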

Interface IMembership:

- getRacMembership: lists all the instance IDs in the rack the query is made from.
- getRacMembershipSize: the number of instances in this rack.
- getRacCount: the total number of racks in the cluster.
- addACL: the ACL changes needed to achieve communication between the nodes.
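As a rough sketch, the contract might look like the following; only the method names come from this page, the parameter and return types are assumptions:

```java
import java.util.Collection;
import java.util.List;

/**
 * Sketch of the IMembership contract described above; return and parameter
 * types are assumed, only the method names are from the wiki.
 */
public interface IMembership {

    /** Lists all instance IDs in the rack the call is made from. */
    List<String> getRacMembership();

    /** Number of instances in this rack. */
    int getRacMembershipSize();

    /** Total number of racks in the cluster. */
    int getRacCount();

    /** Applies the ACL changes needed so the listed nodes can talk to each other. */
    void addACL(Collection<String> listIPs, int fromPort, int toPort);
}
```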
