layout	title	sched-activation
course	Week 7, Wednesday (February 26, 2014)	class="active"

What makes a cloud application "distributed"?

Nodes run independently
No shared memory or shared clock
Network may reorder or lose messages
Nodes can fail but do not have bugs and cannot be hacked
- Handling buggy or hacked nodes imposes even more overhead (see Byzantine fault tolerance)

Implications

A node only has fast access to its own state
Global time order cannot be observed
- From the system's perspective, there is no global order unless the system enforces one

Crashes differ from network partitions
- A crashed node cannot update
- A partitioned node can update
- A node on the other side of the partition cannot tell if updates are occurring inside it

Nodes achieve consensus if they all agree on some value, such as:

Brewer's CAP "Theorem" (2000): Consistency, availability, partition tolerance---pick any two:

Source: {{site.data.bibliography.takada2013.title}}, Ch. 2. Note: Paxos, Two-phase commit (2PC), and Gossip are algorithms

Not really a theorem

More an argument to expand the design space
Can't choose to have availability without partition tolerance
- If your system stops at a partition, it is unavailable
There are many kinds of consistency
You will make different tradeoffs for different services in a system:
- Fine to be off by one on count of Twitter followers for Beyoncé
- Not so fine to book two weddings in the same room the same time
- Consider compensation methods
Seth Gilbert and Nancy Lynch proved a specific case of CAP Theorem
- But their assumptions do not apply to all systems
- More in their 2012 article

Guide to readings for next class

Read {{site.data.bibliography.takada2013.title}}, from "A system model", up to but not including "Strong consistency vs. other consistency models".

You can skim:

The key points: