| layout | title | sched-activation |
|---|---|---|
| course | Week 5, Day 1 (Monday, February 3) | class="active" |
Walking through the table we used on Friday.
Source: Table 2, p. 78 of [{{ site.data.bibliography.dean2013.title }}]({{ site.data.bibliography.dean2013.url }}), Copyright ACM 2013.
Design options for distributed systems (From [{{ site.data.bibliography.cavage2013.title }}]({{ site.data.bibliography.cavage2013.url }}))
The categories of analysis:
- Geographies (one global system versus regional "silos")
- Data segregation (single- versus multi-tenancy)
- SLA guarantees
  - availability
  - latency
  - throughput
  - consistency
  - durability
- IAAA (Identity, Authentication, Authorization, and Audit)
- Usage tracking
- Deployment
The standard design questions for any system also apply (versioning, upgrades, ...).
Automating failover (From [{{ site.data.bibliography.cavage2013.title }}]({{ site.data.bibliography.cavage2013.url }}))
Many systems have a "leader" instance that assigns work to the other instances.
- In our design for Assignment 2, there can only be one `server.py`, assigning tasks to the `worker.py` instances.
What happens when the "leader" fails? Do you bring up a new leader automatically or have the operations staff do it?
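To make the trade-off concrete, here is a minimal sketch of heartbeat-based failure detection with a switch between automatic and manual failover. It is an illustration under stated assumptions, not the Assignment 2 design or the approach described in the reading: the `LeaderMonitor` class, its method names, the instance names, and the 5-second timeout are all hypothetical, and the caller passes timestamps in explicitly so the logic can be exercised without waiting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LeaderMonitor:
    """Hypothetical monitor: presumes the leader failed after `timeout` seconds of silence."""
    timeout: float
    leader: str
    standbys: List[str] = field(default_factory=list)
    last_heartbeat: float = 0.0

    def heartbeat(self, instance: str, now: float) -> None:
        # Only heartbeats from the current leader reset the timer.
        if instance == self.leader:
            self.last_heartbeat = now

    def leader_failed(self, now: float) -> bool:
        # A missed heartbeat is only evidence of failure; a slow network looks the same.
        return now - self.last_heartbeat > self.timeout

    def check(self, now: float, automatic: bool = True) -> Optional[str]:
        """Return the newly promoted leader, or None if no failover happened."""
        if not self.leader_failed(now):
            return None
        if not automatic or not self.standbys:
            # Manual policy (or no standby left): leave the decision to operations staff.
            return None
        # Automatic policy: promote the first standby and restart the timer.
        self.leader = self.standbys.pop(0)
        self.last_heartbeat = now
        return self.leader


monitor = LeaderMonitor(timeout=5.0, leader="server-a", standbys=["server-b"])
monitor.heartbeat("server-a", now=1.0)
print(monitor.check(now=3.0))    # leader still considered alive
print(monitor.check(now=10.0))   # silence exceeded the timeout, standby promoted
```

Note that this sketch moves the problem rather than solving it: the monitor is now a single point of failure itself, and a leader that is merely slow may still be running when its replacement starts. Production systems usually delegate this decision to a replicated coordination service for that reason.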
Carried over from Friday.
Read [{{ site.data.bibliography.cavage2013.title }}]({{ site.data.bibliography.cavage2013.url }}), from Platform Components (p. 68) up to and including Platform Usage Collection (p. 69).
Two key points from these sections:
- Many components of these systems are neither glamorous nor "complicated" but are nonetheless necessary for the system.
- How do these components have to be designed to make them scalable?