Week5-Mon.md

layout title sched-activation
course
Week 5, Day 1 (Monday, February 3)
class="active"

Discussion of answers to Friday's in-class exercise

Walking through the table we used on Friday.

Read Latencies observed in a BigTable service benchmark

Source: Table 2, p. 78 of [{{ site.data.bibliography.dean2013.title }}]({{ site.data.bibliography.dean2013.url }}), Copyright ACM 2013.

Design options for distributed systems (From [{{ site.data.bibliography.cavage2013.title }}]({{ site.data.bibliography.cavage2013.url }}))

The categories of analysis:

  • Geographies (one global system versus regional "silos")

  • Data segregation (single- versus multi-tenancy)

  • SLA guarantees

    • availability
    • latency
    • throughput
    • consistency
    • durability
  • IAAA (Identity, Authentication, Authorization, and Audit)

  • Usage tracking

  • Deployment
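A latency guarantee is usually stated against a percentile rather than the mean, because a few slow requests can dominate what users experience. The following is a minimal sketch of such a check, assuming the nearest-rank percentile definition. The function names `percentile` and `meets_latency_sla`, the sample values, and the 20 ms limit are all hypothetical and do not come from the readings.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_sla(samples_ms, p, limit_ms):
    """True if the p-th percentile latency is within the limit."""
    return percentile(samples_ms, p) <= limit_ms


# Hypothetical measurements in milliseconds, with one slow outlier.
samples_ms = [10, 12, 11, 13, 250, 12, 11, 10, 14, 12]
median_ok = meets_latency_sla(samples_ms, 50, 20)  # median is 12 ms
tail_ok = meets_latency_sla(samples_ms, 99, 20)    # 99th percentile is 250 ms
```

With these samples the median meets a 20 ms limit while the 99th percentile does not, which is why the percentile named in an SLA matters as much as the limit itself.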

The standard design questions for any system also apply (versioning, upgrades, ...).

Automating failover (From [{{ site.data.bibliography.cavage2013.title }}]({{ site.data.bibliography.cavage2013.url }}))

Many systems have a "leader" instance that assigns work to the other instances.

  • In our design for Assignment 2, there can be only one server.py, assigning tasks to the worker.py instances.

What happens when the "leader" fails? Do you bring up a new leader automatically or have the operations staff do it?
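One common way to notice a failed leader is a heartbeat with a timeout. The sketch below illustrates the two options in the question above. It is not the Assignment 2 design: `LeaderMonitor`, `choose_action`, and the 5-second timeout are hypothetical names and values chosen for illustration.

```python
import time


class LeaderMonitor:
    """Tracks heartbeats from the leader and reports when it looks dead."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_heartbeat = clock()

    def record_heartbeat(self):
        """Call whenever a heartbeat message arrives from the leader."""
        self.last_heartbeat = self.clock()

    def leader_suspected_dead(self):
        """True if no heartbeat has arrived within the timeout."""
        return self.clock() - self.last_heartbeat > self.timeout_s


def choose_action(monitor, automatic_failover):
    """Decide what to do, given the policy chosen by the system's designers."""
    if not monitor.leader_suspected_dead():
        return "leader healthy"
    if automatic_failover:
        return "promote standby"
    return "page operations staff"


# Simulated clock (hypothetical values): the leader goes silent for 6 seconds.
now = [0.0]
monitor = LeaderMonitor(timeout_s=5.0, clock=lambda: now[0])
now[0] = 6.0
automatic = choose_action(monitor, automatic_failover=True)
manual = choose_action(monitor, automatic_failover=False)
```

A timeout alone cannot distinguish a dead leader from a slow network, so automatic promotion risks two leaders running at once. That risk is one reason some teams leave the decision to operations staff.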

Guide to readings for next class

Carried over from Friday.

Read [{{ site.data.bibliography.cavage2013.title }}]({{ site.data.bibliography.cavage2013.url }}), from Platform Components (p. 68) up to and including Platform Usage Collection (p. 69).

Two key points from these sections:

  1. Many components of these systems are neither glamorous nor "complicated" but are still necessary for the system to work.
  2. How must these components be designed so that they scale?