
Commit

deploy: 7c877f6
Arpita-Jaiswal committed Mar 3, 2024
1 parent a663fa7 commit 2ee6a1e
Showing 29 changed files with 22,236 additions and 13 deletions.
3 changes: 3 additions & 0 deletions FASTN.ftd
@@ -55,6 +55,9 @@ nav-title: My Day to Day Learnings and Discoveries
document: ddd/data-intensive-application/3-storage-retrieval.ftd
- - Encoding and Evolution: /ddd/dia-4/
document: ddd/data-intensive-application/4-encoding-and-evolution.ftd
- Part II. Distributed Data:
- - Replication: /ddd/dia-5/
document: ddd/data-intensive-application/5-replication.ftd

## FifthTry: /fifthtry/

@@ -17,7 +17,7 @@
<script src="prism-8FE3A42AD547E2DBBC9FE16BA059F8F1870D613AD0C5B0B013D6205B72A62BAF.js"></script>
<script src="default-9987017245102B708B9BEC5DEE998BBF40BEAAD8E43363EA774F86B38F61E19B.js"></script>
<link rel="stylesheet" href="prism-73F718B9234C00C5C14AB6A11BF239A103F0B0F93B69CD55CB5C6530501182EB.css">
<script src="-/fastn-stack.github.io/fastn-js/download.js"></script><script src="-/arjaupashak.com/search.js"></script>
<script src="-/arjaupashak.com/search.js"></script><script src="-/fastn-stack.github.io/fastn-js/download.js"></script>



@@ -17,7 +17,7 @@
<script src="prism-8FE3A42AD547E2DBBC9FE16BA059F8F1870D613AD0C5B0B013D6205B72A62BAF.js"></script>
<script src="default-9987017245102B708B9BEC5DEE998BBF40BEAAD8E43363EA774F86B38F61E19B.js"></script>
<link rel="stylesheet" href="prism-73F718B9234C00C5C14AB6A11BF239A103F0B0F93B69CD55CB5C6530501182EB.css">
<script src="-/arjaupashak.com/search.js"></script><script src="-/fastn-stack.github.io/fastn-js/download.js"></script>
<script src="-/fastn-stack.github.io/fastn-js/download.js"></script><script src="-/arjaupashak.com/search.js"></script>



128 changes: 128 additions & 0 deletions ddd/data-intensive-application/5-replication.ftd
@@ -0,0 +1,128 @@
-- ds.page: Replication

**Note**: The content is taken from [`Designing Data-Intensive Applications` by
Martin Kleppmann](https://public.nikhil.io/Designing%20Data%20Intensive%20Applications.pdf)

In this chapter we will assume that your dataset is so small that each machine can
hold a copy of the entire dataset. In a later chapter we will relax that assumption
and discuss partitioning (sharding) of datasets that are too big for a single machine.

-- ds.h1: Leaders and Followers

- One of the replicas is designated the leader (also known as master or primary).
When clients want to write to the database, they must send their requests to the
leader, which first writes the new data to its local storage.
- The other replicas are known as followers (read replicas, slaves, secondaries, or hot
standbys). Whenever the leader writes new data to its local storage, it also sends
the data change to all of its followers as part of a replication log or change stream.
- When a client wants to read from the database, it can query either the leader or
any of the followers. However, writes are only accepted on the leader (the
followers are read-only from the client’s point of view).

-- ds.image:
src: $assets.files.ddd.data-intensive-application.images.5-1.png

-- ds.markdown:

This mode of replication is a built-in feature of many relational databases,
such as PostgreSQL (since version 9.0), MySQL, Oracle Data Guard, and SQL Server’s
AlwaysOn Availability Groups. It is also used in some nonrelational databases,
including MongoDB, RethinkDB, and Espresso. Finally, leader-based replication is
not restricted to only databases: distributed message brokers such as Kafka and
RabbitMQ highly available queues also use it. Some network filesystems and
replicated block devices such as DRBD are similar.
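
As a rough illustration (not from the book), the write path described above might
look like the following minimal in-memory sketch; the `Leader` and `Follower`
classes and their methods are hypothetical:

-- ds.code:
lang: py

class Follower:
    """Read-only replica that applies changes from the replication log."""

    def __init__(self):
        self.data = {}

    def apply(self, change):
        key, value = change
        self.data[key] = value


class Leader:
    """Accepts all writes and streams every change to its followers."""

    def __init__(self, followers):
        self.data = {}
        self.log = []  # replication log / change stream
        self.followers = followers

    def write(self, key, value):
        # Writes go to the leader's local storage first ...
        self.data[key] = value
        change = (key, value)
        self.log.append(change)
        # ... and the change is then sent to every follower.
        for follower in self.followers:
            follower.apply(change)

    def read(self, key):
        return self.data.get(key)


followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("user:42", "alice")
# Reads may be served by the leader or by any follower.
assert leader.read("user:42") == followers[0].data["user:42"] == "alice"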

-- ds.h2: Synchronous Versus Asynchronous Replication

-- ds.image:
src: $assets.files.ddd.data-intensive-application.images.5-2.png

-- ds.markdown:

The replication to follower 1 is synchronous: the leader waits until follower 1
has confirmed that it received the write before reporting success to the user.
The replication to follower 2 is asynchronous: the leader sends the message, but
doesn’t wait for a response from the follower.

The **advantage** of synchronous replication is that the follower is guaranteed
to have an up-to-date copy of the data that is consistent with the leader. If
the leader suddenly fails, we can be sure that the data is still available on
the follower. The **disadvantage** is that if the synchronous follower doesn’t
respond (because it has crashed, or there is a network fault, or for any other
reason), the write cannot be processed. The leader must block all writes and wait
until the synchronous replica is available again.

For that reason, it is impractical for all followers to be synchronous. In
practice, if you enable synchronous replication on a database, it usually means
that one of the followers is synchronous, and the others are asynchronous. If
the synchronous follower becomes unavailable or slow, one of the asynchronous
followers is made synchronous. This guarantees that you have an up-to-date copy
of the data on at least two nodes. This configuration is sometimes also called
semi-synchronous.

Often, leader-based replication is configured to be completely asynchronous. In
this case, if the leader fails and is not recoverable, any writes that have not
yet been replicated to followers are lost. This means that a write is not
guaranteed to be durable, even if it has been confirmed to the client.

Weakening durability may sound like a bad trade-off, but asynchronous
replication is nevertheless widely used, especially if there are many followers
or if they are geographically distributed.
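
As a sketch of this trade-off (again not from the book), a semi-synchronous
leader might report success as soon as its one synchronous follower acknowledges
the write, while shipping the change to the asynchronous followers in the
background; all class and method names here are hypothetical:

-- ds.code:
lang: py

import threading


class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, change):
        key, value = change
        self.data[key] = value
        return True  # acknowledgement back to the leader


class SemiSyncLeader:
    def __init__(self, sync_follower, async_followers):
        self.data = {}
        self.sync_follower = sync_follower
        self.async_followers = async_followers

    def write(self, key, value):
        change = (key, value)
        self.data[key] = value                 # 1. leader's local storage
        ok = self.sync_follower.apply(change)  # 2. wait for the sync follower
        for f in self.async_followers:
            # 3. asynchronous replication: fire and forget, no waiting
            threading.Thread(target=f.apply, args=(change,)).start()
        return ok  # success is reported once the synchronous copy exists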

-- ds.h2: Setting Up New Followers

Simply copying data files from one node to another is typically not sufficient:
clients are constantly writing to the database, and the data is always in flux,
so a standard file copy would see different parts of the database at different
points in time.

You could make the files on disk consistent by locking the database (making it
unavailable for writes), but that would go against our goal of high availability.
Fortunately, setting up a follower can usually be done without downtime.
Conceptually, the process looks like this:

- Take a consistent snapshot of the leader’s database at some point in time—if
possible, without taking a lock on the entire database.
- Copy the snapshot to the new follower node.
- The follower connects to the leader and requests all the data changes that
have happened since the snapshot was taken. This requires that the snapshot
is associated with an exact position in the leader’s replication log. That
position has various names: for example, PostgreSQL calls it the *log sequence
number*, and MySQL calls it the *binlog coordinates*.
- When the follower has processed the backlog of data changes since the
snapshot, we say it has *caught up*.

The practical steps of setting up a follower vary significantly by database. In
some systems the process is fully automated, whereas in others it can be a
somewhat arcane multi-step workflow that needs to be manually performed by an
administrator.
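
As a condensed sketch of those four conceptual steps (the `take_snapshot`,
`changes_since`, and other helpers are assumptions for illustration, not a real
API; real systems track the log position as, e.g., PostgreSQL's log sequence
number or MySQL's binlog coordinates):

-- ds.code:
lang: py

def set_up_follower(leader, new_follower):
    # 1. Consistent snapshot of the leader, ideally without a global lock.
    #    The snapshot is tied to an exact position in the replication log.
    snapshot, log_position = leader.take_snapshot()

    # 2. Copy the snapshot to the new follower node.
    new_follower.restore(snapshot)

    # 3. Request all data changes that happened since the snapshot was taken.
    for change in leader.changes_since(log_position):
        new_follower.apply(change)

    # 4. The backlog is processed: the follower has "caught up" and can now
    #    keep consuming the live stream of data changes.
    leader.add_follower(new_follower)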


-- ds.h2: Handling Node Outages

Any node in the system can go down, perhaps unexpectedly due to a fault, but
just as likely due to planned maintenance (for example, rebooting a machine to
install a kernel security patch).

How do you achieve high availability with leader-based replication?

-- ds.h3: Follower failure: Catch-up recovery

On its local disk, each follower keeps a log of the data changes it has received
from the leader. If a follower crashes and is restarted, or if the network
between the leader and the follower is temporarily interrupted, the follower
can recover quite easily: from its log, it knows the last transaction that was
processed before the fault occurred. Thus, the follower can connect to the
leader and request all the data changes that occurred during the time when the
follower was disconnected. When it has applied these changes, it has caught up
to the leader and can continue receiving a stream of data changes as before.
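
A small sketch of that recovery, assuming a hypothetical follower that durably
records the position of the last change it applied:

-- ds.code:
lang: py

class RecoveringFollower:
    def __init__(self, durable_log):
        self.data = {}
        self.log = durable_log
        # The position of the last applied change survives a crash/restart.
        self.last_position = durable_log.last_position()

    def apply(self, position, change):
        key, value = change
        self.data[key] = value
        self.log.append(position, change)
        self.last_position = position

    def catch_up(self, leader):
        # From its own log the follower knows the last change it processed,
        # so it only requests the changes it missed while disconnected.
        for position, change in leader.changes_since(self.last_position):
            self.apply(position, change)
        # Caught up: it can now continue receiving the live change stream.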


-- ds.h3: Leader failure: Failover

Handling a failure of the leader is trickier: one of the followers needs to be
promoted to be the new leader, clients need to be reconfigured to send their
writes to the new leader, and the other followers need to start consuming data
changes from the new leader. This process is called **failover**.
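
A schematic sketch of those three steps (all objects and methods here are
hypothetical; real failover also has to decide that the old leader is actually
dead and cope with problems such as lost writes and split brain):

-- ds.code:
lang: py

def failover(followers, clients):
    # 1. Promote the most up-to-date follower to be the new leader.
    new_leader = max(followers, key=lambda f: f.last_position)
    new_leader.promote()

    # 2. Reconfigure clients so they send their writes to the new leader.
    for client in clients:
        client.set_write_target(new_leader)

    # 3. The remaining followers start consuming changes from the new leader.
    for follower in followers:
        if follower is not new_leader:
            follower.follow(new_leader)

    return new_leader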


-- end: ds.page

0 comments on commit 2ee6a1e
