-- ds.page: Replication

**Note**: The content is taken from [Designing Data-Intensive Applications by
Martin Kleppmann](https://public.nikhil.io/Designing%20Data%20Intensive%20Applications.pdf).

In this chapter we will assume that your dataset is so small that each machine
can hold a copy of the entire dataset. In a later chapter we will relax that
assumption and discuss partitioning (sharding) of datasets that are too big for
a single machine.

-- ds.h1: Leaders and Followers

- One of the replicas is designated the leader (also known as master or
primary). When clients want to write to the database, they must send their
requests to the leader, which first writes the new data to its local storage.
- The other replicas are known as followers (read replicas, slaves,
secondaries, or hot standbys). Whenever the leader writes new data to its local
storage, it also sends the data change to all of its followers as part of a
replication log or change stream.
- When a client wants to read from the database, it can query either the leader
or any of the followers. However, writes are only accepted on the leader (the
followers are read-only from the client’s point of view).

-- ds.image:
src: $assets.files.ddd.data-intensive-application.images.5-1.png

-- ds.markdown:

This mode of replication is a built-in feature of many relational databases,
such as PostgreSQL (since version 9.0), MySQL, Oracle Data Guard, and SQL
Server’s AlwaysOn Availability Groups. It is also used in some nonrelational
databases, including MongoDB, RethinkDB, and Espresso. Finally, leader-based
replication is not restricted to databases: distributed message brokers such as
Kafka and RabbitMQ’s highly available queues also use it, and some network
filesystems and replicated block devices such as DRBD are similar.

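To make the write path concrete, here is a minimal in-memory sketch of
leader-based replication (an illustration of the concept only, not any
particular database’s implementation; all class and method names are
hypothetical): the leader applies each write to its local storage, appends it
to a replication log, and streams the change to its followers, which are
read-only.

-- ds.code:
lang: py

# Minimal in-memory sketch of leader-based replication (hypothetical names).

class Follower:
    """Read-only replica: applies changes received from the leader."""
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

class Leader:
    def __init__(self, followers):
        self.data = {}
        self.log = []             # replication log: ordered change stream
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value    # 1. write to local storage first
        self.log.append((key, value))
        for f in self.followers:  # 2. stream the change to every follower
            f.apply(key, value)

    def read(self, key):
        return self.data.get(key)

followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("user:1", "Alice")     # writes must go to the leader
print(followers[0].read("user:1"))  # reads can go to any replica -> Alice
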
-- ds.h2: Synchronous Versus Asynchronous Replication

-- ds.image:
src: $assets.files.ddd.data-intensive-application.images.5-2.png

-- ds.markdown:

In the figure above, the replication to follower 1 is synchronous: the leader
waits until follower 1 has confirmed that it received the write before
reporting success to the user. The replication to follower 2 is asynchronous:
the leader sends the message, but doesn’t wait for a response from the
follower.

The **advantage** of synchronous replication is that the follower is guaranteed
to have an up-to-date copy of the data that is consistent with the leader. If
the leader suddenly fails, we can be sure that the data is still available on
the follower. The **disadvantage** is that if the synchronous follower doesn’t
respond (because it has crashed, or there is a network fault, or for any other
reason), the write cannot be processed. The leader must block all writes and
wait until the synchronous replica is available again.

For that reason, it is impractical for all followers to be synchronous. In
practice, if you enable synchronous replication on a database, it usually means
that one of the followers is synchronous, and the others are asynchronous. If
the synchronous follower becomes unavailable or slow, one of the asynchronous
followers is made synchronous. This guarantees that you have an up-to-date copy
of the data on at least two nodes. This configuration is sometimes also called
semi-synchronous.

Often, leader-based replication is configured to be completely asynchronous. In
this case, if the leader fails and is not recoverable, any writes that have not
yet been replicated to followers are lost. This means that a write is not
guaranteed to be durable, even if it has been confirmed to the client.

Weakening durability may sound like a bad trade-off, but asynchronous
replication is nevertheless widely used, especially if there are many followers
or if they are geographically distributed.

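The trade-off can be sketched in code. In this hypothetical model (not any real
database’s API), the leader confirms a write only after its single synchronous
follower acknowledges it, while the remaining followers are replicated to
asynchronously, without waiting for acknowledgments.

-- ds.code:
lang: py

# Hypothetical sketch of semi-synchronous replication: the write is
# confirmed only once one synchronous follower has acknowledged it.

class Follower:
    def __init__(self, available=True):
        self.available = available
        self.data = {}

    def replicate(self, key, value):
        # Returns True (an acknowledgment) only if the follower is reachable.
        if not self.available:
            return False
        self.data[key] = value
        return True

class Leader:
    def __init__(self, sync_follower, async_followers):
        self.data = {}
        self.sync_follower = sync_follower
        self.async_followers = async_followers

    def write(self, key, value):
        self.data[key] = value  # local write first
        # Synchronous: block until the sync follower acknowledges. In a
        # real system the leader would retry or promote an asynchronous
        # follower to synchronous; here we simply report failure.
        if not self.sync_follower.replicate(key, value):
            return False
        # Asynchronous: send and ignore the acknowledgment; these
        # followers may lag, and their writes can be lost on a crash.
        for f in self.async_followers:
            f.replicate(key, value)
        return True  # confirmed: the data is on at least two nodes

leader = Leader(Follower(), [Follower(), Follower(available=False)])
print(leader.write("k", "v"))  # True: leader and sync follower have it
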
-- ds.h2: Setting Up New Followers

Simply copying data files from one node to another is typically not sufficient:
clients are constantly writing to the database, and the data is always in flux,
so a standard file copy would see different parts of the database at different
points in time.

You could make the files on disk consistent by locking the database (making it
unavailable for writes), but that would go against our goal of high
availability. Fortunately, setting up a follower can usually be done without
downtime. Conceptually, the process looks like this (a code sketch follows the
list):

- Take a consistent snapshot of the leader’s database at some point in time—if
possible, without taking a lock on the entire database.
- Copy the snapshot to the new follower node.
- The follower connects to the leader and requests all the data changes that
have happened since the snapshot was taken. This requires that the snapshot is
associated with an exact position in the leader’s replication log. That
position has various names: for example, PostgreSQL calls it the *log sequence
number*, and MySQL calls it the *binlog coordinates*.
- When the follower has processed the backlog of data changes since the
snapshot, we say it has *caught up*.

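Here is a hypothetical sketch of that bootstrap sequence, using a plain list
index as the log position (standing in for PostgreSQL’s log sequence number or
MySQL’s binlog coordinates):

-- ds.code:
lang: py

# Hypothetical sketch: bootstrapping a new follower from a consistent
# snapshot plus the tail of the leader's replication log.

class Leader:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def snapshot(self):
        # A consistent snapshot paired with its exact log position.
        return dict(self.data), len(self.log)

    def changes_since(self, position):
        return self.log[position:]

leader = Leader()
leader.write("a", 1)

# 1-2. Take a consistent snapshot and copy it to the new follower.
follower_data, position = leader.snapshot()

leader.write("b", 2)  # writes keep arriving while the copy happens

# 3. Catch up: request every change since the snapshot's log position.
for key, value in leader.changes_since(position):
    follower_data[key] = value

# 4. The follower has processed the backlog: it has caught up.
print(follower_data == leader.data)  # True

-- ds.markdown:
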
The practical steps of setting up a follower vary significantly by database. In
some systems the process is fully automated, whereas in others it can be a
somewhat arcane multi-step workflow that needs to be manually performed by an
administrator.

-- ds.h2: Handling Node Outages

Any node in the system can go down, perhaps unexpectedly due to a fault, but
just as likely due to planned maintenance (for example, rebooting a machine to
install a kernel security patch).

How do you achieve high availability with leader-based replication?

-- ds.h3: Follower failure: Catch-up recovery

On its local disk, each follower keeps a log of the data changes it has
received from the leader. If a follower crashes and is restarted, or if the
network between the leader and the follower is temporarily interrupted, the
follower can recover quite easily: from its log, it knows the last transaction
that was processed before the fault occurred. Thus, the follower can connect to
the leader and request all the data changes that occurred during the time when
the follower was disconnected. When it has applied these changes, it has caught
up to the leader and can continue receiving a stream of data changes as before.

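The same idea from the follower’s side, as a small hypothetical sketch: because
the follower durably records the position of the last change it applied,
recovery is just “reconnect and replay everything after that position”.

-- ds.code:
lang: py

# Hypothetical sketch of catch-up recovery: the follower remembers the
# last log position it applied, so after a crash or network interruption
# it requests only the changes that occurred while it was disconnected.

leader_log = [("a", 1), ("b", 2)]  # the leader's replication log

class Follower:
    def __init__(self):
        self.data = {}
        self.applied_up_to = 0  # kept on local disk in a real system

    def catch_up(self, log):
        for key, value in log[self.applied_up_to:]:
            self.data[key] = value
        self.applied_up_to = len(log)

f = Follower()
f.catch_up(leader_log)       # initial sync: applies ("a", 1) and ("b", 2)
leader_log.append(("c", 3))  # a write happens while the follower is down
f.catch_up(leader_log)       # after recovery: replays only ("c", 3)
print(f.data)                # {'a': 1, 'b': 2, 'c': 3} -- caught up
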
-- ds.h3: Leader failure: Failover

Handling a failure of the leader is trickier: one of the followers needs to be
promoted to be the new leader, clients need to be reconfigured to send their
writes to the new leader, and the other followers need to start consuming data
changes from the new leader. This process is called **failover**.

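Those steps can be sketched as follows. This is a deliberately simplified,
hypothetical model: real failover must also decide that the leader is actually
dead (usually via timeouts) and cope with problems such as lost writes and
split brain.

-- ds.code:
lang: py

# Hypothetical sketch of failover: promote the most up-to-date follower
# and repoint the remaining followers (and clients) at it.

class Node:
    def __init__(self, name, log_position, alive=True):
        self.name = name
        self.log_position = log_position  # how far replication has gotten
        self.alive = alive
        self.leader = None

def failover(leader, followers):
    if leader.alive:
        return leader  # nothing to do
    # 1. Choose the new leader: usually the follower with the most
    #    up-to-date replication log, to minimize lost writes.
    new_leader = max(followers, key=lambda n: n.log_position)
    # 2. Reconfigure the other followers (and, implicitly, clients) to
    #    consume changes from, and send writes to, the new leader.
    for f in followers:
        if f is not new_leader:
            f.leader = new_leader
    return new_leader

old_leader = Node("leader", log_position=100, alive=False)
followers = [Node("f1", log_position=98), Node("f2", log_position=100)]
print(failover(old_leader, followers).name)  # f2: the most caught-up follower
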
-- end: ds.page |