We all know how to set up standard Primary/Replica replication, but I have been asked many times about Multi-Source replication and how to set it up. That leads to the related topic of circular replication, which is what we are going to discuss in this blog.
As a follow-up to this blog, I will be discussing Multi-source replication, which will be published sometime in the future.
This will be a short blog. It assumes we have two servers running MariaDB 10.5 or higher, and that the firewall between the two nodes allows traffic on the default MariaDB port, 3306. The steps should also work with older versions of MariaDB.
Note: The replication technology discussed here applies only to MariaDB's standard storage engines, such as InnoDB, MyRocks, etc. Xpand and ColumnStore are out of scope, as these two have their own native way of distributing data across the cluster and don't use replication of any sort.
The base concept behind circular replication is the same as standard replication, and we just have to repeat the steps twice from both sides.
As the image above demonstrates, the two MariaDB nodes replicate from each other. This means the Primary DC node is a Primary in its own right while also being a Replica of the Secondary DC node. The same applies to the MariaDB server in the Secondary DC: it is both a Primary and a Replica of the MariaDB server running in the Primary DC.
There are some specific use cases for this setup. One of them can be described as follows.
- Multi-Regional Application Support
- The two DC nodes are geographically separated, and fully synchronous replication is not possible due to network latency.
- The app is also running locally in each region to support the clients.
- Both the apps in each region connect to the local database to prevent latency and provide the best possible experience.
This is a very common scenario. While other options, such as Xpand, provide an extremely scalable solution, for an app that just needs a simple architecture, this is a good fit.
- Easy to set up
- Fast asynchronous two-way/circular replication
- See the cons section for things to watch out for
- Local-region HA can be set up by adding local Replica nodes
- MaxScale can be used locally in each DC for local HA and automatic failover
- No way to identify the conflicting transaction
- Both sides submit the same transaction. Only one will be committed.
- Not ideal if multi-master writing is a requirement, but still possible thanks to the distinct GTID_DOMAIN_ID setup.
- MaxScale does not support circular replication. Failover between DCs must be handled manually.
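To make the conflict risk concrete, here is a hypothetical sequence (table and column names invented for illustration) that would break the circle:

```sql
-- Hypothetical example: both DCs insert the same primary key at
-- nearly the same time. Table and columns are invented for illustration.
-- On the Primary DC node:
INSERT INTO app.orders (order_id, region) VALUES (42, 'primary-dc');
-- On the Secondary DC node, before the row above has replicated:
INSERT INTO app.orders (order_id, region) VALUES (42, 'secondary-dc');
-- Both commits succeed locally, but each row then fails to apply on
-- the other node with a duplicate-key error, stopping the SQL thread.
```

When both sides must take writes, a common mitigation is to keep generated keys apart, for example by setting auto_increment_increment and auto_increment_offset differently on each node.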
We are working with two MariaDB servers running 10.5 or higher. For our case, it's MariaDB Enterprise 10.6.
Here is what is required at a minimum to set up circular replication.
The configuration file is /etc/my.cnf.d/server.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf, depending on the OS.
Node-1:
[mariadb]
log_error=server.log
log_bin=mariadb-bin
log_slave_updates=1
gtid_strict_mode=1
binlog_format=row
### Replication Durability
sync_binlog=1
sync_relay_log=1
sync_master_info=1
sync_relay_log_info=1
###
innodb_flush_log_at_trx_commit=1
gtid_domain_id=10
server_id=1000
Node-2:
[mariadb]
...
...
gtid_domain_id=20
server_id=2000
Take note of the "Replication Durability" section in the configuration. It slows down replication because of the additional IO it incurs, but it is highly recommended.
The Node-1 and Node-2 configurations are identical except for gtid_domain_id and server_id. Generally, gtid_domain_id is kept the same for all the nodes within the same Primary/Replica cluster, but since this is a multi-master circular replication setup, a distinct gtid_domain_id per node is mandatory.
Once the configuration is done and both servers have restarted successfully, we can create a replication user rep_user@'%' with the specific replication grant. This is the same user we would create for standard replication.
Note: This user is created only on Node-1. It will automatically be replicated to Node-2 once the one-way replication from Node-1 to Node-2 is set up.
MariaDB [(none)]> create user rep_user@'%' identified by 'VerySecretPassw0rd!';
Query OK, 0 rows affected (0.003 sec)
MariaDB [(none)]> grant replication slave on *.* to rep_user@'%';
Query OK, 0 rows affected (0.003 sec)
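Before moving on, the grant can be double-checked with a standard SHOW GRANTS statement:

```sql
MariaDB [(none)]> show grants for rep_user@'%';
```

The output should include a GRANT REPLICATION SLAVE ON *.* line for the user.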
Now we are ready for the replication setup from Node-1 > Node-2.
Validate that Node-1 has a valid GTID since we have already configured binary logging and done some transactions.
Node-1:
MariaDB [(none)]> show global variables like 'GTID%';
+-------------------------+-----------+
| Variable_name | Value |
+-------------------------+-----------+
| gtid_binlog_pos | 10-1000-2 |
| gtid_binlog_state | 10-1000-2 |
| gtid_cleanup_batch_size | 64 |
| gtid_current_pos | 10-1000-2 |
| gtid_domain_id | 10 |
| gtid_ignore_duplicates | OFF |
| gtid_pos_auto_engines | |
| gtid_slave_pos | |
| gtid_strict_mode | ON |
+-------------------------+-----------+
The above shows that the Domain-ID and the GTID values are correct based on the setup in the server.cnf file. The GTID 10-1000-2 reads as: "This server is at Domain-ID=10, Server-ID=1000, and Transaction=2".
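The structure of these values can be sketched in a few lines of Python. This is purely an illustration of the format (not MariaDB tooling): a single GTID is domain-server-sequence, and a multi-domain state like the one we will see later holds one position per domain:

```python
# Illustration only (not MariaDB tooling): parse MariaDB GTID strings.
# A single GTID is "domain_id-server_id-sequence_no"; a multi-domain
# state such as "10-1000-2,20-2000-2" holds one position per domain.

def parse_gtid(gtid: str) -> dict:
    """Split a GTID like '10-1000-2' into its three numeric parts."""
    domain_id, server_id, seq_no = (int(p) for p in gtid.split("-"))
    return {"domain_id": domain_id, "server_id": server_id, "seq_no": seq_no}

def state_by_domain(state: str) -> dict:
    """Map each domain ID in a multi-domain GTID state to its position."""
    return {parse_gtid(g)["domain_id"]: g for g in state.split(",")}

print(parse_gtid("10-1000-2"))
# → {'domain_id': 10, 'server_id': 1000, 'seq_no': 2}
print(state_by_domain("10-1000-2,20-2000-2"))
# → {10: '10-1000-2', 20: '20-2000-2'}
```

The per-domain independence is exactly why each node in a circular setup needs its own gtid_domain_id: writes from each side advance separate positions instead of colliding.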
Validate the same GTID status on the Node-2
MariaDB [(none)]> show global variables like 'GTID%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| gtid_binlog_pos | |
| gtid_binlog_state | |
| gtid_cleanup_batch_size | 64 |
| gtid_current_pos | |
| gtid_domain_id | 20 |
| gtid_ignore_duplicates | OFF |
| gtid_pos_auto_engines | |
| gtid_slave_pos | |
| gtid_strict_mode | ON |
+-------------------------+-------+
We can see the GTID on Node-2 is empty, which is fine, as we know the server is new and no transactions have been performed. Take note of gtid_slave_pos, which is also empty. We would generally set this to the GTID position from which we want replication to start, but in this case, empty means Node-2 will request replication data starting from the very first transaction logged in the binary logs on Node-1.
Execute the CHANGE MASTER on Node-2 just like we would to set up standard replication.
Node-2:
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST="172.31.24.201",
MASTER_USER="rep_user",
MASTER_PASSWORD="VerySecretPassw0rd!",
MASTER_PORT=3306,
MASTER_USE_GTID=slave_pos;
Query OK, 0 rows affected (0.079 sec)

MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.002 sec)
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.31.24.201
Master_User: rep_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000001
Read_Master_Log_Pos: 1500
Relay_Log_File: ip-172-31-17-125-relay-bin.000002
Relay_Log_Pos: 1801
Relay_Master_Log_File: mariadb-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
Master_Server_Id: 1000
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 10-1000-2
...
...
The above points to Node-1 as the master and confirms that replication from Node-1 > Node-2 is working well, as Gtid_IO_Pos reports 10-1000-2. Now let's switch to Node-1 and repeat the same steps with a slight twist.
Node-1:
We can now complete the other half of the replication, Node-2 > Node-1, by following the same steps, with one small difference.
Take note of the current GTID status of the Node-1
MariaDB [(none)]> show global variables like 'gtid_binlog_state';
+-------------------+-----------+
| Variable_name | Value |
+-------------------+-----------+
| gtid_binlog_state | 10-1000-2 |
+-------------------+-----------+
1 row in set (0.001 sec)
MariaDB [(none)]> show global variables like 'gtid_slave_pos';
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| gtid_slave_pos | |
+----------------+-------+
1 row in set (0.001 sec)
We need to set gtid_slave_pos to the same value as gtid_binlog_state so that replication starts from this point onwards. This is required so that the server does not re-apply events that are already in its local binary logs.
MariaDB [(none)]> SET GLOBAL gtid_slave_pos=@@gtid_binlog_state;
Query OK, 0 rows affected (0.006 sec)
MariaDB [(none)]> show global variables like 'gtid_slave_pos';
+----------------+-----------+
| Variable_name | Value |
+----------------+-----------+
| gtid_slave_pos | 10-1000-2 |
+----------------+-----------+
1 row in set (0.001 sec)
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST="172.31.17.125", MASTER_USER="rep_user", MASTER_PASSWORD="VerySecretPassw0rd!", MASTER_USE_GTID=slave_pos;
Query OK, 0 rows affected (0.015 sec)
MariaDB [(none)]> start slave;
Query OK, 0 rows affected (0.006 sec)
This time the MASTER_HOST points to Node-2.
Time to check the replication status, but before we do that, let's run a few transactions on Node-2.
MariaDB [(none)]> create database testdb;
Query OK, 1 row affected (0.002 sec)
MariaDB [(none)]> use testdb;
Database changed
MariaDB [testdb]> create table t(id serial, c varchar(100));
Query OK, 0 rows affected (0.011 sec)
MariaDB [testdb]> show global variables like 'gtid%';
+-------------------------+---------------------+
| Variable_name | Value |
+-------------------------+---------------------+
| gtid_binlog_pos | 10-1000-2,20-2000-2 |
| gtid_binlog_state | 10-1000-2,20-2000-2 |
| gtid_cleanup_batch_size | 64 |
| gtid_current_pos | 10-1000-2,20-2000-2 |
| gtid_domain_id | 20 |
| gtid_ignore_duplicates | OFF |
| gtid_pos_auto_engines | |
| gtid_slave_pos | 10-1000-2,20-2000-2 |
| gtid_strict_mode | ON |
+-------------------------+---------------------+
The above indicates that the binary logs now have a mix of two domains, which is what we expect to see. Now let's go back to Node-1 and check the slave status.
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.31.17.125
Master_User: rep_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000001
Read_Master_Log_Pos: 1500
Relay_Log_File: ip-172-31-24-201-relay-bin.000002
Relay_Log_Pos: 674
Relay_Master_Log_File: mariadb-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
...
Master_Server_Id: 2000
...
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 10-1000-2,20-2000-2
...
...
We can see that Node-1 has replicated the two transactions from Node-2 (GTID 20-2000-2). We can also confirm that the new database and table have been created on this node.
MariaDB [(none)]> use testdb;
Database changed
MariaDB [testdb]> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| t |
+------------------+
Indeed, everything looks good, and we have circular replication in place. Now run some transactions on both nodes and confirm that replication works both ways without breaking.
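One way to run that confirmation is to write on each node in turn and read the row back on the other, using the testdb.t table created earlier:

```sql
-- On Node-1:
MariaDB [testdb]> insert into t (c) values ('written on node-1');
-- On Node-2, after replication catches up:
MariaDB [testdb]> select c from t where c = 'written on node-1';
-- Then write on Node-2:
MariaDB [testdb]> insert into t (c) values ('written on node-2');
-- And read it back on Node-1:
MariaDB [testdb]> select c from t where c = 'written on node-2';
```

If both selects return the row written on the other node, the circle is healthy in both directions.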