How frequent lib checks the cluster so as to identify possible changes #571

Closed
georgasa opened this issue Jun 14, 2024 · 14 comments

Comments

@georgasa

Hi,

I'm wondering how fast the lib will understand that a master-slave failover occurred so as to redirect the write commands to the new master?

Also during sunny days (normal conditions), does the lib distribute the read commands among master AND replicas or it accesses solely the master nodes for every read operation?

Thanks a lot,
Apostolos

@sewenew
Owner

sewenew commented Jun 17, 2024

I'm wondering how fast the lib will understand that a master-slave failover occurred so as to redirect the write commands to the new master?

Once, you send a command to Redis, and it finds that the connection is broke, the lib will try to get the latest master nodes from the cluster, and connect to it.

Also during sunny days (normal conditions), does the lib distribute the read commands among master AND replicas or it accesses solely the master nodes for every read operation?

If you use Redis Sentinel or Redis Cluster, you can configure which role, master or slave, do you want to connect to. If you use slave mode, the lib randomly picks one node and sends commands to it. If the connection breaks, the lib randomly picks a node again, which may or may not be the same one. In any case, it ensures that all connections in the connection pool connect to the same node.

If you want to distribute reads among multiple nodes, you have to create multiple Redis instances. If you are working on a backend service, you normally run multiple instances of that service, so the connections should end up distributed across multiple nodes.
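
As a rough illustration of the "multiple Redis instances" approach inside a single process, the sketch below hands out client handles in round-robin order. RoundRobinReaders is a hypothetical helper, not part of redis-plus-plus, and Client stands for whatever handle type the application holds (an assumption; for example a smart pointer to a Redis object connected in slave role). Since each instance picks its node randomly, two handles may still land on the same replica.

```cpp
#include <atomic>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper (not redis-plus-plus API): spreads read calls over
// several independently created client handles in round-robin order.
template <typename Client>
class RoundRobinReaders {
public:
    explicit RoundRobinReaders(std::vector<Client> clients)
        : clients_(std::move(clients)) {
        if (clients_.empty()) {
            throw std::invalid_argument("need at least one client");
        }
    }

    // Safe to call from several threads: the counter is atomic and the
    // vector is never resized after construction.
    Client& next() {
        std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        return clients_[i % clients_.size()];
    }

    std::size_t size() const { return clients_.size(); }

private:
    std::vector<Client> clients_;
    std::atomic<std::size_t> next_{0};
};
```

Writes would still go through a separate instance connected in master role; only reads would be routed through next().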

Regards

@jzkiss

jzkiss commented Jun 20, 2024

Hi,

may I have a question connecting to this conversation?

you wrote: "Once, you send a command to Redis, and it finds that the connection is broke, the lib will try to get the latest master nodes from the cluster, and connect to it."

Does it mean that after the connection reestablishment attempt fails, the library sends CLUSTER SLOTS request to update its inner mapping table?

From Wireshark traces of a VM restart/reset use case, I see that the library detects the failure of an active connection after 52 seconds (9 TCP retransmissions). After that there is a reconnection attempt (with a configurable duration, connect_timeout), and then CLUSTER SLOTS is sent (2 times, and no more CLUSTER SLOTS requests).

Does the "52 sec" interval configurable somehow? I see two candidates in connection.h, socket_timeout and keep_alive_s.

Is it normal that CLUSTER SLOTS is not sent by the library after the connection is established with the new master (that is, CLUSTER SLOTS are sent on demand)?

Thanks in advance,
Jozsef

@sewenew
Owner

sewenew commented Jun 25, 2024

Does it mean that after the connection reestablishment attempt fails, the library sends CLUSTER SLOTS request to update its inner mapping table?

NO. It means once the lib finds that the connection is broken, it reconnects to Redis Cluster, and if it successfully connects to it, the client sends a CLUSTER SLOTS request to update local mapping.

When does the lib find that the connection is broken? Take the following scenario as an example: the client fetches a connection from the connection pool and sends a command to Redis, but the command fails because the connection is broken. The client returns the connection to the pool. The next time the client fetches that broken connection from the pool, it checks the connection status and finds that it is broken.

Does the "52 sec" interval configurable somehow?

NO, there's no such configuration.

Is it normal that CLUSTER SLOTS is not sent by the library after the connection is established with the new master

Once the client finds that the connection is broken, it reconnects and updates its mappings. There are cases in which it might not update immediately. For example, there may be a broken connection in the pool, but when the client fetches a connection, it does not get the broken one.

By the way, I might add a worker thread to update the mapping from time to time, so that it might update the mapping more quickly.

Regards

@jzkiss

jzkiss commented Jun 25, 2024

Hello,
thank you very much for the explanation.

You are using the terms "library" and "client". What is the difference between them? If I understood well your explanation both refer to redis-plus-plus code, and not the application that uses / calls the redis-plus-plus methods.

Note: if you would like to reproduce the scenario then you can apply e.g. the following:

  • start redis cluster e.g. with 3 master and 3 slave
  • generate continuous traffic
  • during the traffic, select a master redis instance, apply the following iptables rule in the container/vm of that server:
    iptables -A OUTPUT -p tcp --sport redis_port -s redis_ip -j DROP
  • kill that redis server (kill -9 redis_server_pid)

Expected result: new master is elected, traffic is redirected to that master
Unexpected result: old connection is used, continuous TCP packet retransmissions until timeout

Br, Jozsef

@sewenew
Owner

sewenew commented Jun 26, 2024

If I understood well your explanation both refer to redis-plus-plus code

Yes

@jzkiss

jzkiss commented Jun 26, 2024

Hello,

I would propose the following enhancements (draft algorithm), please consider if it is feasible:

  • introduce new configuration parameter like abs_socket_timeout (default: 0, meaning the feature is not used)
  • introduce a new property per connection: last_response_received
  • whenever a request is sent in a connection, check if (t_now - last_response_received) is under abs_socket_timeout. (Maybe you should also check if there is an ongoing request at all to avoid false failovers) If yes, send the request. If not (~we can assume that the connection suffers), send CLUSTER SLOTS (be careful, choose another connection), check from the response if mastership was changed, and reconnect to the new master if needed. Abort the ongoing requests towards the redis-plus-plus user, redis-plus-plus user should retransmit the request, maybe with some delay.
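
The bookkeeping part of this draft algorithm could look roughly like the sketch below. It is only an illustration of the proposal: ConnectionHealth and all of its members are hypothetical, and abs_socket_timeout and last_response_received are the names proposed above, not existing redis-plus-plus options. To avoid false failovers on idle connections, the sketch measures the silence from the later of the last response and the moment the connection started waiting for a reply.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>

// Hypothetical sketch of the proposal; nothing here is redis-plus-plus API.
class ConnectionHealth {
public:
    using Clock = std::chrono::steady_clock;

    // abs_socket_timeout == 0 means the feature is disabled.
    ConnectionHealth(std::chrono::milliseconds abs_socket_timeout,
                     Clock::time_point now)
        : timeout_(abs_socket_timeout),
          last_response_received_(now),
          waiting_since_(now) {}

    void on_request_sent(Clock::time_point now) {
        if (pending_ == 0) {
            waiting_since_ = now;  // connection starts waiting for a reply
        }
        ++pending_;
    }

    void on_response_received(Clock::time_point now) {
        last_response_received_ = now;
        if (pending_ > 0) {
            --pending_;
        }
    }

    // True if requests are outstanding and nothing has come back for longer
    // than abs_socket_timeout. The caller would then refresh the slot map
    // over another connection and fail the outstanding requests.
    bool is_suspect(Clock::time_point now) const {
        if (timeout_ == std::chrono::milliseconds::zero() || pending_ == 0) {
            return false;
        }
        Clock::time_point reference =
            std::max(last_response_received_, waiting_since_);
        return now - reference > timeout_;
    }

private:
    std::chrono::milliseconds timeout_;
    Clock::time_point last_response_received_;
    Clock::time_point waiting_since_;
    std::size_t pending_ = 0;
};
```

Passing the current time in as a parameter keeps the class testable without sleeping; a real integration would call Clock::now() at the send and receive points.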

There are applications where a 52-second delay in responses is unacceptable. Certainly, the code that uses redis-plus-plus can also be enhanced to detect this situation and implement a workaround, but it would be a more elegant solution to implement this in the library.

Br, Jozsef

@georgasa
Author

Also during sunny days (normal conditions), does the lib distribute the read commands among master AND replicas or it accesses solely the master nodes for every read operation?

If you use Redis Sentinel or Redis Cluster, you can configure which role, master or slave, do you want to connect to.

I'm using Redis Cluster and I would like to distribute the read operations among masters AND their replicas (in other words, I would like to offload the masters from getting all the traffic). Is it possible somehow to instruct library to distribute read commands based e.g. on a ratio? Because as I see, replicas don't get traffic so they are underutilized, as masters undertake all the workload.

Thanks

@sewenew
Owner

sewenew commented Jul 2, 2024

@georgasa Yes, there was a plan to support customized strategies for choosing between Redis nodes. Until that feature is implemented, you have to do it manually: create one or more Redis instances that connect to slave nodes for reading, and a separate Redis instance that connects to the master node for writing.

Regards

@sewenew
Owner

sewenew commented Sep 21, 2024

@jzkiss With #595 fix, redis-plus-plus updates the slot-node mapping every ClusterOptions::slot_map_refresh_interval (by default, it updates every 10 seconds). You can control the update frequency by this parameter:

ClusterOptions cluster_opts;
cluster_opts.slot_map_refresh_interval = std::chrono::seconds(5);
auto cluster = RedisCluster(connection_opts, pool_opts, role, cluster_opts);

@sewenew sewenew closed this as completed Sep 21, 2024
@jzkiss

jzkiss commented Oct 10, 2024

Hello @sewenew,

thank you very much for the fix and sorry for the late answer!

I have executed two tests:

I. Graceful kill of a Redis master instance
II. TCP Cut of a Redis master instance

These are valid use cases in our deployment. We use Redis modules, and when a new correction is added to a module, we perform rolling upgrades of the Redis pods in Kubernetes (to provide in-service operation).
The second test is for simulating when the hosting Virtual Machine is reset in OpenStack.

I. Graceful kill of a Redis master instance - this is Ok with this version.
II. TCP Cut of a Redis master instance - it is much better than the previous one, but I still found issues. The good news is that this version is able to recover the traffic after 26 seconds.

Test setup:
The application uses redispp in async mode. The cluster contains 3 Redis masters and 3 Redis slaves. The application continuously generates redispp async calls; the target masters are identified by the slots that redispp calculates from the key. The AddData operation is implemented in a Redis module.

During the test one Redis master is killed. redispp notices this because a FIN is received in the TCP stream and an RST is returned for new connection establishment requests.

When redispp recognizes from CLUSTER SLOTS the change in mastership, it connects to new master and uses that channel for data transfer. The existing stream channel towards the other redis master is not broken, it is used throughout the whole test.

[Figure: graceful kill of Redis master test]

II. TCP Cut of a Redis master instance
In this test we simulate a VM reset with iptables rules. This means that redispp does not recognize the broken link at the TCP level; the lower layer retransmits the TCP packets for a while.

In this case I expect that redispp:

  • uses the working TCP streams continuously for data transfer
  • when a mastership change is detected from CLUSTER SLOTS, establishes a new connection and uses that one.

But, what is really happening is the following:

  • when mastership change is detected from CLUSTER SLOTS, redispp sends cluster slots without any delay for almost 10 seconds. During this period no data is transferred in the working connections.
  • after ~10 seconds, redispp sends the already buffered requests to the Redis masters.
  • this period, in which the buffered requests are sent, is very short. After that, redispp again sends CLUSTER SLOTS without any delay, and during this period no data is transferred on the working streams.
  • finally, redispp establishes new connections.

It seems that the blocked TCP connection to one master somehow blocks the use of the connections on the working streams within redispp.
Another difference between the two tests is that in the first test the CLUSTER SLOTS burst is shorter (~1.18 sec) than in the second one (two bursts: ~12 sec, ~5 sec).

I added the recognized exceptions to the figures, maybe it helps to investigate the issue.

[Figure: TCP cut test, no restore]

We still use the earlier version of redispp, because we have a workaround in application level: when 4 sec timeout is detected then the application reinitializes the redispp. This workaround does not work with the new redispp (it crashes after some seconds after the reinitializations). With the workaround we lost 219 transactions, with the new version (and without the workaround) we lost 1707 (traffic was 250 transactions per second).

Unfortunately I cannot send you the wireshark traces / logs because they contain sensitive data, but please feel free to ask me if something is not clear from the figures.

From TCP traces level, this is the difference:

tcp cut:
[Figure: TCP capture, TCP cut]

graceful kill:
[Figure: TCP capture, graceful kill]

Note:
I noticed that redispp sends CLUSTER SLOTS without delays when it detects a failure situation. You may consider adding a short delay between these requests to save CPU resources.
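
One possible shape for such a delay is a capped exponential backoff, sketched below. This is an assumption about how it could be done, not a description of how redis-plus-plus schedules its refreshes; RefreshBackoff and its methods are hypothetical names.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical helper: yields the delay to wait before each consecutive
// CLUSTER SLOTS retry, doubling up to a fixed cap.
class RefreshBackoff {
public:
    RefreshBackoff(std::chrono::milliseconds initial,
                   std::chrono::milliseconds max)
        : initial_(std::min(initial, max)), max_(max), current_(initial_) {}

    // Returns the delay for the next attempt and doubles it for the one after.
    std::chrono::milliseconds next_delay() {
        std::chrono::milliseconds delay = current_;
        current_ = std::min(std::chrono::milliseconds(current_ * 2), max_);
        return delay;
    }

    // Call after a successful refresh so the next failure starts small again.
    void reset() { current_ = initial_; }

private:
    std::chrono::milliseconds initial_;
    std::chrono::milliseconds max_;
    std::chrono::milliseconds current_;
};
```

With an initial delay of a few tens of milliseconds the first retries stay quick, while a long outage no longer produces a continuous stream of requests.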

Br, Jozsef

@sewenew
Owner

sewenew commented Oct 15, 2024

@jzkiss Sorry for the late reply, and thank you for your detailed info.

Please help to clarify a few questions:

when mastership change is detected from CLUSTER SLOTS, redispp sends cluster slots without any delay for almost 10 seconds. During this period no data is transferred in the working connections.

Do you mean that the new master has already been elected? Or this happens during the election?

It looks like redis-plus-plus continuously gets MOVED errors or IO errors; otherwise, it would not continuously send the CLUSTER SLOTS command.

connection establishment to new master. after this no data sent on tcp stream. 3208 exception during this period. ... no data sent on stream. conn closed by redispp

This is strange. If the cluster is healthy, i.e. new master elected, and redis-plus-plus connects to new master successfully, it should not continuously send cluster slots command.

Does 3208 is the new master? or some other node. It's unclear which one or all of these Redis nodes returns timeout error.

NOTE: by default, Redis Cluster is configured as cluster-require-full-coverage yes. In this case, if some node is down, and the new master is not elected, Redis Cluster won't allow new commands. I'm not sure, if that's why redis-plus-plus closed the connection.

This workaround does not work with the new redispp (it crashes after some seconds after the reinitializations)

Is the crash caused by redis-plus-plus? In that case, if you can give a minimum code snippets that can reproduce the problem, I can do some debugging.

Also, is there any easy way to simulate the tcp cut case? I'm not familiar with network configuration, and cannot reproduce the problem. Thanks again!

Regards

@jzkiss

jzkiss commented Oct 15, 2024

Hello @sewenew ,

thank you for checking the issue. I will try to answer your questions below.

Br, Jozsef

Do you mean that the new master has already been elected? Or this happens during the election?

The new master was already elected. This is the CLUSTER SLOTS response for the first inquiry:

*2
$7
CLUSTER
$5
SLOTS
*3
*3
:0
:5460
*4
$10
172.17.0.3
:6383
$40
d33e5ccd4c05a85add2acfe154b423288878b5ca
%0
*4
:5461
:10922
*4
$10
172.17.0.3
:6381
$40
258287dcf0a302cad623b0b0e3019e91ad8219c0
%0
*4
$10
172.17.0.3
:6384
$40
6ee610909c99fc81fece7807173201685c4649e9
%0
*4
:10923
:16383
*4
$10
172.17.0.3
:6382
$40
3e60a0f44bd959530468fe8300f56361c7628c40
%0
*4
$10
172.17.0.3
:6385
$40
f5434bef52222cdb6463c8e34cb1e7ac007eb438
%0

Slot 0-5460 is served by only 172.17.0.3:6383 (new master), the other two slot ranges are served by master / slave.
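
To make the mapping in this reply concrete, here is a minimal sketch of a slot-to-master lookup. SlotRange and master_for_slot are hypothetical names rather than redis-plus-plus types; the ranges and addresses are copied from the reply above, taking the first node listed for each range as its master.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one CLUSTER SLOTS entry: an inclusive slot range
// and the address of the master that serves it.
struct SlotRange {
    std::uint16_t first;
    std::uint16_t last;
    std::string master;
};

// Slot map taken from the CLUSTER SLOTS reply shown above.
std::vector<SlotRange> slot_map_from_reply() {
    return {
        {0, 5460, "172.17.0.3:6383"},
        {5461, 10922, "172.17.0.3:6381"},
        {10923, 16383, "172.17.0.3:6382"},
    };
}

// Returns the range covering the slot, or nullptr if no range covers it
// (possible with cluster-require-full-coverage no).
const SlotRange* master_for_slot(const std::vector<SlotRange>& ranges,
                                 std::uint16_t slot) {
    for (const SlotRange& range : ranges) {
        if (slot >= range.first && slot <= range.last) {
            return &range;
        }
    }
    return nullptr;
}
```

A linear scan is enough for three ranges; a client holding a fragmented map would more likely index by slot number.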

Does 3208 is the new master? or some other node.

Based on the slot information that is calculated from the keys all of the masters affected.
(This is the first 20 failed slots from 3208: 3959,11755,11367,1179,1055,6120,10123,9999,4762,575,5010,947,8495,12558,8355,12418,4567,502,4435,370, ...)
And this is in sync with the fact that requests are not delivered to any of the Redis masters for a while.
[By the way, it would be useful if redis-plus-plus would expose the API that I could use to get slot number from std::string key - I could add this information to exception printout]
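
Until such an API is available, the slot can be computed in application code, because the Redis Cluster specification defines it as CRC16 (XMODEM variant) of the key modulo 16384, with hash-tag handling. The sketch below is a standalone implementation under that definition; crc16_xmodem and key_slot are hypothetical names, not redis-plus-plus functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// CRC16 as used by Redis Cluster: polynomial 0x1021, initial value 0,
// no reflection, no final XOR (the XMODEM variant).
std::uint16_t crc16_xmodem(const std::string& data) {
    std::uint16_t crc = 0;
    for (char c : data) {
        crc ^= static_cast<std::uint16_t>(static_cast<unsigned char>(c) << 8);
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            } else {
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// Hash slot of a key. If the key contains "{...}" with a non-empty body,
// only the text between the first '{' and the next '}' is hashed.
std::uint16_t key_slot(const std::string& key) {
    std::size_t open = key.find('{');
    if (open != std::string::npos) {
        std::size_t close = key.find('}', open + 1);
        if (close != std::string::npos && close > open + 1) {
            return crc16_xmodem(key.substr(open + 1, close - open - 1)) % 16384;
        }
    }
    return crc16_xmodem(key) % 16384;
}
```

The result can be cross-checked against a running cluster with the CLUSTER KEYSLOT command before it is added to exception printouts.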

NOTE: by default, Redis Cluster is configured as cluster-require-full-coverage yes. In this case, if some node is down, and the new master is not elected, Redis Cluster won't allow new commands. I'm not sure, if that's why redis-plus-plus closed the connection.

These are the relevant settings in the test setup:
save ""
protected-mode no
cluster-enabled yes
cluster-node-timeout 3000
cluster-require-full-coverage no
slowlog-log-slower-than 5
loglevel debug
appendonly no
replica-serve-stale-data no
repl-diskless-sync yes
repl-diskless-sync-delay 0
repl-diskless-load on-empty-db
repl-disable-tcp-nodelay yes
shutdown-timeout 5

Is the crash caused by redis-plus-plus? In that case, if you can give a minimum code snippets that can reproduce the problem, I can do some debugging.

I try to reproduce core file and look into it with gdb, but it requires time. I think I met similar situation that #578

Also, is there any easy way to simulate the tcp cut case?
Yes, sure. I use IPTABLES with the following script:

#!/bin/bash

REDIS_PORT=$1
REDIS_PID=$2

echo "iptables -A OUTPUT -p tcp --sport ${REDIS_PORT} -s 172.17.0.3 -j DROP"
iptables -A OUTPUT -p tcp --sport ${REDIS_PORT} -s 172.17.0.3 -j DROP

sleep 2

echo "kill -9 ${REDIS_PID}"
kill -9 ${REDIS_PID}

In my test ${REDIS_PORT} is 6380

@mike1821

Hello,

We face the same issue in our client implementation. We are also using an AsyncRedisCluster object and send commands using redis modules. The cluster is using three master/slave nodes.

In case it helps, I would like to mention that in our case the problem mainly appears if we ungracefully kill the node to which the library's initial connection was established.

@sewenew
Owner

sewenew commented Oct 21, 2024

@jzkiss I tried your setup with the script you gave, but I cannot reproduce your problem. Please check the following steps:

  1. create a Redis cluster with Redis' create-cluster util, which runs 3 masters and 3 replicas on ports 30001 - 30006. The nodes in the cluster are configured with cluster-require-full-coverage no, as you mentioned in the comment.
  2. run the following code to read from all masters continuously. Keys a, b and c are located on different masters: a -> port 30003, b -> 30001, c -> 30002. This ensures that our application sends commands to all master nodes.
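#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <thread>
#include <sw/redis++/async_redis++.h>

using namespace std;
using namespace sw::redis;

function<void (Future<Optional<string>> &&)> callback(const string &key) {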
        return [key](Future<Optional<string>> &&fut) {
                try {
                        auto v = fut.get();
                        cout << key << ": get ok: " << *v << endl;
                } catch (const Error &err) {
                        cout << key << ": get err: " << err.what() << endl;
                }
        };
}

int main() {
        ConnectionOptions opts;
        opts.host = "127.0.0.1";
        opts.port = 30001;
        opts.connect_timeout = chrono::milliseconds(100);
        opts.socket_timeout = chrono::milliseconds(100);
        ConnectionPoolOptions pool_opts;
        pool_opts.size = 3;
        auto r = AsyncRedisCluster(opts, pool_opts);
        r.set("a", "a").get();
        r.set("b", "b").get();
        r.set("c", "c").get();
        while (true) {
                try {
                        r.get("b", callback("b"));
                        r.get("c", callback("c"));
                        r.get("a", callback("a"));
                        this_thread::sleep_for(chrono::seconds(1));
                } catch (const Error &e) {
                        cerr << e.what() << endl;
                }
        }
        return 0;
}
  3. Do a TCP cut with your script to kill the master on which key a is located (in my case, this node listens on port 30003), and wait until the cluster recovers.

The following is the output of the test code:

a: get ok: a
c: get ok: c
b: get ok: b
c: get ok: c
a: get ok: a
b: get ok: b
a: get ok: a
... omit duplicated messages...
b: get ok: b
c: get ok: c
b: get ok: b
a: get err: null reply: Timeout, err: 6, errno: 115
c: get ok: c
b: get ok: b
a: get err: failed to connect to Redis (127.0.0.1:30003): Timeout, err: 6, errno: 115
b: get ok: b
c: get ok: c
a: get err: failed to connect to Redis (127.0.0.1:30003): Timeout, err: 6, errno: 115
c: get ok: c
b: get ok: b
a: get err: failed to connect to Redis (127.0.0.1:30003): Timeout, err: 6, errno: 115
b: get ok: b
c: get ok: c
a: get err: failed to connect to Redis (127.0.0.1:30003): Timeout, err: 6, errno: 115
c: get ok: c
b: get ok: b
a: get err: failed to connect to Redis (127.0.0.1:30003): Timeout, err: 6, errno: 115
c: get ok: c
b: get ok: b
a: get ok: a
... omit duplicate messages...

As you can see from the messages, when the node (port 30003) is cut off, commands sent to that node get a timeout exception. This is the expected behavior, since the timeout error is raised after some TCP retransmissions. However, the other 2 nodes (port 30001 and port 30002) are not affected by this operation (commands on keys b and c get replies and the callbacks run).

Based on the slot information that is calculated from the keys all of the masters affected.

I did not observe this behavior. The other master nodes are not affected by the cut-off node.

I try to reproduce core file and look into it with gdb, but it requires time. I think I met similar situation that #578

I'll try to reproduce this problem.

Regards
