Optimize RedisClusterClient connect operations #306

Closed
norbald opened this issue Aug 23, 2021 · 2 comments

norbald commented Aug 23, 2021

Describe the feature

RedisClusterClient performs poorly in high-concurrency scenarios, because it sends CLUSTER SLOTS to endpoints[0] to query the slot map every time connect() is called.

This will have the following effects:

  1. The CLUSTER SLOTS command is issued for every connection, which increases the CPU load on endpoints[0]. In my case, the regular query commands account for only 12%-16% CPU usage, while endpoints[0] (a slave node) sits at 93% CPU usage. You can see the CPU load in the screenshot below.
    [screenshot: CPU usage of endpoints[0] compared to the other nodes]

  2. If endpoints[0] is a slave node and goes offline while the Redis cluster itself keeps working normally, every connection takes 1000ms+ for the slots query (RETRIES = 16 by default).

The relevant code is in io.vertx.redis.client.impl.RedisClusterClient.java:126:

  @Override
  public Future<RedisConnection> connect() {
    final Promise<RedisConnection> promise = vertx.promise();
    // attempt to load the slots from the first good endpoint
    connect(options.getEndpoints(), 0, promise);
    return promise.future();
  }

Here are the commandstats from my endpoints[0]:

127.0.0.1:7002> info commandstats
# Commandstats
cmdstat_del:calls=2649,usec=6231,usec_per_call=2.35
cmdstat_pexpire:calls=2,usec=24,usec_per_call=12.00
cmdstat_auth:calls=114530,usec=303946,usec_per_call=2.65
cmdstat_readwrite:calls=1,usec=0,usec_per_call=0.00
cmdstat_ttl:calls=5,usec=17,usec_per_call=3.40
cmdstat_client:calls=216871,usec=37317937,usec_per_call=172.07
cmdstat_psync:calls=5,usec=10677,usec_per_call=2135.40
cmdstat_command:calls=1,usec=3367,usec_per_call=3367.00
cmdstat_info:calls=7929,usec=1100766,usec_per_call=138.83
cmdstat_readonly:calls=23,usec=7,usec_per_call=0.30
cmdstat_scan:calls=9,usec=8936,usec_per_call=992.89
cmdstat_set:calls=2,usec=124,usec_per_call=62.00
cmdstat_config:calls=9,usec=352,usec_per_call=39.11
cmdstat_ping:calls=43,usec=23,usec_per_call=0.53
cmdstat_replconf:calls=60334,usec=136349,usec_per_call=2.26
cmdstat_incr:calls=757791,usec=931333,usec_per_call=1.23
cmdstat_type:calls=5,usec=36,usec_per_call=7.20
cmdstat_smembers:calls=117,usec=377,usec_per_call=3.22
cmdstat_get:calls=617071,usec=1054541,usec_per_call=1.71
cmdstat_cluster:calls=392030,usec=59409141,usec_per_call=151.54

As you can see, CLUSTER SLOTS is an expensive operation (cmdstat_cluster: 392,030 calls at 151.54 usec per call). Is it really necessary to call CLUSTER SLOTS on every connection?

English is not my mother tongue; I hope I have expressed my point clearly.
Finally, I hope vertx-redis-client will become more reliable in production environments.

Use cases

Run a Vert.x HTTP server backed by a Redis Cluster through vertx-redis-client, and apply load with some Redis operations (e.g. GET foo, INCR count) using Apache JMeter. A sketch of such a reproduction follows.
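For context, here is a minimal sketch of such a reproduction, assuming the setup described above. The class name, port, and endpoint URL are placeholders; the point is that connect() is called per request, which triggers CLUSTER SLOTS on endpoints[0] every time.

import io.vertx.core.Vertx;
import io.vertx.redis.client.*;

public class Repro {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();
    Redis client = Redis.createClient(vertx, new RedisOptions()
        .setType(RedisClientType.CLUSTER)
        .addConnectionString("redis://127.0.0.1:7000")); // placeholder endpoint

    vertx.createHttpServer()
        .requestHandler(req ->
            // a fresh connect() per request: each call performs the slots query
            client.connect().onSuccess(conn ->
                conn.send(Request.cmd(Command.GET).arg("foo")).onComplete(ar -> {
                  conn.close();
                  req.response().end(ar.succeeded() ? String.valueOf(ar.result()) : "error");
                })))
        .listen(8080);
  }
}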


pmlopes commented Aug 25, 2021

Hi @norbald, I think this is a usability issue. We should document that when using Vert.x in cluster mode (cluster client), users should manage their own connection. The expectation is that a user calls connect() once and keeps the connection alive for a long period, exactly to avoid the issues you're observing.
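For illustration, a minimal sketch of that recommended pattern, assuming the same cluster client setup as in the reproduction sketch above: connect() is called a single time at startup and the resulting connection is reused, so the slots query runs once rather than per request.

import io.vertx.core.Vertx;
import io.vertx.redis.client.*;

public class ConnectOnce {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();
    Redis client = Redis.createClient(vertx, new RedisOptions()
        .setType(RedisClientType.CLUSTER)
        .addConnectionString("redis://127.0.0.1:7000")); // placeholder endpoint

    // connect once at startup; the slot map is fetched here, not per request
    client.connect().onSuccess(conn ->
        vertx.createHttpServer()
            .requestHandler(req ->
                // every request reuses the same long-lived connection
                conn.send(Request.cmd(Command.GET).arg("foo"))
                    .onSuccess(res -> req.response().end(String.valueOf(res))))
            .listen(8080));
  }
}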

A second point is that we should probably do a random shuffle of the configured nodes, so the slots query does not always start at the 1st entry; otherwise it puts far more load on a single node than on the rest. A sketch follows.
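Roughly, such a shuffle could look like the sketch below, written against the connect() snippet quoted in the issue; this is an illustration of the idea, not a tested patch.

  @Override
  public Future<RedisConnection> connect() {
    final Promise<RedisConnection> promise = vertx.promise();
    // shuffle a copy of the configured endpoints (java.util.Collections.shuffle)
    // so the slots query is not always sent to the first entry
    final List<String> endpoints = new ArrayList<>(options.getEndpoints());
    Collections.shuffle(endpoints);
    connect(endpoints, 0, promise);
    return promise.future();
  }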

This could be related to #303, as we could create a wrapper that caches the connections; in that case we could avoid calling the slots command all the time.
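A hypothetical shape for such a wrapper (the class name and error handling are illustrative, not an actual API of this project): cache the connect() future so the slots discovery happens once, and drop the cache on failure so a later call can retry.

import io.vertx.core.Future;
import io.vertx.redis.client.Redis;
import io.vertx.redis.client.RedisConnection;

class CachingRedis {
  private final Redis delegate;
  private Future<RedisConnection> cached;

  CachingRedis(Redis delegate) {
    this.delegate = delegate;
  }

  // returns the cached connection future, connecting only on first use
  synchronized Future<RedisConnection> connect() {
    if (cached == null) {
      cached = delegate.connect().onFailure(err -> invalidate());
    }
    return cached;
  }

  private synchronized void invalidate() {
    cached = null;
  }
}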


Ladicek commented Jul 25, 2024

This was fixed in #405 / #407 (and also #412).

Ladicek closed this as completed Jul 25, 2024