Handle Elasticache upgrade with low downtime #1907

JulesClaussen · 2024-08-20T08:30:39Z

Hey everyone,

We have been using ioredis for a while now, and it works fine apart from one specific issue.
We have an ElastiCache Redis OSS instance on AWS, configured with cluster mode disabled, but with Multi-AZ enabled and failover enabled.
We have one primary node and one replica node.
We are using ioredis 5.2.4.

The client configuration is quite basic (TypeScript, Nest application):

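new Redis({
    host: env.get('REDIS_HOSTNAME'),
    port: env.get('REDIS_PORT'),
    password: env.get('REDIS_PASSWORD'),
    // Enable TLS with default options only when REDIS_TLS is 'true'
    tls: env.get('REDIS_TLS') === 'true' ? {} : undefined,
    // Select a database only when dbEnv holds a numeric value
    ...(!!dbEnv && isNumber(dbEnv) && { db: parseInt(dbEnv, 10) }),
});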

Here, REDIS_HOSTNAME is the primary endpoint from AWS.

Whenever we upgrade Redis (even for a minor or release version), Redis is unavailable for about 10 minutes. The upgrade takes around 30 minutes overall, and during those 10 minutes commands fail with errors such as:

-READONLY You can't write against a read only replica.
    at parseError (/app/node_modules/redis-parser/lib/parser.js:179:12)
    at parseType (/app/node_modules/redis-parser/lib/parser.js:302:14)

We have tried using reconnectOnError, but without success:

reconnectOnError(err) {
    const targetError = 'READONLY';
    if (err.message.includes(targetError)) {
        return true;
    }
    return false;
},
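
For reference, ioredis documents three outcomes for reconnectOnError: returning false does nothing, returning true (or 1) reconnects, and returning 2 reconnects and also resends the command that failed. The handler above returns true, so the write that hit READONLY is still rejected. A minimal sketch of the variant that returns 2 follows; the helper name reconnectOnReadonly and the type alias are hypothetical, and whether this shortens the outage during an ElastiCache upgrade is untested here.

```typescript
// Sketch only: `reconnectOnReadonly` and `ReconnectDecision` are hypothetical names.
// Return values follow the reconnectOnError contract described in the ioredis README:
//   false      -> leave the connection alone
//   true or 1  -> reconnect
//   2          -> reconnect and resend the command that failed
type ReconnectDecision = boolean | 1 | 2;

function reconnectOnReadonly(err: Error): ReconnectDecision {
  // After a failover, the node behind the existing connection may now be a
  // replica and reject writes. Reconnecting opens a new connection to the
  // configured hostname, which should then point at the new primary.
  return err.message.includes('READONLY') ? 2 : false;
}

// Assumed wiring, mirroring the client configuration in this issue:
//   new Redis({ host, port, reconnectOnError: reconnectOnReadonly });
```

Matching with includes rather than an exact prefix keeps the check working whether or not the message carries the leading "-" seen in the trace above.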

According to the documentation, retryStrategy is supposed to reconnect after a minute, so we haven't tried setting it.
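
Note that retryStrategy only runs when the connection itself is lost; a READONLY reply arrives over a live connection, so it is handled by reconnectOnError instead. For anyone who does want to set the delay explicitly, here is a minimal sketch. The function name cappedRetryDelay, the 200 ms step, and the 2000 ms cap are assumptions chosen for illustration, not values taken from this setup.

```typescript
// Sketch only: `cappedRetryDelay` is a hypothetical name; the step and cap are assumed values.
// ioredis calls retryStrategy with the number of reconnection attempts made so far
// and waits the returned number of milliseconds before trying again.
function cappedRetryDelay(times: number): number {
  const stepMs = 200;   // assumed linear backoff step
  const capMs = 2000;   // assumed upper bound on the wait between attempts
  return Math.min(times * stepMs, capMs);
}

// Assumed wiring, mirroring the client configuration in this issue:
//   new Redis({ host, port, retryStrategy: cappedRetryDelay });
```

With these assumed values the wait grows linearly and never exceeds two seconds between attempts.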

Is there a way to handle this, or is it currently not possible?
Also, is there an easy way to test this? Running a failover manually from the AWS console does not reproduce the issue for some reason. In that case the failover happens quickly, and the application only fails for about a minute.

Cheers,
Jules
