Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

None recovering Redis command timed out #3010

Open
mshahmoradi87 opened this issue Oct 11, 2024 · 0 comments
Open

None recovering Redis command timed out #3010

mshahmoradi87 opened this issue Oct 11, 2024 · 0 comments
Labels
for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage

Comments

@mshahmoradi87
Copy link

Bug Report

One container starts to have "Redis command timed out" and it does not recover.

Current Behavior

we have started to use aws redis serveless, since the switch from redis cluster-mode enabled to redis serverless, out of the blue one client from many tasks running starts to have command timed out continuously. Depending on the service and size of objects and if put or get commands are being timed out, we see different side effects, sometimes it results in #705, sometimes only increased latency.

Note that in the services this happens, usually there are more than 12 tasks running, and only one container has this issue.

I could correlate any CPU spike with the event causing the timeouts.

org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s)
	at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
	at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
	at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
	at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:277)
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1085)
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.lambda$doInvoke$4(LettuceConnection.java:938)
	at org.springframework.data.redis.connection.lettuce.LettuceInvoker$Synchronizer.invoke(LettuceInvoker.java:665)
	at org.springframework.data.redis.connection.lettuce.LettuceInvoker.just(LettuceInvoker.java:125)
	at org.springframework.data.redis.connection.lettuce.LettuceHashCommands.hSet(LettuceHashCommands.java:61)
	at org.springframework.data.redis.connection.DefaultedRedisConnection.hSet(DefaultedRedisConnection.java:1332)
	at org.springframework.data.redis.connection.DefaultStringRedisConnection.hSet(DefaultStringRedisConnection.java:631)
	at org.springframework.data.redis.core.DefaultHashOperations.lambda$put$14(DefaultHashOperations.java:254)
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224)
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:191)
	at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:97)
	at org.springframework.data.redis.core.DefaultHashOperations.put(DefaultHashOperations.java:253)



	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s)
	at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:59)
	at io.lettuce.core.internal.Futures.awaitOrCancel(Futures.java:246)
	at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:74)
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1083)
	... 22 common frames omitted`

Expected behavior/code

the expected behaviour is that with periodic_refresh, the issue does not persist.

Environment

aws redis serverless 7.1
<lettuce.version>6.3.2.RELEASE</lettuce.version>
<netty.version>4.1.109.Final</netty.version>

related configurations:
ssl: true
timeout: 5s
connect-timeout: 500ms
additional:
dns-ttl: 5s
periodic-refresh: 10s
reconnect-delay-min: 100ms
reconnect-delay-max: 5s
read-from: replicaPreferred

Possible Solution

RedisCommandTimeoutException intermittently:
in some cases this could be related, as we sometimes see timeouts for a minute, and they are recovered, however in this case it persists.

Lettuce cannot recover from connection problems:
maybe setting TCP_USER_TIMEOUT would help, default is 1 minutes, so for the recovering case, maybe this helps, but for the cases that this continues and gets stacked probably not.

@tishun tishun added for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage
Projects
None yet
Development

No branches or pull requests

2 participants