rate-limiting error Redis timeout #13473
Could you use a Redis client tool to check whether Redis is operational while you're benchmarking Kong? From the information you provided, we can't tell where the timeout occurs; it could be on the Kong side or on the Redis side. (BTW, providing the full text of the Redis error log is better for the community to debug; your screenshot cuts off part of one line and is incomplete.)
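For example, something like this (a hedged sketch; the host and port are placeholders from your plugin config) can confirm whether Redis itself stays responsive during the run:

```sh
# Probe Redis from the Kong host while wrk is running.
redis-cli -h xx.xx.xx.xx -p 6379 ping        # should return PONG promptly
redis-cli -h xx.xx.xx.xx -p 6379 --latency   # continuously samples round-trip latency
redis-cli -h xx.xx.xx.xx -p 6379 info stats | grep -E 'instantaneous_ops_per_sec|rejected_connections'
```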
Please note that to benchmark Kong effectively, you need to identify and reach a bottleneck in your system; only then can you obtain meaningful results, such as QPS from Kong. We'd also like to know what the bottleneck is during your benchmark: it could be 100% CPU usage by Kong, full network card utilization, or 100% CPU usage by Redis.
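One illustrative way to watch for the bottleneck while the benchmark runs (standard sysstat tools; package names vary by distro):

```sh
mpstat -P ALL 1          # per-core CPU usage; look for any saturated core
pidstat -u -C nginx 1    # CPU usage per Kong (nginx) worker process
sar -n DEV 1             # NIC throughput, to rule out full network utilization
```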
I believe it's impossible to overload the Redis instance with several hundred RPS. Perhaps you should check the number of workers in your Kong instance, as @chobits mentioned.
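For instance (paths here are typical defaults; adjust for your installation):

```sh
# Count Kong worker processes and check the configured worker count.
ps -o pid,psr,etime,comm -C nginx                      # one master + N workers
grep -i nginx_worker_processes /etc/kong/kong.conf     # "auto" = one worker per core
```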
You mean, when you benchmarked Kong, it experienced Redis timeouts and only 15% CPU usage of Kong worker processes? If that's the case, I guess something might be stuck inside Kong. Did you see any other error logs besides the Redis timeout? For example, any logs related to timer-ng, DNS, connection timeouts, etc.?
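Something like the following (the log path assumes the default Kong prefix) would surface such entries:

```sh
# Scan the Kong error log for other suspicious entries around the timeouts.
grep -nE 'timer-ng|dns|timed out|connect.*failed' /usr/local/kong/logs/error.log | tail -50
```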
I think this should be the last thing to consider. In my experience, I haven't encountered any bugs related to kernel epoll/events.
We don't know yet, because we haven't located the root cause of this situation.
And for the error log in your screenshot, like …
More debug logs are listed below:
So what's your custom debug code that produces this kind of debug log? Also note that we cannot pinpoint the reason for the Redis timeout even though you've shown the timeout error and the source code. If I were you, I would add more debug/error logging, or use the stapxx tool to trace the stack at the moment of the Redis timeout. It looks like an interesting journey to debug this. Also, please confirm that the tcpdump info you provided is 100% associated with the timeout situation; if it is, then you can debug into the Kong source code (the Redis Lua client) or OpenResty's lua-nginx-module cosocket (tcp sock) code.
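A hedged capture sketch (6379 is a placeholder for your redis_port):

```sh
# Capture Kong<->Redis traffic while reproducing the issue, so each timeout in
# error.log can later be matched against packets by timestamp.
tcpdump -i any -s 0 -nn -w kong_redis.pcap port 6379
```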
The following is a debug log from a normal request that does not report a timeout:
2024/08/14 18:08:38 [debug] 7021#7021: *201070 lua access handler, uri:"/test_kylin_upstream_timeout" c:1
And here is the nginx debug log from the failing case:
2024/08/20 19:55:42 [debug] 7019#7019: *1338394 http run request: "/test_kylin_upstream_timeout?"
The relevant nginx source code: …
I don't think 54 ms is excessively long. During one cycle of epoll event processing (ngx_process_events), there aren't any special I/O operations happening aside from writing debug logs. Therefore, it seems likely that the 54 ms is primarily consumed by CPU-intensive operations and the time taken to write debug logs.
From your debug log: …
These two error log entries are puzzling: one indicates a timeout, while the other reports a latency of only 55 ms. In my opinion, 55 ms is not a significant latency. First, it's best to capture a case where the delay exceeds a second or more (> 1 s would indicate a significant timeout). Make sure you have the corresponding tcpdump packets and debug log information, and note in particular that the debug log should cover the window from when the request is sent over the Redis connection to when the response is received (or the timeout occurs). Then you can analyze the timeout by combining the tcpdump capture with the debug logs.
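One illustrative way to line the two up (the filename follows the capture sketch above):

```sh
# Replay the capture with wall-clock timestamps, then look for a >1s gap
# between the request packet to Redis and its reply; find the same time
# window in the nginx debug log.
tcpdump -nn -tttt -r kong_redis.pcap | less
```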
@chobits We checked with our colleagues in the kernel group and determined that the 50 ms delay was in user mode, that is, in Lua + nginx. We need to keep digging into where it got stuck in user mode and whether any Lua or nginx parameters can be tuned.
@chobits Looking at the container's basic metrics, we can see that the Redis timeout is triggered when the number of net.sockets.tp.inuse connections on the machine reaches 6000.
I think we need to know what these 6000 connections are, e.g. by listing them with a socket inspection command (one possible sketch below).
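For example (an illustrative breakdown with ss; the Redis port is a placeholder):

```sh
ss -s                                                        # socket summary by type
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn  # counts per TCP state
ss -tn state established '( dport = :6379 )' | tail -n +2 | wc -l   # connections to Redis
```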
And for this, you might need a debugging tool, like a systemtap/gdb script, to insert hooks into some C functions and see why a delay occurs in user mode (Kong + Lua). For nginx/Kong/Lua, 50 ms of pure computation is rare and easy to spot (e.g., as 100% CPU usage), so I guess there might be some I/O-waiting operation.
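As a sketch of that approach (assuming an nginx binary with debug symbols at the path shown; untested, adjust as needed):

```sh
# Minimal systemtap sketch: report stretches longer than 50 ms that a worker
# spends in user mode between two epoll waits, i.e. running nginx/Lua handlers.
sudo stap -e '
global last
probe process("/usr/local/openresty/nginx/sbin/nginx").function("ngx_epoll_process_events").return {
  last[tid()] = gettimeofday_ms()   /* epoll returned: handler processing starts */
}
probe process("/usr/local/openresty/nginx/sbin/nginx").function("ngx_epoll_process_events") {
  if (last[tid()]) {
    d = gettimeofday_ms() - last[tid()]
    if (d > 50) printf("worker %d: %d ms in user mode between epoll waits\n", pid(), d)
  }
}'
```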
This issue is marked as stale because it has been open for 14 days with no activity. |
Dear contributor, we are automatically closing this issue because it has not seen any activity for three weeks. Your contribution is greatly appreciated! Sincerely,
Is there an existing issue for this?
Kong version (`$ kong version`)
3.4.0
Current Behavior
Kong uses Redis to rate-limit the service.
Network topology: Kong is deployed in elastic cloud containers, and the Redis cluster is reached via a closed loop within the same data center; no cross-data-center or cross-leased-line traffic is involved.
rate-limiting plugin
config = {"redis_database":0,"policy":"redis","redis_host":"xx.xx.xx.xx","redis_timeout":50,"limit_by":"server","second":500,"redis_port":xx,"redis_password":"","fault_tolerant":true}
Redis cluster configuration: 12 GB memory, 3 proxies, 12 Redis instances arranged as master/slave pairs, i.e. 6 Redis groups.
Kong machine specification: 4C8G. The container environment carries only test traffic, with no other requests; the wrk tool sends requests in parallel.
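Worth noting: `redis_timeout` in this plugin config is in milliseconds, so 50 gives Redis only 50 ms to respond, which is a tight budget under load. A hedged example of raising it toward the 2000 ms default via the Admin API (the plugin id and Admin API address are placeholders):

```sh
# Raise the plugin's Redis timeout to rule out an overly tight budget;
# <plugin-id> is a placeholder for the rate-limiting plugin's id.
curl -X PATCH http://localhost:8001/plugins/<plugin-id> \
  --data "config.redis_timeout=2000"
```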
Expected Behavior
Kong can read Redis responses normally instead of timing out.
Steps To Reproduce
Redis cluster configuration: 12 GB memory, 3 proxies, 12 Redis instances arranged as master/slave pairs, i.e. 6 Redis groups.
Kong machine specification: 4C8G. The container environment carries only test traffic, with no other requests; the wrk tool sends requests in parallel.
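For reference, a typical wrk invocation for this kind of test might look like the following (thread/connection counts, host, and route are illustrative placeholders):

```sh
# Drive parallel load through Kong's proxy port; adjust host, port, and path.
wrk -t4 -c200 -d60s --latency http://<kong-host>:8000/<route-path>
```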
Anything else?
no