Kube-proxy should exempt local traffic from going through conntrack #109672
To clarify, local traffic (localhost to localhost) would first go through the OUTPUT chain and then the INPUT chain. Is that right? It doesn't matter if it only goes through the OUTPUT chain, because the OUTPUT chain also uses conntrack.
The issue is that when the conntrack table is full, the localhost health check against kubelet fails. However, this doesn't really reflect the true problem.
You should report the bug first. I don't think this is going to solve any problems if your conntrack table is full; it's just one connection.
IMHO, this is expected, as the conntrack table can't be infinite. Applications can exhaust it no matter how big it is. The point is that ideally the conntrack table being full should not impact kubelet (and its health check).
/sig network
I understand your reasoning, but if your conntrack table is full, the kubelet probe is going to be the least of your problems.
I agree. My understanding of this problem is:
Hmm, kube-proxy rules are not creating the conntrack entries; those rules are using the entries. The kernel enables the conntrack hooks and tracks the protocols, and all the NAT kernel modules use those entries to be able to perform the different operations. If you want some traffic to not be "tracked" by the kernel, you have to use something like NOTRACK rules.
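For illustration, a minimal sketch of the kind of raw-table NOTRACK rules this refers to (the loopback-wide scope here is an assumption, not a recommendation):

```
# Mark loopback traffic NOTRACK in the raw table, which is evaluated
# before the conntrack hooks, so these flows are never tracked (sketch).
iptables -t raw -A PREROUTING -i lo -j NOTRACK
iptables -t raw -A OUTPUT -o lo -j NOTRACK
```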
But if you hit conntrack table limits, I think your system is incorrectly dimensioned, same as if you have memory, CPU, or disk exhausted; you should be able to know the number of conntrack entries you want to handle and configure the limit correctly in advance.
I'm in favor of adding NOTRACK rules if and when we can. The problem is that it's not always clear when we can do that. E.g. iptables-mode kube-proxy supports NodePorts on localhost, and we need to track those, so the rules have to at least check for that. I seem to recall trying to do this a while back and finding more corner cases. TCP connections do clear after the connection closes; maybe there's more we can do there. Can you run something in your environment to see what is actually consuming the conntrack entries?
@aojea ah, I missed that conntrack is enabled as long as iptables/netfilter is used. Thanks! By exempting, I meant setting up NOTRACK rules for critical traffic so the components relying on that traffic can be more robust in extreme circumstances. However, it's debatable where we should put those NOTRACK rules. @thockin I don't have the environment now, but I'd bet most of the usage comes from pods rather than localhost. So the small amount of traffic through localhost is a victim of the conntrack table being full, but it is very important to critical Kubernetes components on the node.
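A rough way to check that guess, assuming the conntrack CLI is available (entry layout varies by protocol, so this is only a sketch):

```
# Tally tracked flows by their source address to see whether pod
# addresses dominate the table (each entry lists src= twice, once per
# direction, so counts are approximate).
conntrack -L 2>/dev/null | grep -o 'src=[^ ]*' | sort | uniq -c | sort -rn | head
```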
@linxiulei you should be able to adjust the conntrack table size with nf_conntrack_max.
kube-proxy sets those sysctls, but it has a scaling config you can set if you really need more. As I mentioned above, I do think it would be neat to NOTRACK things when we know we can; it's just harder than it sounded (or I just failed the first time I tried :)
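For reference, the scaling knobs in question are kube-proxy's conntrack flags (values below are purely illustrative):

```
# kube-proxy sizes nf_conntrack_max as roughly
# max(conntrack-max-per-core * cores, conntrack-min).
# Illustrative values only; the rest of the kube-proxy flags are omitted.
kube-proxy --conntrack-max-per-core=65536 --conntrack-min=262144 ...
```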
@dcbw Not really, as nf_conntrack_max may be used up no matter how high you set it. Also, setting a high value will use up memory that is not charged to any pod, so it will impact the whole system. @thockin I have created NOTRACK iptables rules in my environment and they worked well for local traffic (e.g. the health check to kubelet), but it is a standalone configuration script that inserts the rules. It'd be neat if kube-proxy did that by default.
Can you provide details on exactly what NOTRACK rules you are using?
Yeah, something like:
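For example, a sketch assuming the raw table and kubelet's healthz port (the exact matches in the original rules are an assumption):

```
# Exempt loopback traffic to and from kubelet's healthz port (10248)
# from connection tracking, in both the locally-generated (OUTPUT) and
# looped-back receive (PREROUTING) paths.
iptables -t raw -A OUTPUT     -o lo -p tcp --dport 10248 -j NOTRACK
iptables -t raw -A OUTPUT     -o lo -p tcp --sport 10248 -j NOTRACK
iptables -t raw -A PREROUTING -i lo -p tcp --dport 10248 -j NOTRACK
iptables -t raw -A PREROUTING -i lo -p tcp --sport 10248 -j NOTRACK
```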
10248 is the port that kubelet's healthz endpoint listens on.
Traffic to localhost is tracked in some cases (NodePorts in iptables mode). If you change kube-proxy to do the above, it should fail e2e tests. :(
I'm going to triage-accept this, because I agree, but it's not as easy as it sounds. :(
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to a set of rules. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to a set of rules. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
Just for reference, as Eric and I were finding new corner cases with the NOTRACK approach: something important to consider is that if there is some firewalling in place that depends on connection state, and the connection is not tracked, the returning traffic will not match any ESTABLISHED state and will be dropped.
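To make the failure mode concrete, here is a sketch of a common stateful firewall pattern that breaks once a flow is untracked:

```
# Drop inbound traffic by default, but let replies to connections the
# host initiated back in based on conntrack state.
iptables -P INPUT DROP
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# If the flow was marked NOTRACK, its reply packets arrive with ctstate
# UNTRACKED rather than ESTABLISHED, so they fall through to the DROP
# policy unless an explicit rule like this is also added:
iptables -A INPUT -m conntrack --ctstate UNTRACKED -j ACCEPT
```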
This issue has not been updated in over 1 year and should be re-triaged. For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted
Closing in favor of #127259 as the accumulator issue.
What would you like to be added?
An accept rule should be installed before the conntrack rule to exempt local traffic, such as requests to localhost:10248/healthz.
Why is this needed?
Tracking traffic over the loopback interface seems unnecessary and adds extra overhead. In particular, when the conntrack table is full, that traffic would be dropped, which causes further problems.
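One way to confirm this situation on a node (a sketch; the exact log wording depends on the kernel):

```
# Compare current entries against the limit.
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# When the limit is hit, the kernel logs dropped packets, typically as
# "nf_conntrack: table full, dropping packet".
dmesg | grep -i 'table full'
```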