-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add CALICO_DISABLE_TUNNEL env #9394
Comments
Are you running Typha? The allocate-tunnel-ips program will use typha if enabled, which is our fan-out proxy. Using that removes the load from the API server. |
Yeah, agree that Typha is likely the solution here. If there's a reason that even with Typha this is causing load, that is most likely a bug in the |
Some users don't use Typha to install calico, we still need the env var until we figure out a potential bug in the |
@cyclinder if there truly is a bug in allocateip.go that is causing excessive resource usage, I'd rather we just fix that - that way, everyone benefits and not just those who discover this new arcane environment variable.
@gojoy do you have evidence that allocate-tunnel-addrs was causing load in your cluster? If so, could you provide that evidence so we can use it to diagnose what might be going wrong? |
Yes, we've noticed that each time reconcileTunnelAddrs is executed, the backend service will be called multiple times. When a large number of new nodes join the cluster, it will increase the load on the backend. In the scenario where the tunnel is enabled, this belongs to the normal logic. However, the key point is that when the cluster administrator determines that the tunnel will not be used in the cluster, the question is whether there can be a variable to control stopping this process, since these reconciles responsible for the tunnel have no effect. This may not be in conflict with enabling Typha. |
One more point to add. When using the Kubernetes API Datastore, allocateip.go requests the kubernetes apiserver for queries with resourceVersion="", which will result in not using the cache of Kubernetes and increase the overhead of requests. |
@gojoy I still am not convinced that an environment variable to control this is the correct approach. Felix also watches Nodes(), and will have similar scale characteristics that require the use of Typha at even a moderate number of nodes. I think the correct solution here is to enable Typha in your cluster - it's exactly what it was designed and intended for! |
There is an actual "bug" here, which is that we are performing a Get() directly against the API server for the node instead of using a cached version. Specifically these lines: calico/node/pkg/allocateip/allocateip.go Lines 225 to 235 in 07ad564
I think there is a strong case to be made for improving that to be cache driven instead of generating API calls. |
Got it. We will deploy the Typha component.Thanks |
@caseydavenport That makes sense, let me improve this part, thanks! |
@caseydavenport I found typha is no help for client operations, like |
Expected Behavior
A environment variable is needed to control whether process calico-node -allocate-tunnel-addrs is started when
run the calico pod in k8s.
Current Behavior
Possible Solution
add the a judgment by CALICO_DISABLE_TUNNTL env in the rc.local
Steps to Reproduce (for bugs)
Context
In our scenario, the Kubernetes cluster fully uses the calico BGP mode and no longer starts process calico-node -allocate-tunnel-addrs to reduce the pressure of Calico on the backend
Your Environment
The text was updated successfully, but these errors were encountered: