inter-cluster traffic only working when request and answer pods are running on a gateway node #3159
Thanks for trying out Submariner @gk-fschubert.
A. How do you check intra-cluster? Are you using …?
B. …
C. …
A: always with …
B: …
C: …
A. Short background on Submariner's inter-cluster datapath: only the egress of the inter-cluster datapath is handled by Submariner, while ingress is handled by the CNI (Cilium in your case). podA@non_gw_node@cluster1 communication with podB@non_gw_node@cluster2 consists of the following segments:
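(The original list of segments is not preserved here; in a typical Submariner Globalnet deployment the path is roughly the following, stated as an assumption rather than a quote from the thread.)
1. podA -> node routing by the CNI on non_gw_node@cluster1
2. non_gw_node@cluster1 -> gw_node@cluster1 over the vx-submariner VXLAN tunnel (egress, programmed by the Submariner RouteAgent; Globalnet SNATs the source to the cluster's global CIDR on the gateway)
3. gw_node@cluster1 -> gw_node@cluster2 over the IPsec tunnel
4. gw_node@cluster2 -> non_gw_node@cluster2 -> podB, routed by the CNI (ingress, Cilium here)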
To support asymmetric routing, the Submariner RouteAgent sets rp_filter to loose mode for vx-submariner and the CNI interface (the interface with an IP address from the pod CIDR, cilium_host in your case) on all nodes.
B. …
C. Will it be possible for you to redeploy the clusters with another CNI (maybe Calico), and recheck the subctl verify e2e tests?
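To verify the rp_filter setting from point A on a node, a check along these lines should work (interface names taken from this thread; loose mode is the value 2):

# Reverse-path filtering mode for the Submariner VXLAN and Cilium interfaces
cat /proc/sys/net/ipv4/conf/vx-submariner/rp_filter
cat /proc/sys/net/ipv4/conf/cilium_host/rp_filter
# The RouteAgent is expected to have set both to 2 (loose)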
And isn't it a problem that the IP address of the endpoint (the private IP) is actually the cluster's public IP?
B: Unfortunately no behaviour difference.
C: …
Submariner Endpoint objects publish both a PrivateIP and a PublicIP. The PrivateIP is the IP assigned to an interface on the gateway node where the Endpoint originated. The PublicIP is the source IP for packets sent from the gateway to the Internet, which is discovered by default via services like ipify.org; you can find more details here. I can see that in your case the privateIP and publicIP are the same for both endpoints, and the IPsec tunnel is successfully established between the gateway nodes, so no problem there. This appears to be a datapath issue specific to the Submariner and Cilium combination that requires further investigation. We would be happy to add Cilium to the list of Submariner supported CNIs.
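For reference, one way to see the IPs that each Endpoint publishes (a sketch; the submariner-operator namespace and the snake_case field names are the usual defaults and should be verified against your install):

# Show private/public IP per Submariner Endpoint CR
kubectl -n submariner-operator get endpoints.submariner.io \
  -o custom-columns=NAME:.metadata.name,PRIVATE:.spec.private_ip,PUBLIC:.spec.public_ip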
Alright. We're happy to help. We've cloned the GitHub repositories into our GitLab and are ready to patch them and use those images in our cluster.
Great! Submariner includes two logical functionalities to support a CNI: …
I think we should first start with troubleshooting the data-path failures (tcpdump the traffic and check where and why it is being dropped). Once we understand where the problems are and how to solve them, we can plan how to integrate the fixes into the Submariner Route Agent.
What would be your approach to do so? Since Submariner uses the scratch Docker image, after building the resources all package managers are gone, and installing the packages beforehand and manually copying the dependencies would be a nightmare.
You can, for example, deploy host-networking netshoot pods as a daemonset and use these pods for tcpdumping the traffic.
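A minimal sketch of such a DaemonSet (image, names and namespace are illustrative, not taken from the thread):

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netshoot
  namespace: default
spec:
  selector:
    matchLabels:
      app: netshoot
  template:
    metadata:
      labels:
        app: netshoot
    spec:
      hostNetwork: true            # see the node's real interfaces (eth0, vx-submariner, ...)
      tolerations:
      - operator: Exists           # also run on tainted (e.g. gateway) nodes
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        command: ["sleep", "infinity"]
        securityContext:
          privileged: true         # required for tcpdump on host interfaces
EOF

# Then, on the pod scheduled on the node of interest:
kubectl exec -it <netshoot-pod> -- tcpdump -ni any port 80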
Hi @gk-fschubert, I created #3168 to track Cilium CNI support. Can I also assign you to this topic?
FYI, I think @gk-fschubert will need to comment on #3168 before we can assign it.
Oh alright, sorry. I thought my 'thumbs up' would be enough. Yes, you can assign it to me.
The thumbs-up was good for me, but GitHub's permissions mechanically restrict us from assigning people to an issue who are not members of the submariner-io org or participants in the issue.
In the meantime I've created dedicated clusters.
Can you elaborate on the connectivity test you performed? Source and destination IP addresses, etc.?
Sorry. As per the previous test, on the gateway node in cluster ….
To capture the traffic there is a daemonset in both clusters in the namespace ….
Executed test:
Working example: source cluster: … destination cluster: …
Non-working example: source cluster: … destination cluster: …
@yboaron When a request comes from a pod running on the gateway node, the traffic on the gateway is not received on the interface 'vx-submariner'; instead it arrives at interface eth0. Is that the expected behaviour? (The following captures are all taken on the gateway node.)

10:32:27.045181 eth0 In IP 242.1.255.254.80 > 242.2.0.1.51950: Flags [F.], seq 854, ack 78, win 509, options [nop,nop,TS val 3058819169 ecr 311128649], length 0
10:32:27.047229 lxc593da811c9d9 Out IP 242.1.255.254.80 > 10.244.0.53.51950: Flags [F.], seq 854, ack 78, win 509, options [nop,nop,TS val 3058819169 ecr 311128649], length 0
10:32:27.047459 lxc593da811c9d9 In IP 10.244.0.53.51950 > 242.1.255.254.80: Flags [.], ack 855, win 501, options [nop,nop,TS val 311128674 ecr 3058819169], length 0

Now if a request comes from a pod that is not running on a gateway node, the packets arrive at the interface 'vx-submariner' but then do not go out to the interface eth0.

10:34:01.948112 vx-submariner In IP 10.244.0.162.34434 > 242.1.255.254.80: Flags [S], seq 3089478633, win 64240, options [mss 1380,sackOK,TS val 2637690257 ecr 0,nop,wscale 7], length 0
10:34:02.975620 vx-submariner In IP 10.244.0.162.34434 > 242.1.255.254.80: Flags [S], seq 3089478633, win 64240, options [mss 1380,sackOK,TS val 2637691287 ecr 0,nop,wscale 7], length 0
10:34:04.991484 vx-submariner In IP 10.244.0.162.34434 > 242.1.255.254.80: Flags [S], seq 3089478633, win 64240, options [mss 1380,sackOK,TS val 2637693303 ecr 0,nop,wscale 7], length 0
10:34:09.055329 vx-submariner In IP 10.244.0.162.34434 > 242.1.255.254.80: Flags [S], seq 3089478633, win 64240, options [mss 1380,sackOK,TS val 2637697367 ecr 0,nop,wscale 7], length 0

Since I currently don't have the full picture of every component, it would be good to know what this should look like.
Small update: when a pod is running on cluster …, the packets got blocked by the FORWARD chain. That can be fixed with:
allow packets with source 240.0.0.0/8 or 242.0.0.0/8
allow packets with dst 240.0.0.0/8 or 242.0.0.0/8
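Expressed as iptables commands, that fix might look roughly like this (a sketch; chain position and exact CIDRs may need adjusting to your setup):

# Accept forwarded traffic to/from the Globalnet ranges
iptables -I FORWARD -s 240.0.0.0/8 -j ACCEPT
iptables -I FORWARD -s 242.0.0.0/8 -j ACCEPT
iptables -I FORWARD -d 240.0.0.0/8 -j ACCEPT
iptables -I FORWARD -d 242.0.0.0/8 -j ACCEPT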
Now the SYN packets arrive at the pod running in cluster
Internet connectivity
The other thing I noticed is that currently all pods running on a gateway node do not have internet connectivity. I looked up the IP address of google.de on another machine and used it in a curl request:
I currently don't know why it's using one of the submariner IP addresses and the interface …
Another small update. It was resolved after changing the nat chain. With the other problem, help would be highly appreciated: those SYN-ACK packets are not arriving on the non-gateway node on …
Hello,
@jdaln
@gk-fschubert In our case, …
This very strange behavior led me to check this issue.
@jdaln |
@gk-fschubert, sorry for the late response. I deployed Submariner on Kind clusters with Cilium as CNI (following https://docs.cilium.io/en/stable/installation/kind/). Submariner defines some prepend iptables rules (for example a rule to redirect to the SUBMARINER-FORWARD chain in filter/FORWARD) to make sure that inter-cluster traffic is handled by Submariner. I noticed that Cilium verifies that its own iptables rules are set as prepend (first rule in the chain), which may explain why the datapath between clusters is broken. I'll continue to check it.
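To check which rule actually sits first in the filter/FORWARD chain on a node, something like this can be run on the node (or from a privileged host-network pod):

# FORWARD chain with rule positions
iptables -L FORWARD -n -v --line-numbers | head -20
# Same thing in iptables-save syntax
iptables -S FORWARD | head -20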
@yboaron
Hmm, but when I look at the iptables FORWARD chain, submariner is the first rule in the chain. Or can it be that the eBPF stuff (which I never had contact with before) is verifying it, no matter what's defined in iptables? And do you have an idea about the last bit here? I can see that packets are sent out via the correct interface (eth1 is also used when I make a curl request to a resource within the cluster and has nothing to do with submariner), but they just don't arrive.
I was going to run the test, but our provider made some network changes and now I don't even get the DNS answer for the "public" cluster. My setup is a bit special, so I will wait for developments in this ticket and do the matrix once a few things are fixed. I will keep following the conversation here.
Yeah, iptables in the logs you attached look fine, no idea why I'm getting different behavior on Kind.
So you run a connectivity test from a non-gateway node on Alice to a gateway pod on Bob. Submariner handles the egress traffic while the CNI (Cilium) should handle ingress (after IPsec decryption). A. The SYN packet received at GW_node@bob should not be forwarded by Cilium to another node, because the dest pod is running on the GW node. B. The SYN-ACK is received at GW_node@alice and Cilium should forward it to node_other_thanGW@alice; the source IP address of the packet is 242.1.255.254. Cilium seems not to forward the packet from the GW node to another node for some reason. It could be because of reverse-path filtering (although Submariner changes rp_filter to '2' for the relevant interfaces), or maybe a security rule that 'doesn't like' the src IP not being from the cluster's Pod CIDR. Can you run a connectivity test from a non-gateway node on Alice to a pod on a non-gateway node on Bob, and check if you get a similar result also for the TCP SYN packet (not being received at dest_pod@bob)? Maybe you can change Cilium's intra-cluster routing-mode or tunnel-protocol and see if it helps?
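A hedged way to inspect and flip those Cilium settings via its ConfigMap (the routing-mode and tunnel-protocol keys exist in recent Cilium releases; older versions use the single tunnel key instead):

# Current routing/tunnel settings
kubectl -n kube-system get configmap cilium-config -o yaml | grep -E 'routing-mode|tunnel'

# Example: switch the overlay protocol and restart the agents
kubectl -n kube-system patch configmap cilium-config --type merge \
  -p '{"data":{"tunnel-protocol":"geneve"}}'
kubectl -n kube-system rollout restart daemonset/cilium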
non-gwnode@alice:
gwnode@alice:
gwnode@bob:
non-gwnode@bob:
So that's interesting. The picture is the same: SYN-ACKs are arriving on gwnode@alice but are not arriving on non-gwnode@alice. Hmmm
So, the TCP SYN packet is forwarded to the non-GW node by Cilium on cluster Bob, while the TCP SYN-ACK is not forwarded to the non-GW node on cluster Alice. This could indicate an issue related to connection tracking, since Cilium only handles the TCP SYN-ACK packet on cluster Alice (Submariner handles the TCP SYN egress). Would it be possible to disable Cilium's connection tracking and see if that helps?
I don't think that it's an option. But also, when I add the switch '--disable-conntrack' to the cilium-agent, the container crashes. I assume that this setting would have to be set on the control plane, to which we don't have access.
A. I'm not familiar with Cilium. You might be able to collect some debugging information using Cilium debug tools on the non_gw node@Alice cluster, which might point us to the root cause of the dropped packets. I think you can find useful information in this Cilium issue.
B. Do you think setting Cilium …
C. Once you've gathered all the debugging information from step A, you might consider creating an issue in Cilium as well; Cilium experts may shed some light on this topic.
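For reference, a sketch of the kind of commands commonly used for this drop debugging (pod name is a placeholder; the Hubble command requires Hubble to be enabled):

# Watch drop events on the Cilium agent of the node of interest
kubectl -n kube-system exec -it <cilium-pod-on-non-gw-node> -- cilium monitor --type drop

# With Hubble enabled, filter dropped flows involving the global IP
hubble observe --verdict DROPPED --ip 242.1.255.254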
Me neither, but yes, I'm playing around with their debug tools. Because eBPF is new to me, it will take a bit before I can write a competent ticket.
Oh yes!! That's really useful, thanks! Yesterday evening I had the same assumption based on cilium monitor and Hubble, but I'm in the process of understanding the maps which are used by our cluster provider, which is why I'm not 100% sure that's also the case in our scenario.
The host firewall is disabled in our clusters:
Hi @gk-fschubert, long time..... Can you check if setting enable-ipv4-masquerade to false in Cilium's configuration helps here?
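If Cilium was installed with Helm, that toggle is usually applied like this (Helm value name taken from the Cilium chart; release name and namespace assumed):

# Via the Helm chart value ...
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set enableIPv4Masquerade=false

# ... or directly in the ConfigMap, followed by an agent restart
kubectl -n kube-system patch configmap cilium-config --type merge \
  -p '{"data":{"enable-ipv4-masquerade":"false"}}'
kubectl -n kube-system rollout restart daemonset/cilium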
@gk-fschubert, FYI, I submitted this post covering Submariner deployment on K8S-Kind/Cilium and K8S-Kind/Calico clusters.
I believe that the case here is a bit more complex. Having similar issues, I could see that the NAT traversal of @gk-fschubert seems to be ON, which is also the case for us because we need it. In the post you linked @yboaron (which is nice), the command uses …
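One possible way to check whether NAT traversal is enabled on a cluster (the natEnabled field name is assumed from the Submariner CR created by subctl join; verify against your deployed CRDs):

kubectl -n submariner-operator get submariner submariner \
  -o jsonpath='{.spec.natEnabled}{"\n"}'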
@yboaron
In a quick test it doesn't help, but as there are different parties which change the iptables rules outside my control, I have to create a Submariner version where the things I've discovered are already set (forward drop, sNAT marks) without manual interaction. @yboaron and @jdaln: in general, we've set up self-managed k8s clusters using kubespray (https://github.com/kubernetes-sigs/kubespray).
Yes, it seems that the cases are different, mostly in infrastructure, things like firewall configuration, connection tracking, etc., and further datapath investigation is needed. But the successful deployment of Submariner on Kind with Cilium can serve as a good reference point where we can compare routes, iptables rules, Cilium configuration, etc. As for the NATT: yeah, in the Kind setup Submariner GWs are reachable using their private IPs (node IPs), unlike GWs on some public cloud/managed Kubernetes services. The Submariner NATT configuration/topology is relevant for Submariner tunnel establishment, and in this case it was reported that both Submariner tunnels are up and healthy while we have datapath issues when the client pod is running on a non-GW node, so I don't think NATT is the root cause here.
What happened:
We have two managed k8s clusters running on DigitalOcean. They use Cilium as CNI, and both clusters use the same Pod and Service CIDRs, so Submariner Globalnet is used.
To test traffic inter- and intra-cluster, we use a standard nginx container listening on port 80 and a service to make it accessible.
The service has been exposed via submariner.
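As an illustration, the test setup described above can be reproduced with something like the following (names and namespace are illustrative):

# Test workload plus a ClusterIP service in front of it
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80

# Export the service to the other cluster via Submariner
subctl export service nginx --namespace default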
What you expected to happen:
Traffic works regardless of whether the pods are running on a gateway node or not.
How to reproduce it (as minimally and precisely as possible):
Submariner was installed using the following commands:
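(The exact commands are not preserved here; a typical Globalnet installation with subctl looks roughly like this, using the cluster IDs from this issue.)

# On the broker cluster
subctl deploy-broker --globalnet

# On each participating cluster, with the generated broker-info.subm
subctl join broker-info.subm --clusterid jenslab
subctl join broker-info.subm --clusterid felixlab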
Anything else we need to know?:
The determined private IP address of both clusters is their public IP.
Cable information CRD:
Routes from the submariner-routeagent pod:
jenslab
globalnet cidr: 242.1.0.0/16
on submariner route-agent (on gateway node):
on submariner route-agent (NOT gateway node):
felixlab
globalnet cidr: 242.0.0.0/16
on submariner route-agent (on gateway node):
on submariner route-agent (NOT gateway node):
Environment:
Diagnose information (subctl diagnose all):
Gather information (subctl gather): submariner-gatherinfos.zip
Cloud provider or hardware configuration: DigitalOcean