Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gray/sk assign #3

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from
Draft

Gray/sk assign #3

wants to merge 8 commits into from

Conversation

jschwinger233
Copy link
Owner

Background

Checklist

Full Changelogs

  • [Implement ...]

Issue Reference

Closes #[issue number]

Test Result

@jschwinger233 jschwinger233 force-pushed the gray/sk_assign branch 8 times, most recently from f016441 to ac3642f Compare January 1, 2024 05:19
As we are going to implement tproxy hijack via bpf_sk_assign, tproxy
response won't reach wan iface at all, unless wan iface == lan iface.

The only remaining "tproxy_response" is the place returning TC_ACT_PIPE
to hand packets over from tproxy_wan_egress to tproxy_lan_egress.

This commit also deletes rev-NAT logic for tproxy response.

This commit tries to make a minimum change, otherwise file diff is
too confusing to reviewers. I'll clean it up in the next patch.
This commit merely removes the `if (false)` branch at:

```
if (false) {
  // Comments
} else {
  ...
}
```

The file diff becomes completely messed up, so I split it into a
separate patch without any functional change.
Note the necessity of separation of `assign_socket_tcp` and
`assign_socket_udp`:

As `struct bpf_sock *` has different verifier types for tcp and udp, the
code below can't pass verifier:

```
static __always_inline int
assign_socket(struct __sk_buff *skb, struct bpf_sock_tuple *tuple, __u32 len,
	      __u8 nexthdr) {
	struct bpf_sock *sk;
	switch (nexthdr) {
	case IPPROTO_TCP:
		sk = bpf_sk_lookup_tcp(skb, tuple, len, BPF_F_CURRENT_NETNS, 0);
	case IPPROTO_UDP:
		sk = bpf_sk_lookup_udp(skb, tuple, len, BPF_F_CURRENT_NETNS, 0);
	}
	if (!sk) {
		return -1;
	}

	int res = bpf_sk_assign(skb, sk, 0);
	bpf_sk_release(sk);
	return res;
}
```
We no longer need tcp_dst_map for NAT. Relevant Golang logic is also
removed.

One thing need to mention is "dst_routing_result" struct. Although
tcp_dst_map is gone, dst_routing_result struct is still in use under
userspace at https://github.com/daeuniverse/dae/blob/cab1e4290967340923d7d5ca52b80f781711c18e/control/udp.go#L69C17-L69C17.
Therefore, this commit remains this struct and make some efforts to
ensure bpf objects are compiled with it.
Previously, wan_egress has to encap UDP packets with routing info, but
it's no more necessary as we are in favor of bpf_sk_assign without NAT.
This is a must-have, otherwise packets being bpf_sk_assigned and routed
to local on wan will be dropped by kernel during fib_lookup:

```
// https://github.com/torvalds/linux/blob/v6.5/net/ipv4/fib_frontend.c#L381
static int __fib_validate_source()
...
	if (res.type != RTN_UNICAST &&
	    (res.type != RTN_LOCAL || !IN_DEV_ACCEPT_LOCAL(idev)))
		goto e_inval;
...
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant