Populate client side trace's local address via tcp kprobes #1989

ddelnano · 2024-08-23T18:37:48Z

Summary: Populate client side trace's local address via tcp kprobes

This change populates client side trace's local_addr and local_port columns for the following use cases:

To provide more consistency for the protocol data tables. Having columns that are empty make it difficult for end users to understand what is being traced and make them less useful
To facilitate addressing a portion of the short lived process problems (Pixie Misses Events from Short Lived Processes/Pods #1638)

For 2, the root of the issue is that df.ctx["pod"] syntax relies on the px.upid_to_pod_name function. If a PEM misses the short lived process during its metadata update, this function fails to resolve the pod name. For client side traces where the pod is making an outbound connection (non localhost), the local_addr column provides an alternative pod name lookup for short lived processes when the pod is long lived. This means the following would be equivalent to the df.ctx["pod"] lookup: px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr)).

I intend to follow this PR with a compiler change that will make df.ctx["pod"] try both methods should px.upid_to_pod_name fail to resolve. This will allow the existing pxl scripts to display the previously missed short lived processes.

Alternatives

Another approach I considered was expanding our use of the sock_alloc kprobe. I used ftrace on a simple curl command to see what other options could be used (sudo trace-cmd record -F -p function_graph http://google.com). The socket syscall calls sock_alloc, which would be another mechanism for accessing the struct sock. I decided against this approach because I don't think its viable to assume that the same thread/process that calls socket will be the one that does the later syscalls (how our BPF maps are set up). It's common to have a forking web server model, which means a different process/thread can call socket than the ones that later read/write to it.

Probe stability

These probes appear to be stable from our oldest and newest supported kernel. These functions exist in the tcp_prot, tcpv6_prot structs and I've seen that other projects and bcc tools use these probes. This makes me believe that these functions have a pretty well defined interface.

Relevant Issues: #1829, #1638

Type of change: /kind feature

Test Plan: New tests verify that ipv4 and ipv6 cases work

Ran for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done within a pod and verified that local_addr is populated with this change and px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr)) works for pod name resolution.
Verified the above curl test results in traces without local_addr without this change
Tested on the following k8s offerings and machine images
GKE COS and Ubuntu
EKS Amazon Linux 2

Changelog Message: Populate socket tracer data table local_addr and local_port column for client side traces.

src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c

ddelnano · 2024-08-23T22:03:12Z

src/stirling/source_connectors/socket_tracer/socket_trace_connector.cc

+ {"tcp_sendmsg", ProbeType::kEntry, "probe_entry_populate_active_connect_sock",
+ /*is_syscall*/ false},
+ {"tcp_sendmsg", ProbeType::kReturn, "probe_ret_populate_active_connect_sock",
+ /*is_syscall*/ false},


Since we don't have an easy mechanism for testing this, I tried to test this mid stream case by creating a psql shell prior to the PEM starting and issuing queries after the socket tracer was initialized. I verified that a tcp connection was made and only sendto syscalls were issued when queries were executed (no connect syscalls) to check that this would simulate a mid stream case.

Surprisingly, I saw that the local_addr column was populated before I added the probe on tcp_sendmsg despite my thinking that it should be empty. I went ahead and added this probe since I verified with ftrace on the same psql process that tcp_v4_connect isn't called. Just raising this since I wasn't able to verify that this additional probe adds extra coverage from the testing I did.

ddelnano · 2024-08-26T16:15:30Z

@oazizi000 would appreciate if you could give this a review. I still need to fix the bpf instruction issue for the 4.x kernels and the incorrect storage of the local port (it is stored in network byte order and is an existing issue), but that shouldn't impact the core details in the PR.

ddelnano · 2024-08-27T16:13:33Z

src/stirling/source_connectors/socket_tracer/socket_trace_connector.cc

@@ -176,7 +176,7 @@ DEFINE_bool(
 stirling_debug_tls_sources, gflags::BoolFromEnv("PX_DEBUG_TLS_SOURCES", false),
 "If true, stirling will add additional prometheus metrics regarding the traced tls sources");

-DEFINE_uint32(stirling_bpf_loop_limit, 42,
+DEFINE_uint32(stirling_bpf_loop_limit, 41,


The iovec syscall return probes that rely on process_implicit_conn (recvmsg, recvmmsg, sendmsg, sendmmsg) had their instruction count increase beyond the 4.14 limit. This was my solution to getting the BPF instruction count on 4.14 kernels to work. The rationale is that cases that are right on the edge of this limit are likely to have problems already.

The other things I considered were the following:

Remove match_trace_tgid logic on 4.14 kernels -- did not lower the instruction count enough

Make the iovec ret syscalls provide a NULL sock for its submit_new_conn call (e.g. submit_new_conn(ctx, tgid, fd, addr, /*sock*/ NULL, role, source_fn)) -- not viable since it partially handles mid stream connections

I'm open to other suggestions if you have any, but this was the set of things I came up with.

src/stirling/source_connectors/socket_tracer/socket_trace_bpf_test.cc

Signed-off-by: Dom Del Nano <[email protected]>

…t for 4.14 kernels Signed-off-by: Dom Del Nano <[email protected]>

oazizi000 · 2024-09-03T17:04:30Z

Approach looks good. We should just manually test on a few more endpoints to gain more confidence. If that all looks good, then this should be good to go.

…local port (skc_num field) Signed-off-by: Dom Del Nano <[email protected]>

Signed-off-by: Dom Del Nano <[email protected]>

ddelnano · 2024-09-04T14:53:13Z

@oazizi000 I've tested this change on EKS (Amazon Linux 2), GKE Ubuntu and GKE COS in addition to tracking down the endianness problem (kernel data structure stores the local port in host byte order).

I've also created #2002 to make that byte order more clear even though that case doesn't have a bug.

…#2002) Summary: Use correct byte ordering function for kernel struct member (skc_num) This change doesn't result in a functional difference since `ntohs` and `htons` are inverses of each other on little endian machines (and noops for big endian machines). This field's byte order caused me confusion in #1989, so I wanted to make this struct access consistent. Relevant Issues: N/A Type of change: /kind cleanup Test Plan: Testing during #1989 (details [here](#1989 (comment))) Signed-off-by: Dom Del Nano <[email protected]>

…1989) Summary: Populate client side trace's local address via tcp kprobes This change populates client side trace's `local_addr` and `local_port` columns for the following use cases: 1. To provide more consistency for the protocol data tables. Having columns that are empty make it difficult for end users to understand what is being traced and make them less useful 2. To facilitate addressing a portion of the short lived process problems (pixie-io#1638) For 2, the root of the issue is that `df.ctx["pod"]` syntax relies on the [px.upid_to_pod_name](https://docs.px.dev/reference/pxl/udf/upid_to_pod_name/) function. If a PEM misses the short lived process during its metadata update, this function fails to resolve the pod name. For client side traces where the pod is making an outbound connection (non localhost), the `local_addr` column provides an alternative pod name lookup for short lived processes when the pod is long lived. This means the following would be equivalent to the `df.ctx["pod"]` lookup: `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))`. I intend to follow this PR with a compiler change that will make `df.ctx["pod"]` try both methods should `px.upid_to_pod_name` fail to resolve. This will allow the existing pxl scripts to display the previously missed short lived processes. **Alternatives** Another approach I considered was expanding our use of the `sock_alloc` kprobe. I used ftrace on a simple curl command to see what other options could be used (`sudo trace-cmd record -F -p function_graph http://google.com`). The `socket` syscall calls `sock_alloc`, which would be another mechanism for accessing the `struct sock`. I decided against this approach because I don't think its viable to assume that the same thread/process that calls `socket` will be the one that does the later syscalls (how our BPF maps are set up). It's common to have a forking web server model, which means a different process/thread can call `socket` than the ones that later read/write to it. **Probe stability** These probes appear to be stable from our oldest and newest supported kernel. These functions exist in the [tcp_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv4/tcp_ipv4.c#L2422), [tcpv6_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv6/tcp_ipv6.c#L1941) structs and I've seen that other projects and bcc tools use these probes. This makes me believe that these functions have a pretty well defined interface. Relevant Issues: pixie-io#1829, pixie-io#1638 Type of change: /kind feature Test Plan: New tests verify that ipv4 and ipv6 cases work - [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done` within a pod and verified that `local_addr` is populated with this change and `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))` works for pod name resolution. - [x] Verified the above curl test results in traces without `local_addr` without this change ![local-addr-testing](https://github.com/user-attachments/assets/344be022-97a0-4096-8af7-8de20d741e40) - Tested on the following k8s offerings and machine images - [x] GKE COS and Ubuntu - [x] EKS Amazon Linux 2 Changelog Message: Populate socket tracer data table `local_addr` and `local_port` column for client side traces. --------- Signed-off-by: Dom Del Nano <[email protected]> GitOrigin-RevId: 0c1fdd2

ddelnano requested a review from a team as a code owner August 23, 2024 18:37

ddelnano commented Aug 23, 2024

View reviewed changes

src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c Outdated Show resolved Hide resolved

ddelnano changed the title ~~Populate client side traces local address via tcp kprobes~~ Populate client side trace's local address via tcp kprobes Aug 23, 2024

ddelnano commented Aug 23, 2024

View reviewed changes

ddelnano requested a review from oazizi000 August 26, 2024 16:13

ddelnano commented Aug 27, 2024

View reviewed changes

src/stirling/source_connectors/socket_tracer/socket_trace_bpf_test.cc Outdated Show resolved Hide resolved

ddelnano force-pushed the ddelnano/populate-laddr-for-socket-tracer-records branch from 9e10095 to c219f42 Compare August 29, 2024 17:58

ddelnano added 6 commits August 29, 2024 17:59

Populate client side traces local address via tcp connect kprobes

8e966f8

Signed-off-by: Dom Del Nano <[email protected]>

Fix linting issues

3bd7013

Signed-off-by: Dom Del Nano <[email protected]>

Remove unrelated file

a01138e

Signed-off-by: Dom Del Nano <[email protected]>

Add probes on tcp_sendmsg to handle mid stream connections

f3548ef

Signed-off-by: Dom Del Nano <[email protected]>

#ci:bpf-build-all-kernels

7a5cd84

Signed-off-by: Dom Del Nano <[email protected]>

Reduce iovec bpf loop limit from 42 to 41 to fix BPF instruction coun…

7960135

…t for 4.14 kernels Signed-off-by: Dom Del Nano <[email protected]>

ddelnano force-pushed the ddelnano/populate-laddr-for-socket-tracer-records branch from c219f42 to 7960135 Compare August 29, 2024 17:59

Properly convert kernel data structure that uses host byte order for …

636d15f

…local port (skc_num field) Signed-off-by: Dom Del Nano <[email protected]>

ddelnano mentioned this pull request Sep 4, 2024

Use correct byte ordering function for kernel struct member (skc_num) #2002

Merged

Ensure ipv6 test is cleaned up after endianess fix was discovered

945b211

Signed-off-by: Dom Del Nano <[email protected]>

oazizi000 approved these changes Sep 5, 2024

View reviewed changes

ddelnano merged commit 0c1fdd2 into pixie-io:main Sep 5, 2024
44 checks passed

ddelnano deleted the ddelnano/populate-laddr-for-socket-tracer-records branch September 5, 2024 16:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Populate client side trace's local address via tcp kprobes #1989

Populate client side trace's local address via tcp kprobes #1989

ddelnano commented Aug 23, 2024 •

edited

Loading

ddelnano Aug 23, 2024

ddelnano commented Aug 26, 2024

ddelnano Aug 27, 2024

oazizi000 commented Sep 3, 2024

ddelnano commented Sep 4, 2024

Populate client side trace's local address via tcp kprobes #1989

Populate client side trace's local address via tcp kprobes #1989

Conversation

ddelnano commented Aug 23, 2024 • edited Loading

ddelnano Aug 23, 2024

Choose a reason for hiding this comment

ddelnano commented Aug 26, 2024

ddelnano Aug 27, 2024

Choose a reason for hiding this comment

oazizi000 commented Sep 3, 2024

ddelnano commented Sep 4, 2024

ddelnano commented Aug 23, 2024 •

edited

Loading