-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Populate client side trace's local address via tcp kprobes #1989
Populate client side trace's local address via tcp kprobes #1989
Conversation
src/stirling/source_connectors/socket_tracer/bcc_bpf/socket_trace.c
Outdated
Show resolved
Hide resolved
{"tcp_sendmsg", ProbeType::kEntry, "probe_entry_populate_active_connect_sock", | ||
/*is_syscall*/ false}, | ||
{"tcp_sendmsg", ProbeType::kReturn, "probe_ret_populate_active_connect_sock", | ||
/*is_syscall*/ false}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't have an easy mechanism for testing this, I tried to test this mid stream case by creating a psql shell prior to the PEM starting and issuing queries after the socket tracer was initialized. I verified that a tcp connection was made and only sendto
syscalls were issued when queries were executed (no connect
syscalls) to check that this would simulate a mid stream case.
Surprisingly, I saw that the local_addr
column was populated before I added the probe on tcp_sendmsg
despite my thinking that it should be empty. I went ahead and added this probe since I verified with ftrace
on the same psql process that tcp_v4_connect
isn't called. Just raising this since I wasn't able to verify that this additional probe adds extra coverage from the testing I did.
@oazizi000 would appreciate if you could give this a review. I still need to fix the bpf instruction issue for the 4.x kernels and the incorrect storage of the local port (it is stored in network byte order and is an existing issue), but that shouldn't impact the core details in the PR. |
@@ -176,7 +176,7 @@ DEFINE_bool( | |||
stirling_debug_tls_sources, gflags::BoolFromEnv("PX_DEBUG_TLS_SOURCES", false), | |||
"If true, stirling will add additional prometheus metrics regarding the traced tls sources"); | |||
|
|||
DEFINE_uint32(stirling_bpf_loop_limit, 42, | |||
DEFINE_uint32(stirling_bpf_loop_limit, 41, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iovec syscall return probes that rely on process_implicit_conn (recvmsg, recvmmsg, sendmsg, sendmmsg) had their instruction count increase beyond the 4.14 limit. This was my solution to getting the BPF instruction count on 4.14 kernels to work. The rationale is that cases that are right on the edge of this limit are likely to have problems already.
The other things I considered were the following:
- Remove match_trace_tgid logic on 4.14 kernels -- did not lower the instruction count enough
- Make the iovec ret syscalls provide a NULL sock for its
submit_new_conn
call (e.g.submit_new_conn(ctx, tgid, fd, addr, /*sock*/ NULL, role, source_fn)
) -- not viable since it partially handles mid stream connections
I'm open to other suggestions if you have any, but this was the set of things I came up with.
src/stirling/source_connectors/socket_tracer/socket_trace_bpf_test.cc
Outdated
Show resolved
Hide resolved
9e10095
to
c219f42
Compare
Signed-off-by: Dom Del Nano <[email protected]>
Signed-off-by: Dom Del Nano <[email protected]>
Signed-off-by: Dom Del Nano <[email protected]>
Signed-off-by: Dom Del Nano <[email protected]>
Signed-off-by: Dom Del Nano <[email protected]>
…t for 4.14 kernels Signed-off-by: Dom Del Nano <[email protected]>
c219f42
to
7960135
Compare
Approach looks good. We should just manually test on a few more endpoints to gain more confidence. If that all looks good, then this should be good to go. |
…local port (skc_num field) Signed-off-by: Dom Del Nano <[email protected]>
Signed-off-by: Dom Del Nano <[email protected]>
@oazizi000 I've tested this change on EKS (Amazon Linux 2), GKE Ubuntu and GKE COS in addition to tracking down the endianness problem (kernel data structure stores the local port in host byte order). I've also created #2002 to make that byte order more clear even though that case doesn't have a bug. |
…#2002) Summary: Use correct byte ordering function for kernel struct member (skc_num) This change doesn't result in a functional difference since `ntohs` and `htons` are inverses of each other on little endian machines (and noops for big endian machines). This field's byte order caused me confusion in #1989, so I wanted to make this struct access consistent. Relevant Issues: N/A Type of change: /kind cleanup Test Plan: Testing during #1989 (details [here](#1989 (comment))) Signed-off-by: Dom Del Nano <[email protected]>
…1989) Summary: Populate client side trace's local address via tcp kprobes This change populates client side trace's `local_addr` and `local_port` columns for the following use cases: 1. To provide more consistency for the protocol data tables. Having columns that are empty make it difficult for end users to understand what is being traced and make them less useful 2. To facilitate addressing a portion of the short lived process problems (pixie-io#1638) For 2, the root of the issue is that `df.ctx["pod"]` syntax relies on the [px.upid_to_pod_name](https://docs.px.dev/reference/pxl/udf/upid_to_pod_name/) function. If a PEM misses the short lived process during its metadata update, this function fails to resolve the pod name. For client side traces where the pod is making an outbound connection (non localhost), the `local_addr` column provides an alternative pod name lookup for short lived processes when the pod is long lived. This means the following would be equivalent to the `df.ctx["pod"]` lookup: `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))`. I intend to follow this PR with a compiler change that will make `df.ctx["pod"]` try both methods should `px.upid_to_pod_name` fail to resolve. This will allow the existing pxl scripts to display the previously missed short lived processes. **Alternatives** Another approach I considered was expanding our use of the `sock_alloc` kprobe. I used ftrace on a simple curl command to see what other options could be used (`sudo trace-cmd record -F -p function_graph http://google.com`). The `socket` syscall calls `sock_alloc`, which would be another mechanism for accessing the `struct sock`. I decided against this approach because I don't think its viable to assume that the same thread/process that calls `socket` will be the one that does the later syscalls (how our BPF maps are set up). It's common to have a forking web server model, which means a different process/thread can call `socket` than the ones that later read/write to it. **Probe stability** These probes appear to be stable from our oldest and newest supported kernel. These functions exist in the [tcp_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv4/tcp_ipv4.c#L2422), [tcpv6_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv6/tcp_ipv6.c#L1941) structs and I've seen that other projects and bcc tools use these probes. This makes me believe that these functions have a pretty well defined interface. Relevant Issues: pixie-io#1829, pixie-io#1638 Type of change: /kind feature Test Plan: New tests verify that ipv4 and ipv6 cases work - [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done` within a pod and verified that `local_addr` is populated with this change and `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))` works for pod name resolution. - [x] Verified the above curl test results in traces without `local_addr` without this change ![local-addr-testing](https://github.com/user-attachments/assets/344be022-97a0-4096-8af7-8de20d741e40) - Tested on the following k8s offerings and machine images - [x] GKE COS and Ubuntu - [x] EKS Amazon Linux 2 Changelog Message: Populate socket tracer data table `local_addr` and `local_port` column for client side traces. --------- Signed-off-by: Dom Del Nano <[email protected]> GitOrigin-RevId: 0c1fdd2
Summary: Populate client side trace's local address via tcp kprobes
This change populates client side trace's
local_addr
andlocal_port
columns for the following use cases:For 2, the root of the issue is that
df.ctx["pod"]
syntax relies on the px.upid_to_pod_name function. If a PEM misses the short lived process during its metadata update, this function fails to resolve the pod name. For client side traces where the pod is making an outbound connection (non localhost), thelocal_addr
column provides an alternative pod name lookup for short lived processes when the pod is long lived. This means the following would be equivalent to thedf.ctx["pod"]
lookup:px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))
.I intend to follow this PR with a compiler change that will make
df.ctx["pod"]
try both methods shouldpx.upid_to_pod_name
fail to resolve. This will allow the existing pxl scripts to display the previously missed short lived processes.Alternatives
Another approach I considered was expanding our use of the
sock_alloc
kprobe. I used ftrace on a simple curl command to see what other options could be used (sudo trace-cmd record -F -p function_graph http://google.com
). Thesocket
syscall callssock_alloc
, which would be another mechanism for accessing thestruct sock
. I decided against this approach because I don't think its viable to assume that the same thread/process that callssocket
will be the one that does the later syscalls (how our BPF maps are set up). It's common to have a forking web server model, which means a different process/thread can callsocket
than the ones that later read/write to it.Probe stability
These probes appear to be stable from our oldest and newest supported kernel. These functions exist in the tcp_prot, tcpv6_prot structs and I've seen that other projects and bcc tools use these probes. This makes me believe that these functions have a pretty well defined interface.
Relevant Issues: #1829, #1638
Type of change: /kind feature
Test Plan: New tests verify that ipv4 and ipv6 cases work
Ran
for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done
within a pod and verified thatlocal_addr
is populated with this change andpx.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))
works for pod name resolution.Verified the above curl test results in traces without
local_addr
without this changeTested on the following k8s offerings and machine images
GKE COS and Ubuntu
EKS Amazon Linux 2
Changelog Message: Populate socket tracer data table
local_addr
andlocal_port
column for client side traces.