Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/linux: use NETLINK_SOCK_DIAG #1660

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

florianl
Copy link

@florianl florianl commented Jun 5, 2024

To retrieve network information use NETLINK_SOCK_DIAG instead of walking /proc/<PID> and parsing files.
Consequently this reduces the number of syscalls, as walking /proc/<PID> and opening, reading and closing files is no longer required. But it also reduces the memory footprint as reading files into memory for processing is no longer required.

Related issues:

Supersedes #809

@@ -374,15 +373,15 @@ type inodeMap struct {
}

type connTmp struct {
Copy link
Author

@florianl florianl Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorder struct elements for better memory alignment/usage.

@florianl
Copy link
Author

florianl commented Jun 8, 2024

To quantify this change is hard, as it is hard to have a controlled environment, where network connections do not open/close all the time. For this reason, I think, the GetConnections benchmark got removed with 5cd488f.
But it is also important to get a feeling for the impact of this proposed change. In a ec2 instance, I did run the following program compiled once on shirou/[email protected] and compared it with this proposed change.

Program to test impact of NETLINK_SOCK_DIAG
package main

import (
	"context"
	"time"

	"github.com/shirou/gopsutil/v4/net"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Minute)
	defer cancel()

	for {
		select {
		case <-ctx.Done():
			return
		default:
			conns, err := net.Connections("all")
			if err != nil {
				panic(err)
			}
			for _, c := range conns {
				_ = c
			}
		}
	}
}

In the first picture the program is using shirou/[email protected]. Here one can see the impact of the repeated call of the read syscall.
20240608-135739-orig

The following picture shows the impact of this proposed change. There are less read syscalls and instead the newly introduced netlink syscall to the NETLINK_SOCK_DIAG is noticeable.
20240608-135721-diag

Both pictures are generated with continuous on-CPU profiling. So consequently, what is not shown in these pictures is the off-CPU part. As the original implementation is reading and processing files to extract network connection, the waiting on i/o is not visualized in both pictures. As the proposed change is less i/o intensive, garbage collection is also reduced with the proposed change.

@shirou I would be interested on your feedback.

To retrieve network information use NETLINK_SOCK_DIAG instead of walking
/proc/<PID> and parsing files.
Consequently this reduces the number of syscalls, as walking /proc/<PID> and
opening, reading and closing files is no longer required. But it also reduces
the memory footprint as reading files into memory for processing is no longer
required.

Related issues:
- shirou#695
- shirou#784

Supersedes shirou#809

Signed-off-by: Florian Lehner <[email protected]>
Signed-off-by: Florian Lehner <[email protected]>
Signed-off-by: Florian Lehner <[email protected]>
@shirou
Copy link
Owner

shirou commented Jun 8, 2024

Thank you very much for the excellent PR. Netlink support has been a concern for over 5 years or more. However, when we tried it in #809, the issue seemed to be with obtaining the PID and FD, rather than the change to netlink. But it was so long ago that I don't remember very well... And perhaps various things have changed with the recent versions of Go.

Therefore, I have a few requests:

  1. Could you please provide benchmarks? I think this comment will be helpful.
  2. You are adding your library https://github.com/florianl/go-diag as a new import, but it seems it has many dependencies. Wouldn't https://github.com/elastic/gosigar/ or https://github.com/vishvananda/netlink/ have fewer dependencies for this PR purpose?

@florianl
Copy link
Author

florianl commented Jun 10, 2024

Could you please provide benchmarks?

Benchmark code
package net

import "testing"

func BenchmarkGetConnectionsInet(b *testing.B) {
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Connections("inet")
	}
	b.ReportAllocs()
}

func BenchmarkGetConnectionsAll(b *testing.B) {
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Connections("all")
	}
	b.ReportAllocs()
}
old.txt
goos: linux
goarch: amd64
pkg: github.com/shirou/gopsutil/v4/net
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkGetConnectionsInet-16    	      43	  25733254 ns/op	 1937448 B/op	   32789 allocs/op
BenchmarkGetConnectionsInet-16    	      50	  24458512 ns/op	 1938092 B/op	   32777 allocs/op
BenchmarkGetConnectionsInet-16    	      64	  26101562 ns/op	 1937877 B/op	   32777 allocs/op
BenchmarkGetConnectionsInet-16    	      46	  24488909 ns/op	 1926458 B/op	   32573 allocs/op
BenchmarkGetConnectionsInet-16    	      46	  23428249 ns/op	 1915767 B/op	   32410 allocs/op
BenchmarkGetConnectionsInet-16    	      55	  23917699 ns/op	 1904857 B/op	   32310 allocs/op
BenchmarkGetConnectionsInet-16    	      99	  19958473 ns/op	 1897309 B/op	   32240 allocs/op
BenchmarkGetConnectionsInet-16    	      55	  23316393 ns/op	 1894094 B/op	   32213 allocs/op
BenchmarkGetConnectionsInet-16    	      99	  20849225 ns/op	 1894654 B/op	   32215 allocs/op
BenchmarkGetConnectionsInet-16    	      61	  22280353 ns/op	 1894373 B/op	   32215 allocs/op
BenchmarkGetConnectionsAll-16     	      86	  24582390 ns/op	 2867406 B/op	   37833 allocs/op
BenchmarkGetConnectionsAll-16     	      48	  26337763 ns/op	 2817073 B/op	   36948 allocs/op
BenchmarkGetConnectionsAll-16     	      44	  27345821 ns/op	 2815822 B/op	   36911 allocs/op
BenchmarkGetConnectionsAll-16     	      42	  26100068 ns/op	 2814700 B/op	   36904 allocs/op
BenchmarkGetConnectionsAll-16     	      54	  27024142 ns/op	 2815417 B/op	   36908 allocs/op
BenchmarkGetConnectionsAll-16     	      48	  27550235 ns/op	 2804678 B/op	   36809 allocs/op
BenchmarkGetConnectionsAll-16     	      45	  25555057 ns/op	 2806079 B/op	   36842 allocs/op
BenchmarkGetConnectionsAll-16     	      38	  26605894 ns/op	 2807474 B/op	   36873 allocs/op
BenchmarkGetConnectionsAll-16     	      46	  27842517 ns/op	 2789729 B/op	   36775 allocs/op
BenchmarkGetConnectionsAll-16     	      93	  26043600 ns/op	 2790893 B/op	   36808 allocs/op
PASS
ok  	github.com/shirou/gopsutil/v4/net	39.989s
new.txt
goos: linux
goarch: amd64
pkg: github.com/shirou/gopsutil/v4/net
cpu: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BenchmarkGetConnectionsInet-16    	      26	  52019407 ns/op	 1891888 B/op	   32776 allocs/op
BenchmarkGetConnectionsInet-16    	      32	  49179081 ns/op	 1889061 B/op	   32729 allocs/op
BenchmarkGetConnectionsInet-16    	      21	  48633870 ns/op	 1888458 B/op	   32726 allocs/op
BenchmarkGetConnectionsInet-16    	      25	  48406743 ns/op	 1889749 B/op	   32732 allocs/op
BenchmarkGetConnectionsInet-16    	      25	  47062656 ns/op	 1878632 B/op	   32419 allocs/op
BenchmarkGetConnectionsInet-16    	      26	  41867123 ns/op	 1872315 B/op	   32325 allocs/op
BenchmarkGetConnectionsInet-16    	      54	  37856214 ns/op	 1871989 B/op	   32326 allocs/op
BenchmarkGetConnectionsInet-16    	      55	  38055952 ns/op	 1872502 B/op	   32326 allocs/op
BenchmarkGetConnectionsInet-16    	      30	  39324826 ns/op	 1870846 B/op	   32323 allocs/op
BenchmarkGetConnectionsInet-16    	      31	  40221279 ns/op	 1872418 B/op	   32325 allocs/op
BenchmarkGetConnectionsAll-16     	      45	  49787373 ns/op	 3337189 B/op	   51523 allocs/op
BenchmarkGetConnectionsAll-16     	      24	  48060186 ns/op	 3338319 B/op	   51520 allocs/op
BenchmarkGetConnectionsAll-16     	      40	  45551023 ns/op	 3337736 B/op	   51527 allocs/op
BenchmarkGetConnectionsAll-16     	      24	  49764324 ns/op	 3338253 B/op	   51525 allocs/op
BenchmarkGetConnectionsAll-16     	      20	  52223118 ns/op	 3340811 B/op	   51565 allocs/op
BenchmarkGetConnectionsAll-16     	      25	  50223752 ns/op	 3359101 B/op	   51738 allocs/op
BenchmarkGetConnectionsAll-16     	      24	  45332779 ns/op	 3358377 B/op	   51741 allocs/op
BenchmarkGetConnectionsAll-16     	      37	  48200678 ns/op	 3368081 B/op	   51836 allocs/op
BenchmarkGetConnectionsAll-16     	      24	  44535266 ns/op	 3348248 B/op	   51642 allocs/op
BenchmarkGetConnectionsAll-16     	      24	  49209950 ns/op	 3336002 B/op	   51505 allocs/op
PASS
ok  	github.com/shirou/gopsutil/v4/net	34.226s

Note

The number of network connections and processes changed during the benchmark. Therefore, the results are not reliable. The benchmark does also not reflect the positive impact of the proposed change on the garbage collector, when running in a real world application or the reduced number of syscalls.

10 iterations of the benchmark with current master did take 39.989s while 10 iterations of the benchmark of the proposed change took 34.226s.

benchstat old.txt new.txt
name                   old time/op    new time/op    delta
GetConnectionsInet-16    23.5ms ±15%    44.3ms ±18%  +88.73%  (p=0.000 n=10+10)
GetConnectionsAll-16     26.5ms ± 7%    48.3ms ± 8%  +82.23%  (p=0.000 n=10+10)

name                   old alloc/op   new alloc/op   delta
GetConnectionsInet-16    1.91MB ± 1%    1.88MB ± 1%   -1.79%  (p=0.000 n=10+10)
GetConnectionsAll-16     2.81MB ± 1%    3.35MB ± 1%  +19.21%  (p=0.000 n=9+10)

name                   old allocs/op  new allocs/op  delta
GetConnectionsInet-16     32.5k ± 1%     32.5k ± 1%     ~     (p=0.516 n=10+10)
GetConnectionsAll-16      36.9k ± 0%     51.6k ± 0%  +40.01%  (p=0.000 n=9+10)

As mentioned in #1660 (comment) providing a reliable benchmark for this change is hard.

You are adding your library https://github.com/florianl/go-diag as a new import, but it seems it has many dependencies. Wouldn't https://github.com/elastic/gosigar/ or https://github.com/vishvananda/netlink/ have fewer dependencies for this PR purpose?

How does the number of dependencies impact performance?
florianl/go-diag focuses on a proper implementation of NETLINK_SOCK_DIAG. To me, it looks like neither elastic/go-sigar nor vishvananda/netlink provide the required NETLINK_SOCK_DIAG implementation - both packages try to cover some parts of the netlink API, but looking at their issue trackers there is place for improvement.
I did not implement NETLINK_SOCK_DIAG within shirou/gopsutil , as I'm also interested in using NETLINK_SOCK_DIAG in a different project and adding the complete NETLINK_SOCK_DIAG API didn't sound like the correct approach to me, as there is no non Linux equivalent.

[..] the issue seemed to be with obtaining the PID and FD

As can be seen in the flamegraphs in #1660 (comment) walking /proc to fetch PID/FD information is unchanged.

On systems processes and network connections start and stop all the time. Using NETLINK_SOCK_DIAG avoids the race conditions, that happen while walking /proc to fetch inet, inet6 or unix information by getting a direct snapshot from the kernel at once.

@shirou
Copy link
Owner

shirou commented Jun 15, 2024

Thank you for the benchmark. However, from this, it appears that the time increased by 88%, memory usage for Inet slightly decreased but increased for All, and allocations increased by 40%. This suggests that performance seems to have worsened. I understand that accurate measurement is difficult, but is this result accurate?

How does the number of dependencies impact performance?

Increasing unnecessary dependencies will increase the final build time and binary size. Moreover, gopsutil is a library, meaning it is used by various other applications. If gopsutil depends on many other libraries, it makes troubleshooting more difficult for application developers. Therefore, I would like to minimize the number of dependencies as much as possible.

@florianl
Copy link
Author

florianl commented Jul 19, 2024

I understand that accurate measurement is difficult, but is this result accurate?

No - these numbers are totally off, not reliable and not reproducible.

I have mentioned this already in the Note in #1660 (comment).

Note

The number of network connections and processes changed during the benchmark. Therefore, the results are not reliable. The benchmark does also not reflect the positive impact of the proposed change on the garbage collector, when running in a real world application or the reduced number of syscalls.

If using this PR with elastic-agent, the CPU consumption is reduced by 10% and heap size is also reduced by 8 to 10%. @shirou

Data accuracy is another reason for #1660. With netlink one gets all network connections at once. While with the current approach the returned resuts can contain network connections that no longer exists or even miss network connections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants