Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse kthreads during /proc scanning #2089

Merged
merged 3 commits into from
Feb 20, 2024

Conversation

tpapagian
Copy link
Member

@tpapagian tpapagian commented Feb 12, 2024

In the case where a user process is started by a kernel thread, it will miss its parent from the execve_map. This makes process handling in the processLRU and the reference counting of that a bit problematic.

This PR fixes that by adding kernel threads in the execve_map and processLRU during /proc scanning. We will not generate process_exec and process_exit events for kernel threads, but we keep our internal data structures up-to-date.

@tpapagian tpapagian added the release-note/minor This PR introduces a minor user-visible change label Feb 12, 2024
@tpapagian tpapagian force-pushed the pr/apapag/add-kthreads-during-proc-scan branch from f59a334 to b2d56dd Compare February 12, 2024 16:10
@tpapagian tpapagian marked this pull request as ready for review February 12, 2024 16:54
@tpapagian tpapagian requested a review from a team as a code owner February 12, 2024 16:54
)

// This is used only when we add kernel threads during /proc scanning.
// We use the same interface with all the other events to make the handling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we also add kernel threads from the clone right? your PR does that? I'm just thinking on how to improve that comment so it reads we handle /proc scanning here and clone part of the usuall clone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we also add kernel threads from the clone right? your PR does that?

Yes, the next commit reverts a commit that didn't do that.

This is only used for kernel threads during proc scanning. This should not be used during normal operation. The reason that I added that is that this is somewhere in the middle compared to what the exec and clone handler do.

In that case, we don't need to generate events compared to exec. In comparison to clones, we do not have a way to provide a process name as we always inherit that from the parent.

But this is also much more simple. There is no need to do anything related to the eventcache. I preferred to create a new (simple) handler compared to changing the existing handlers (and making them more complex).

contrib/tester-progs/.gitignore Show resolved Hide resolved
m.Unix.Process.Filename = filename
m.Unix.Process.Args = args
if p.kernel_thread {
m := exec.MsgKThreadInitUnix{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we use MsgExecveEventUnix for kernel threads as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reasons that I didn't use the MsgExecveEventUnix are:

  1. I didn't want to have process_exec events for kernel threads (similar to what we do now)
  2. We didn't want to put any of these events in the eventcache as if we cannot find its parent at that time we will never find that. I have added a log.Fatalf in that case, but maybe it is a good idea to just print a warning instead.

I could have modified MsgExecveEventUnix to handle these, but I believe that it is a cleaner approach to have a separate message type.

@@ -500,13 +578,13 @@ func listRunningProcs(procPath string) ([]procs, error) {

execPath, err := os.Readlink(filepath.Join(procPath, d.Name(), "exe"))
if err != nil {
logger.GetLogger().WithError(err).WithField("process", d.Name()).Warnf("reading process exe error")
execPath = strings.TrimSuffix(string(cmdline), "\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, how this became non error case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of kernel threads, /proc/<pid>/exe does not make sense. So in that case, I get the execPath from the kthread name. But still, for the non-kernel threads, we should report this error. I will update it to something like:

if kernelThread {
    execPath = strings.TrimSuffix(string(cmdline), "\n")
} else {
    logger.GetLogger().WithError(err).WithField("process", d.Name()).Warnf("reading process exe error")
}

@tpapagian tpapagian force-pushed the pr/apapag/add-kthreads-during-proc-scan branch from b2d56dd to 853da79 Compare February 15, 2024 12:38
Copy link

netlify bot commented Feb 15, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit aab532a
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/65d30d7da02f0d00081a754f
😎 Deploy Preview https://deploy-preview-2089--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@@ -408,6 +463,20 @@ func listRunningProcs(procPath string) ([]procs, error) {
if err != nil {
logger.GetLogger().WithError(err).Warnf("Reading Loginuid of %s failed, falling back to loginuid: %d", pathName, uint32(auid))
}

// Check and add only thread leader processes.
if len(status.NSpid) > 0 && len(status.NStgid) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unrelated change, right? I think it should be in separated commit or at least mentioned in the changelog

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will do that in a separate commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get this change! IIRC scanning /proc/ only shows thread leaders, it should not show threads or maybe I'm getting old and making stuff...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that you are correct. I have checked the logs from running this several days with many restarts and no similar log messages were found.

I initially did that as a sanity check but it seems that we can remove that. Thanks!

@tpapagian tpapagian force-pushed the pr/apapag/add-kthreads-during-proc-scan branch from 853da79 to aab532a Compare February 19, 2024 08:12
Copy link
Member

@tixxdz tixxdz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some stuff to fix, about the fake pid 0 that we make in case we have things triggered from interrupt context... hmm maybe just make it a kthread too? up to you.

m.Unix.Process.TID = p.pid
m.Unix.Process.NSPID = p.nspid
m.Unix.Process.UID = 0
m.Unix.Process.AUID = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make this InvalidUID 4294967295 https://github.com/cilium/tetragon/blob/main/pkg/reader/proc/proc.go#L50 as kernel threads won't have a valid auid

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

// All processes have a directory name that consists from a number.
if !regexp.MustCompile(`\d`).MatchString(d.Name()) {
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great addition ;-)


// We read comm in the case where cmdling is empty (i.e. kernel thread).
comm, err := os.ReadFile(filepath.Join(pathName, "comm"))
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this only to check if process is a kernel thread? good change ;-) , I thought that we can check /proc/$pid/status for kthread field, but it seems available only on new kernels

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check that this is a kernel thread by using string(cmdline) == "" in a similar way that we did that before. I use comm just to get a name for the kernel thread in order to provide a reasonable name for the binary field.

@@ -408,6 +463,20 @@ func listRunningProcs(procPath string) ([]procs, error) {
if err != nil {
logger.GetLogger().WithError(err).Warnf("Reading Loginuid of %s failed, falling back to loginuid: %d", pathName, uint32(auid))
}

// Check and add only thread leader processes.
if len(status.NSpid) > 0 && len(status.NStgid) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get this change! IIRC scanning /proc/ only shows thread leaders, it should not show threads or maybe I'm getting old and making stuff...

In the case where a user process is started by a kernel threads, it will
miss it's parent from the execve_map. This makes process handling in the
processLRU and the reference counting of that a bit problematic.

This patch fixes that by adding kernel threads in the execve_map and
processLRU during /proc scanning. We will not generate process_exec and
process_exit events for kernel threads, but we keep our internal data
structures up-to-date.

Signed-off-by: Anastasios Papagiannis <[email protected]>
This reverts commit 63c854f.

The previous commit fixes an issue where user processes that start from
a kernel thread miss parent info.

This patch reverts a commit that avoids sending clone events to the user.
We still do not generate any events for these, but it allows us to have
our internal data structures (i.e. execve_map and processLRU up-to-date).

Signed-off-by: Anastasios Papagiannis <[email protected]>
Signed-off-by: Anastasios Papagiannis <[email protected]>
@tpapagian tpapagian force-pushed the pr/apapag/add-kthreads-during-proc-scan branch from aab532a to 0ce1177 Compare February 20, 2024 07:10
@tpapagian tpapagian merged commit 18ecc57 into main Feb 20, 2024
36 checks passed
@tpapagian tpapagian deleted the pr/apapag/add-kthreads-during-proc-scan branch February 20, 2024 09:50
@tpapagian
Copy link
Member Author

Some stuff to fix, about the fake pid 0 that we make in case we have things triggered from interrupt context... hmm maybe just make it a kthread too? up to you.

I believe that we do that here. This PR does not touch this part. We handle this as an exec message as we also print that in the json logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/minor This PR introduces a minor user-visible change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants