Skip to content

Commit

Permalink
bpf: remove cgrpid_tracker
Browse files Browse the repository at this point in the history
There is an issue with the cgroup tracker id (details can be found
below). This issue causes policyfilter to misbehave. Furthermore, the
original intention behind the cgroup tracker id was never realized
and so it currently serves no purpose. Its only user is the
policyfilter.

This PR removes the cgroup tracker id, and, instead, computes the cgroup
id at the time of the policyfilter check. For most common setups
(cgroupv2) the overhead of doing so is negligible. Computing the cgroup
id at the time of check is more reliable and also simplifies the code.

More details can be found below.

We have a namespaced policy installed that mathes connect() calls to 8.8.8.8, and execute the
following on a pod in the same namespace:

`kubectl exec -it nonqos-demo -- curl 8.8.8.8`

We observe two things:
 * there is no event from the policy (as it should be)
 * the cgroup tracker id of the curl process is the same as the parent runc process (which should
   not be the case)

The exec event we get from curl has:
 .process.pid: 244958
 .process.binary: /usr/bin/curl
 .parent.pid: 244948
 .parent.binary: /usr/bin/runc

Looking at the cgroup tracker id in the execve map for these two pids we see:
"cgrpid_tracker": 2371
"cgrpid_tracker": 2371

Which means that the cgroup id tracker is not correct, and the curl
process will not be matched by the policy.

The issue might arise from the fact that in the wake_up_new_task(),
computing the cgroup tracker id, uses get_current_cgroup_id(). We are,
however, still in the parent so the get_current_cgroup_id() helper will
return the cgroup of the parent, not of the child (which is what we
want).

To simplify the code complexity (both from the perspective of
understandabiliuty but also verification complexity), we instead remove
the tracker id and retrieve the current cgroup id at the time of the
check. This is much simpler, and not less efficient for the common case
(cgroupv2).

Following the same steps with a version build with the current patch,
produces an event as expected.

Signed-off-by: Kornilios Kourtis <[email protected]>
  • Loading branch information
kkourt authored and jrfastab committed Oct 13, 2023
1 parent 5d1df97 commit 2ebcd9a
Show file tree
Hide file tree
Showing 10 changed files with 76 additions and 165 deletions.
112 changes: 46 additions & 66 deletions bpf/lib/bpf_cgroup.h
Original file line number Diff line number Diff line change
Expand Up @@ -311,7 +311,7 @@ get_task_cgroup(struct task_struct *task, __u32 subsys_idx, __u32 *error_flags)
}

/**
* tg_get_current_cgroup_id() Returns the accurate cgroup id of current task.
* __tg_get_current_cgroup_id() Returns the accurate cgroup id of current task.
* @cgrp: cgroup target of current task.
* @cgrpfs_ver: Cgroupfs Magic number either Cgroupv1 or Cgroupv2
*
Expand All @@ -323,23 +323,62 @@ get_task_cgroup(struct task_struct *task, __u32 subsys_idx, __u32 *error_flags)
* Returns the cgroup id of current task on success, zero on failures.
*/
static inline __attribute__((always_inline)) __u64
tg_get_current_cgroup_id(struct cgroup *cgrp, __u64 cgrpfs_ver)
__tg_get_current_cgroup_id(struct cgroup *cgrp, __u64 cgrpfs_ver)
{
__u64 id = 0;

/*
* Try the bpf helper on the default hierarchy if available
* and if we are running in unified cgroupv2
*/
if (bpf_core_enum_value_exists(enum bpf_func_id,
BPF_FUNC_get_current_cgroup_id) &&
cgrpfs_ver == CGROUP2_SUPER_MAGIC) {
id = get_current_cgroup_id();
return get_current_cgroup_id();
} else {
id = get_cgroup_id(cgrp);
return get_cgroup_id(cgrp);
}
}

return id;
/**
* tg_get_current_cgroup_id() Returns the accurate cgroup id of current task.
*
* It works similar to __tg_get_current_cgroup_id, but computes the cgrp if it is needed.
* Returns the cgroup id of current task on success, zero on failures.
*/
static inline __attribute__((always_inline)) __u64
tg_get_current_cgroup_id()
{
__u32 error_flags;
struct cgroup *cgrp;
__u64 cgrpfs_magic = 0;
struct task_struct *task;
struct tetragon_conf *conf;
int zero = 0, subsys_idx = 0;

conf = map_lookup_elem(&tg_conf_map, &zero);
if (conf) {
/* Select which cgroup version */
cgrpfs_magic = conf->cgrp_fs_magic;
subsys_idx = conf->tg_cgrp_subsys_idx;
}

/*
* Try the bpf helper on the default hierarchy if available
* and if we are running in unified cgroupv2
*/
if (bpf_core_enum_value_exists(enum bpf_func_id,
BPF_FUNC_get_current_cgroup_id) &&
cgrpfs_magic == CGROUP2_SUPER_MAGIC) {
return get_current_cgroup_id();
}

task = (struct task_struct *)get_current_task();

// NB: error_flags are ignored for now
cgrp = get_task_cgroup(task, subsys_idx, &error_flags);
if (!cgrp)
return 0;

return get_cgroup_id(cgrp);
}

/**
Expand Down Expand Up @@ -392,63 +431,4 @@ __init_cgrp_tracking_val_heap(struct cgroup *cgrp, cgroup_state state)
return heap;
}

/**
* __set_task_cgrpid_tracker() Sets the cgrpid_tracker of a task.
* It checks tetragon_conf if not available then exit.
* If tetragon_conf is available then checks if the task
* execve_map_value->cgrpid_tracker is set, if so do nothing.
* If not, then set the task execve_map_value->cgrpid_tracker
* to the tracking cgroup ID.
*/
static inline __attribute__((always_inline)) int
__set_task_cgrpid_tracker(struct tetragon_conf *conf, struct task_struct *task,
struct execve_map_value *execve_val, __u32 *error_flags)
{
u32 flags = 0;
struct cgroup *cgrp;
__u64 cgrpid_tracker = 0;

/* Ensure that execve_val is not null */
if (unlikely(!conf) || unlikely(!execve_val))
return 0;

probe_read(&flags, sizeof(flags), _(&task->flags));
/* Skip kernel threads */
if (flags & PF_KTHREAD)
return 0;

/* Set the tracking cgroup id only if it was not set,
* this avoids cgroup thread granularity mess!
*/
if (execve_val->cgrpid_tracker != 0)
return 0;

cgrp = get_task_cgroup(task, conf->tg_cgrp_subsys_idx, error_flags);
if (!cgrp)
return 0;

/* TODO: get current cgroup ancestors and their levels,
* so we can use them as a proper reference for setting,
* cgroup tracking IDs.
*
* For now we just use the current cgroup ID as tracker.
*/
cgrpid_tracker = tg_get_current_cgroup_id(cgrp, conf->cgrp_fs_magic);
if (!cgrpid_tracker) {
/* Failed to get cgrpid_tracker do nothing. This should never happen */
*error_flags |= EVENT_ERROR_CGROUP_ID;
return 0;
}

/* Update the execve_val with the tracking cgroup ID */
execve_val->cgrpid_tracker = cgrpid_tracker;

/* TODO:
* Lookup cgroup data from the cgroup bpf map, create an entry
* if not found, otherwise update previous with the corresponding
* new cgroup state.
*/
return 0;
}

#endif // __BPF_CGROUP_
1 change: 0 additions & 1 deletion bpf/lib/process.h
Original file line number Diff line number Diff line change
Expand Up @@ -302,7 +302,6 @@ struct execve_map_value {
__u32 nspid;
__u32 binary;
__u32 pad;
__u64 cgrpid_tracker; /* Pinned Cgroup ID Tracker */
struct msg_ns ns;
struct msg_capabilities caps;
} __attribute__((packed)) __attribute__((aligned(8)));
Expand Down
23 changes: 1 addition & 22 deletions bpf/process/bpf_fork.c
Original file line number Diff line number Diff line change
Expand Up @@ -21,10 +21,7 @@ __attribute__((section("kprobe/wake_up_new_task"), used)) int
BPF_KPROBE(event_wake_up_new_task, struct task_struct *task)
{
struct execve_map_value *curr, *parent;
struct task_struct *current_task;
struct tetragon_conf *config;
u32 tgid = 0, ppid = 0, error_flags = 0;
int zero = 0;
u32 tgid = 0, error_flags = 0;

if (!task)
return 0;
Expand All @@ -35,24 +32,6 @@ BPF_KPROBE(event_wake_up_new_task, struct task_struct *task)
if (!curr)
return 0;

/* Cgroup environment */
config = map_lookup_elem(&tg_conf_map, &zero);
if (config) {
/* Set the tracking cgroup ID for the new task if not already set */
__set_task_cgrpid_tracker(config, task, curr, &error_flags);

/* Let's try to catch current or "parent" if it was not tracked */
current_task = (struct task_struct *)get_current_task();
probe_read(&ppid, sizeof(ppid), _(&current_task->tgid));
/* If they share same thread group nothing todo... */
if (tgid != ppid) {
parent = execve_map_get_noinit(ppid);
if (parent)
__set_task_cgrpid_tracker(config, current_task,
parent, &error_flags);
}
}

/* Generate an EVENT_COMMON_FLAG_CLONE event once per process,
* that is, thread group.
*/
Expand Down
65 changes: 8 additions & 57 deletions bpf/process/bpf_process_event.h
Original file line number Diff line number Diff line change
Expand Up @@ -531,45 +531,6 @@ get_namespaces(struct msg_ns *msg, struct task_struct *task)
}
}

/* Gather current task cgroup id */
static inline __attribute__((always_inline)) void
__event_get_current_cgroup_id(struct tetragon_conf *conf, struct cgroup *cgrp,
struct execve_map_value *execve_val,
struct msg_execve_event *msg)
{
__u64 cgrpfs_magic = 0;
struct msg_process *process;

process = &msg->process;
msg->kube.cgrpid = 0;

/* If tracking Cgroup ID is set use it, otherwise use current
* context one. This way we guarantee to always have a cgrpid
* and return cgroup information even if we miss the tracking
* Cgroup ID.
*
* Ensure that execve_val is not null.
*/
if (execve_val && execve_val->cgrpid_tracker) {
/* Set the k8s cgrpid with the tracking ID. */
msg->kube.cgrpid = execve_val->cgrpid_tracker;
return;
}

/* Gather Cgroup information using current context */
if (conf) {
/* Select which cgroup version */
cgrpfs_magic = conf->cgrp_fs_magic;
}

/* Collect event cgroup ID */
msg->kube.cgrpid = tg_get_current_cgroup_id(cgrp, cgrpfs_magic);
if (execve_val)
execve_val->cgrpid_tracker = msg->kube.cgrpid;
if (!msg->kube.cgrpid)
process->flags |= EVENT_ERROR_CGROUP_ID;
}

/* Gather current task cgroup name */
static inline __attribute__((always_inline)) void
__event_get_current_cgroup_name(struct cgroup *cgrp,
Expand Down Expand Up @@ -610,44 +571,34 @@ static inline __attribute__((always_inline)) void
__event_get_cgroup_info(struct task_struct *task,
struct msg_execve_event *msg)
{
__u32 pid;
__u64 cgrpfs_magic = 0;
int zero = 0, subsys_idx = 0;
struct cgroup *cgrp;
struct msg_process *process;
struct tetragon_conf *conf;
struct execve_map_value *execve_val;

process = &msg->process;

/* Clear cgroup info at the beginning, so if we return early we do not pass previous data */
memset(&msg->kube, 0, sizeof(struct msg_k8s));

pid = (get_current_pid_tgid() >> 32);
execve_val = execve_map_get_noinit(pid);
/* Even if execve_val is null we must continue collect information */

/* Check if cgroup configuration is set */
conf = map_lookup_elem(&tg_conf_map, &zero);
if (conf) {
/* Select the right css to use */
if (conf->tg_cgrp_subsys_idx != 0)
subsys_idx = conf->tg_cgrp_subsys_idx;

/* Set the tracking cgroup ID of the task if it was not
* already set, this could be the case due to race conditions.
* Do not remove this as following cgroup logic depend on it.
*/
__set_task_cgrpid_tracker(conf, task, execve_val, &process->flags);
/* Select which cgroup version */
cgrpfs_magic = conf->cgrp_fs_magic;
subsys_idx = conf->tg_cgrp_subsys_idx;
}

cgrp = get_task_cgroup(task, subsys_idx, &process->flags);
if (!cgrp)
return;

/* Collect event cgroup ID */
__event_get_current_cgroup_id(conf, cgrp, execve_val, msg);
msg->kube.cgrpid = __tg_get_current_cgroup_id(cgrp, cgrpfs_magic);
if (!msg->kube.cgrpid)
process->flags |= EVENT_ERROR_CGROUP_ID;

/* Get the cgroup name of this event. TODO: pass the tracking cgroup ID. */
/* Get the cgroup name of this event. */
__event_get_current_cgroup_name(cgrp, msg);
}
#endif
14 changes: 6 additions & 8 deletions bpf/process/policy_filter.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,23 +26,21 @@ struct {
static inline __attribute__((always_inline)) bool
policy_filter_check(u32 policy_id)
{
__u32 pid;
void *policy_map;
struct execve_map_value *curr;
__u64 cgroupid;

if (!policy_id)
return true;

pid = (get_current_pid_tgid() >> 32);
curr = execve_map_get_noinit(pid);
if (!curr || curr->cgrpid_tracker == 0)
return false;

policy_map = map_lookup_elem(&policy_filter_maps, &policy_id);
if (!policy_map)
return false;

return map_lookup_elem(policy_map, &curr->cgrpid_tracker);
cgroupid = tg_get_current_cgroup_id();
if (!cgroupid)
return false;

return map_lookup_elem(policy_map, &cgroupid);
}

#endif /* POLICY_FILTER_MAPS_H__ */
1 change: 0 additions & 1 deletion pkg/grpc/exec/exec.go
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,6 @@ func (msg *MsgCgroupEventUnix) HandleMessage() *tetragon.GetEventsResponse {
"cgroup.event": op.String(),
"PID": msg.PID,
"NSPID": msg.NSPID,
"cgroup.IDTracker": msg.CgrpidTracker,
"cgroup.ID": msg.Cgrpid,
"cgroup.state": st,
"cgroup.hierarchyID": msg.CgrpData.HierarchyId,
Expand Down
17 changes: 8 additions & 9 deletions pkg/sensors/exec/execvemap/execve.go
Original file line number Diff line number Diff line change
Expand Up @@ -12,13 +12,12 @@ type ExecveKey struct {
}

type ExecveValue struct {
Process processapi.MsgExecveKey `align:"key"`
Parent processapi.MsgExecveKey `align:"pkey"`
Flags uint32 `align:"flags"`
Nspid uint32 `align:"nspid"`
Binary uint32 `align:"binary"`
Pad uint32 `align:"pad"`
CgrpIdTracker uint64 `align:"cgrpid_tracker"`
Namespaces processapi.MsgNamespaces `align:"ns"`
Capabilities processapi.MsgCapabilities `align:"caps"`
Process processapi.MsgExecveKey `align:"key"`
Parent processapi.MsgExecveKey `align:"pkey"`
Flags uint32 `align:"flags"`
Nspid uint32 `align:"nspid"`
Binary uint32 `align:"binary"`
Pad uint32 `align:"pad"`
Namespaces processapi.MsgNamespaces `align:"ns"`
Capabilities processapi.MsgCapabilities `align:"caps"`
}
3 changes: 3 additions & 0 deletions pkg/sensors/tracing/kprobe_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -3731,6 +3731,9 @@ func TestLoadKprobeSensor(t *testing.T) {

// generic_kprobe_process_event*,generic_kprobe_actions,retkprobe
tus.SensorMap{Name: "fdinstall_map", Progs: []uint{1, 2, 3, 4, 5, 12, 14}},

// generic_kprobe_event
tus.SensorMap{Name: "tg_conf_map", Progs: []uint{0}},
}

if kernels.EnableLargeProgs() {
Expand Down
3 changes: 3 additions & 0 deletions pkg/sensors/tracing/tracepoint_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -452,6 +452,9 @@ func TestLoadTracepointSensor(t *testing.T) {

// all kprobe but generic_tracepoint_filter
tus.SensorMap{Name: "config_map", Progs: []uint{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}},

// generic_tracepoint_event
tus.SensorMap{Name: "tg_conf_map", Progs: []uint{0}},
}

if kernels.EnableLargeProgs() {
Expand Down
2 changes: 1 addition & 1 deletion pkg/testutils/sensors/load.go
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ func mergeInBaseSensorMaps(t *testing.T, sensorMaps []SensorMap, sensorProgs []S

// event_execve
SensorMap{Name: "names_map", Progs: []uint{0}},
SensorMap{Name: "tg_conf_map", Progs: []uint{0, 2}},
SensorMap{Name: "tg_conf_map", Progs: []uint{0}},

// event_wake_up_new_task
SensorMap{Name: "execve_val", Progs: []uint{2}},
Expand Down

0 comments on commit 2ebcd9a

Please sign in to comment.