Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose /sys/fs/cgroup as writable (for memory and other pressure measures from kernel) #23690

Open
grooverdan opened this issue Aug 21, 2024 · 3 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. stale-issue

Comments

@grooverdan
Copy link
Contributor

Feature request description

PSI, pressure stall information, of the kernel should be available to user containers by default.

Suggest potential solution

cgroupv2 as rw by default.

diff --git a/pkg/specgen/generate/oci_linux.go b/pkg/specgen/generate/oci_linux.go
index d6247bbf6..48cc60275 100644
--- a/pkg/specgen/generate/oci_linux.go
+++ b/pkg/specgen/generate/oci_linux.go
@@ -70,16 +70,7 @@ func getCgroupPermissions(unmask []string) string {
                return ro
        }
 
-       if len(unmask) != 0 && unmask[0] == "ALL" {
-               return rw
-       }
-
-       for _, p := range unmask {
-               if path.Clean(p) == cgroup {
-                       return rw
-               }
-       }
-       return ro
+       return rw
 }

Have you considered any alternatives?

Running --security-opt unmask=/sys/fs/cgroup exposes it rw is an obvious workaround, but if it can't work around pre-runtime setup of the container, is it acceptable to not make it RO?

If the cpu/memory/device command line options impose limits maybe the mounted cgroup needs to be a submount from there to enforce those limits?

Additional context

/sys/fs/cgroup was originally added as a ro bind mount - cf36470. Because rbind exposed host info this became a fresh mount (8d3010d). Because this is a fresh mount of a cgroup, the information pertains to the current container and should be writeable (as the permissions on the /sys/fs/cgroup/ files indicate)?

Negative case:

There are a number of cpu/memory/device cgroupv2 limits exposed on the command line and a rw mount might make these able to be exceeded by a container.

@grooverdan grooverdan added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 21, 2024
Copy link

A friendly reminder that this issue had no activity for 30 days.

@Luap99
Copy link
Member

Luap99 commented Oct 30, 2024

@giuseppe PTAL, I Assume the cgroup should be read only for security so this is not something we can do by default?

@giuseppe
Copy link
Member

A container could create a lot of sub-cgroups, potentially slowing down the kernel. So for the default case, I think it is still safer to mount it read-only and tweak it only when necessary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. stale-issue
Projects
None yet
Development

No branches or pull requests

3 participants