Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[backport] CVE-2024-21626 to 1.0.0-rc95 #104

Open
wants to merge 14 commits into
base: release-1.0.0-rc95
Choose a base branch
from

Conversation

kolyshkin
Copy link
Owner

@kolyshkin kolyshkin commented Feb 6, 2024

Note: this PR is created for the sake of CI.

This is a backport of CVE-2024-21626 fixes to runc-1.0.0-rc95.

Update: It works! Never mind validate / * failures, those are non-essential. I was mostly interested in ci / test and ci / centos7 jobs which are consistently passing.

hang.jiang and others added 7 commits February 6, 2024 12:18
[@kolyshkin: backport to rc95]

Signed-off-by: hang.jiang <[email protected]>
Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Signed-off-by: Aleksa Sarai <[email protected]>
(cherry picked from commit 7362cd5afe9d40131fb62cb075092025c7c71064)
Signed-off-by: Kir Kolyshkin <[email protected]>
If a file descriptor of a directory in the host's mount namespace is
leaked to runc init, a malicious config.json could use /proc/self/fd/...
as a working directory to allow for host filesystem access after the
container runs. This can also be exploited by a container process if it
knows that an administrator will use "runc exec --cwd" and the target
--cwd (the attacker can change that cwd to be a symlink pointing to
/proc/self/fd/... and wait for the process to exec and then snoop on
/proc/$pid/cwd to get access to the host). The former issue can lead to
a critical vulnerability in Docker and Kubernetes, while the latter is a
container breakout.

We can (ab)use the fact that getcwd(2) on Linux detects this exact case,
and getcwd(3) and Go's Getwd() return an error as a result. Thus, if we
just do os.Getwd() after chdir we can easily detect this case and error
out.

In runc 1.1, a /sys/fs/cgroup handle happens to be leaked to "runc
init", making this exploitable. On runc main it just so happens that the
leaked /sys/fs/cgroup gets clobbered and thus this is only consistently
exploitable for runc 1.1.

Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Co-developed-by: lifubang <[email protected]>
Signed-off-by: lifubang <[email protected]>
[refactored the implementation and added more comments]
Signed-off-by: Aleksa Sarai <[email protected]>
(cherry picked from commit 6b6b6e9a656b407de5d372d57ee3f1709b18a2af)
Signed-off-by: Kir Kolyshkin <[email protected]>
(This is a partial backport of a minor change included in commit
dac4171.)

This mirrors the logic in standard_init_linux.go, and also ensures that
we do not call exec.LookPath in the final execve step.

While this is okay for regular binaries, it seems exec.LookPath calls
os.Getenv which tries to emit a log entry to the test harness when
running in "go test" mode. In a future patch (in order to fix
CVE-2024-21626), we will close all of the file descriptors immediately
before execve, which would mean the file descriptor for test harness
logging would be closed at execve time. So, moving exec.LookPath earlier
is necessary.

[kolyshkin: removed eaccess part, fix imports conflict]

Ref: dac4171 ("runc-dmz: reduce memfd binary cloning cost with small C binary")
Signed-off-by: Aleksa Sarai <[email protected]>
(cherry picked from commit 9bbeb73b7371262792a811f5a8e59d393fb4d1dd)
Signed-off-by: Kir Kolyshkin <[email protected]>
When running a script from an azure file share interrupted syscall
occurs quite frequently, to remedy this add retries around execve
syscall, when EINTR is returned.

Signed-off-by: Maksim An <[email protected]>
(cherry picked from commit e39ad65)
Signed-off-by: Kir Kolyshkin <[email protected]>
If we leak a file descriptor referencing the host filesystem, an
attacker could use a /proc/self/fd magic-link as the source for execve
to execute a host binary in the container. This would allow the binary
itself (or a process inside the container in the 'runc exec' case) to
write to a host binary, leading to a container escape.

The simple solution is to make sure we close all file descriptors
immediately before the execve(2) step. Doing this earlier can lead to very
serious issues in Go (as file descriptors can be reused, any (*os.File)
reference could start silently operating on a different file) so we have
to do it as late as possible.

Unfortunately, there are some Go runtime file descriptors that we must
not close (otherwise the Go scheduler panics randomly). The only way of
being sure which file descriptors cannot be closed is to sneakily
go:linkname the runtime internal "internal/poll.IsPollDescriptor"
function. This is almost certainly not recommended but there isn't any
other way to be absolutely sure, while also closing any other possible
files.

In addition, we can keep the logrus forwarding logfd open because you
cannot execve a pipe and the contents of the pipe are so restricted
(JSON-encoded in a format we pick) that it seems unlikely you could even
construct shellcode. Closing the logfd causes issues if there is an
error returned from execve.

In mainline runc, runc-dmz protects us against this attack because the
intermediate execve(2) closes all of the O_CLOEXEC internal runc file
descriptors and thus runc-dmz cannot access them to attack the host.

[kolyshkin: backport, merge in commit 25862d58dabbad8f928d5137]

Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Signed-off-by: Aleksa Sarai <[email protected]>
(cherry picked from commit 3b3e65306167133c6d97b9801d05b8fea50f30f7)
Signed-off-by: Kir Kolyshkin <[email protected]>
We auto-close this file descriptor in the final exec step, but it's
probably a good idea to not possibly leak the file descriptor to "runc
init" (we've had issues like this in the past) especially since it is a
directory handle from the host mount namespace.

In practice, on runc 1.1 this does leak to "runc init" but on main the
handle has a low enough file descriptor that it gets clobbered by the
ForkExec of "runc init".

OPEN_TREE_CLONE would let us protect this handle even further, but the
performance impact of creating an anonymous mount namespace is probably
not worth it.

Also, switch to using an *os.File for the handle so if it goes out of
scope during setup (i.e. an error occurs during setup) it will get
cleaned up by the GC.

Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Signed-off-by: Aleksa Sarai <[email protected]>
(cherry picked from commit 362aebef2bbd48099958c29337900b88db151abf)
Signed-off-by: Kir Kolyshkin <[email protected]>
Given the core issue in GHSA-xr7r-f8xq-vfvv was that we were unknowingly
leaking file descriptors to "runc init", it seems prudent to make sure
we proactively prevent this in the future. The solution is to simply
mark all non-stdio file descriptors as O_CLOEXEC before we spawn "runc
init".

For libcontainer library users, this could result in unrelated files
being marked as O_CLOEXEC -- however (for the same reason we are doing
this for runc), for security reasons those files should've been marked
as O_CLOEXEC anyway.

Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Signed-off-by: Aleksa Sarai <[email protected]>
(cherry picked from commit 9a72fd4be1fdc22148642ee8cd2adfee0bf575e5)
Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin force-pushed the release-rc95-cve branch 2 times, most recently from b7b51e3 to 9a7d83b Compare February 6, 2024 23:58
kolyshkin and others added 7 commits February 6, 2024 17:17
1. Add .cirrus.yml to run tests on CentOS 7.

2. Drop fedora/centos7 from .github config (not interested in Fedora,
   and we already test CentOS 7 on Cirrus CI).

Signed-off-by: Kir Kolyshkin <[email protected]>
…bundle

Currently only amd64 and arm64v8 tarball have been checked in testdata,
while busybox bundle is downloaded on fly, and supports multiple architectures.

To enable integration tests for more architectures, the hello world
bundle is replaced by busybox one.

Signed-off-by: Shengjing Zhu <[email protected]>
(cherry picked from commit 66bf371)
(cherry picked from commit b8ebeec)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 790886cd4b1b64a21404731111d3999f3fd2390b)
Signed-off-by: Kir Kolyshkin <[email protected]>
Fix issue 3699

Signed-off-by: Akihiro Suda <[email protected]>
(cherry picked from commit 3b6625c)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 60440c3e7beee6dbd3252ac8aa8458124270bd04)
Signed-off-by: Kir Kolyshkin <[email protected]>
Signed-off-by: Tianon Gravi <[email protected]>
(cherry picked from commit 6d28928)
(cherry picked from commit 53ceeea)
Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit f1e461e03cd65659395e1ba4a34e53985bd25a9e)
Signed-off-by: Kir Kolyshkin <[email protected]>
Most probably this old version of runc no longer works correctly with
the latest criu. Not installing criu means criu tests will be skipped.

Signed-off-by: Kir Kolyshkin <[email protected]>
Add CAP_SYSLOG to ensure that /dev/kmsg can be accesses on systems where
the sysctl kernel.dmesg_restrict = 1.

Signed-off-by: Odin Ugedal <[email protected]>
(cherry picked from commit 6be088d)
Signed-off-by: Kir Kolyshkin <[email protected]>
The test is failing like this:

	not ok 70 runc run --no-pivot must not expose bare /proc
	# (in test file tests/integration/no_pivot.bats, line 20)
	#   `[[ "$output" == *"mount: permission denied"* ]]' failed
	# runc spec (status=0):
	#
	# runc run --no-pivot test_no_pivot (status=1):
	# unshare: write error: Operation not permitted

Apparently, a recent kernel commit db2e718a47984b9d prevents
root from doing unshare -r unless it has CAP_SETFPCAP.

Add the capability for this specific test.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 1bbeada)
Signed-off-by: Kir Kolyshkin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants