forked from docker-archive/runc
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[backport] CVE-2024-21626 to 1.0.0-rc95 #104
Open
kolyshkin
wants to merge
14
commits into
release-1.0.0-rc95
Choose a base branch
from
release-rc95-cve
base: release-1.0.0-rc95
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[@kolyshkin: backport to rc95] Signed-off-by: hang.jiang <[email protected]> Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <[email protected]> (cherry picked from commit 7362cd5afe9d40131fb62cb075092025c7c71064) Signed-off-by: Kir Kolyshkin <[email protected]>
If a file descriptor of a directory in the host's mount namespace is leaked to runc init, a malicious config.json could use /proc/self/fd/... as a working directory to allow for host filesystem access after the container runs. This can also be exploited by a container process if it knows that an administrator will use "runc exec --cwd" and the target --cwd (the attacker can change that cwd to be a symlink pointing to /proc/self/fd/... and wait for the process to exec and then snoop on /proc/$pid/cwd to get access to the host). The former issue can lead to a critical vulnerability in Docker and Kubernetes, while the latter is a container breakout. We can (ab)use the fact that getcwd(2) on Linux detects this exact case, and getcwd(3) and Go's Getwd() return an error as a result. Thus, if we just do os.Getwd() after chdir we can easily detect this case and error out. In runc 1.1, a /sys/fs/cgroup handle happens to be leaked to "runc init", making this exploitable. On runc main it just so happens that the leaked /sys/fs/cgroup gets clobbered and thus this is only consistently exploitable for runc 1.1. Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Co-developed-by: lifubang <[email protected]> Signed-off-by: lifubang <[email protected]> [refactored the implementation and added more comments] Signed-off-by: Aleksa Sarai <[email protected]> (cherry picked from commit 6b6b6e9a656b407de5d372d57ee3f1709b18a2af) Signed-off-by: Kir Kolyshkin <[email protected]>
(This is a partial backport of a minor change included in commit dac4171.) This mirrors the logic in standard_init_linux.go, and also ensures that we do not call exec.LookPath in the final execve step. While this is okay for regular binaries, it seems exec.LookPath calls os.Getenv which tries to emit a log entry to the test harness when running in "go test" mode. In a future patch (in order to fix CVE-2024-21626), we will close all of the file descriptors immediately before execve, which would mean the file descriptor for test harness logging would be closed at execve time. So, moving exec.LookPath earlier is necessary. [kolyshkin: removed eaccess part, fix imports conflict] Ref: dac4171 ("runc-dmz: reduce memfd binary cloning cost with small C binary") Signed-off-by: Aleksa Sarai <[email protected]> (cherry picked from commit 9bbeb73b7371262792a811f5a8e59d393fb4d1dd) Signed-off-by: Kir Kolyshkin <[email protected]>
When running a script from an azure file share interrupted syscall occurs quite frequently, to remedy this add retries around execve syscall, when EINTR is returned. Signed-off-by: Maksim An <[email protected]> (cherry picked from commit e39ad65) Signed-off-by: Kir Kolyshkin <[email protected]>
If we leak a file descriptor referencing the host filesystem, an attacker could use a /proc/self/fd magic-link as the source for execve to execute a host binary in the container. This would allow the binary itself (or a process inside the container in the 'runc exec' case) to write to a host binary, leading to a container escape. The simple solution is to make sure we close all file descriptors immediately before the execve(2) step. Doing this earlier can lead to very serious issues in Go (as file descriptors can be reused, any (*os.File) reference could start silently operating on a different file) so we have to do it as late as possible. Unfortunately, there are some Go runtime file descriptors that we must not close (otherwise the Go scheduler panics randomly). The only way of being sure which file descriptors cannot be closed is to sneakily go:linkname the runtime internal "internal/poll.IsPollDescriptor" function. This is almost certainly not recommended but there isn't any other way to be absolutely sure, while also closing any other possible files. In addition, we can keep the logrus forwarding logfd open because you cannot execve a pipe and the contents of the pipe are so restricted (JSON-encoded in a format we pick) that it seems unlikely you could even construct shellcode. Closing the logfd causes issues if there is an error returned from execve. In mainline runc, runc-dmz protects us against this attack because the intermediate execve(2) closes all of the O_CLOEXEC internal runc file descriptors and thus runc-dmz cannot access them to attack the host. [kolyshkin: backport, merge in commit 25862d58dabbad8f928d5137] Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <[email protected]> (cherry picked from commit 3b3e65306167133c6d97b9801d05b8fea50f30f7) Signed-off-by: Kir Kolyshkin <[email protected]>
We auto-close this file descriptor in the final exec step, but it's probably a good idea to not possibly leak the file descriptor to "runc init" (we've had issues like this in the past) especially since it is a directory handle from the host mount namespace. In practice, on runc 1.1 this does leak to "runc init" but on main the handle has a low enough file descriptor that it gets clobbered by the ForkExec of "runc init". OPEN_TREE_CLONE would let us protect this handle even further, but the performance impact of creating an anonymous mount namespace is probably not worth it. Also, switch to using an *os.File for the handle so if it goes out of scope during setup (i.e. an error occurs during setup) it will get cleaned up by the GC. Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <[email protected]> (cherry picked from commit 362aebef2bbd48099958c29337900b88db151abf) Signed-off-by: Kir Kolyshkin <[email protected]>
Given the core issue in GHSA-xr7r-f8xq-vfvv was that we were unknowingly leaking file descriptors to "runc init", it seems prudent to make sure we proactively prevent this in the future. The solution is to simply mark all non-stdio file descriptors as O_CLOEXEC before we spawn "runc init". For libcontainer library users, this could result in unrelated files being marked as O_CLOEXEC -- however (for the same reason we are doing this for runc), for security reasons those files should've been marked as O_CLOEXEC anyway. Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626 Signed-off-by: Aleksa Sarai <[email protected]> (cherry picked from commit 9a72fd4be1fdc22148642ee8cd2adfee0bf575e5) Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin
force-pushed
the
release-rc95-cve
branch
2 times, most recently
from
February 6, 2024 23:58
b7b51e3
to
9a7d83b
Compare
1. Add .cirrus.yml to run tests on CentOS 7. 2. Drop fedora/centos7 from .github config (not interested in Fedora, and we already test CentOS 7 on Cirrus CI). Signed-off-by: Kir Kolyshkin <[email protected]>
…bundle Currently only amd64 and arm64v8 tarball have been checked in testdata, while busybox bundle is downloaded on fly, and supports multiple architectures. To enable integration tests for more architectures, the hello world bundle is replaced by busybox one. Signed-off-by: Shengjing Zhu <[email protected]> (cherry picked from commit 66bf371) (cherry picked from commit b8ebeec) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 790886cd4b1b64a21404731111d3999f3fd2390b) Signed-off-by: Kir Kolyshkin <[email protected]>
Fix issue 3699 Signed-off-by: Akihiro Suda <[email protected]> (cherry picked from commit 3b6625c) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 60440c3e7beee6dbd3252ac8aa8458124270bd04) Signed-off-by: Kir Kolyshkin <[email protected]>
Signed-off-by: Tianon Gravi <[email protected]> (cherry picked from commit 6d28928) (cherry picked from commit 53ceeea) Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit f1e461e03cd65659395e1ba4a34e53985bd25a9e) Signed-off-by: Kir Kolyshkin <[email protected]>
Most probably this old version of runc no longer works correctly with the latest criu. Not installing criu means criu tests will be skipped. Signed-off-by: Kir Kolyshkin <[email protected]>
Add CAP_SYSLOG to ensure that /dev/kmsg can be accesses on systems where the sysctl kernel.dmesg_restrict = 1. Signed-off-by: Odin Ugedal <[email protected]> (cherry picked from commit 6be088d) Signed-off-by: Kir Kolyshkin <[email protected]>
The test is failing like this: not ok 70 runc run --no-pivot must not expose bare /proc # (in test file tests/integration/no_pivot.bats, line 20) # `[[ "$output" == *"mount: permission denied"* ]]' failed # runc spec (status=0): # # runc run --no-pivot test_no_pivot (status=1): # unshare: write error: Operation not permitted Apparently, a recent kernel commit db2e718a47984b9d prevents root from doing unshare -r unless it has CAP_SETFPCAP. Add the capability for this specific test. Signed-off-by: Kir Kolyshkin <[email protected]> (cherry picked from commit 1bbeada) Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin
force-pushed
the
release-rc95-cve
branch
from
February 7, 2024 01:22
0b243f8
to
29a6896
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Note: this PR is created for the sake of CI.
This is a backport of CVE-2024-21626 fixes to runc-1.0.0-rc95.
Update: It works! Never mind
validate / *
failures, those are non-essential. I was mostly interested inci / test
andci / centos7
jobs which are consistently passing.