Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SELinux in enforcing mode breaks nested gVisor container #880

Closed
apyrgio opened this issue Jul 26, 2024 · 4 comments · Fixed by #881
Closed

SELinux in enforcing mode breaks nested gVisor container #880

apyrgio opened this issue Jul 26, 2024 · 4 comments · Fixed by #881
Labels
Milestone

Comments

@apyrgio
Copy link
Contributor

apyrgio commented Jul 26, 2024

Problem

While debugging a Dangerzone 0.7.0 bug report on a Fedora 40 system (#871), we found out that SELinux in enforcing mode breaks the inner gVisor container that runs within the outer Podman container.

Original error

$ dangerzone-cli ~/Downloads/sample.pdf
[...]
[DEBUG] Marking doc qyU8c4 as 'converting'
[INFO] > /usr/bin/podman run --log-driver none --security-opt no-new-privileges --cap-drop all --cap-add SYS_CHROOT --network=none -u dangerzone --rm -i --name dangerzone-doc-to-pixels-qyU8c4 --userns nomap dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels
[ERROR] [doc qyU8c4] 0% Unspecified error
[DEBUG] Marking doc qyU8c4 as 'failed'

Output from Podman container with gVisor debug logging enabled

$ /usr/bin/podman run -e RUNSC_DEBUG=1 --log-driver none --security-opt no-new-privileges --cap-drop all --cap-add SYS_CHROOT --network=none -u dangerzone --rm -i --name dangerzone-doc-to-pixels-qyU8c4 --userns nomap dangerzone.rocks/dangerzone /usr/bin/python3 -m dangerzone.conversion.doc_to_pixels

8< ... snip ... >8

I0725 14:19:04.811776      20 main.go:197] **************** gVisor ****************
I0725 14:19:04.812445      20 boot.go:258] Setting product_name: "Standard PC (Q35 + ICH9, 2009)"
W0725 14:19:04.813081      20 specutils.go:129] noNewPrivileges ignored. PR_SET_NO_NEW_PRIVS is assumed to always be set.
I0725 14:19:04.813105      20 chroot.go:91] Setting up sandbox chroot in "/tmp"
W0725 14:19:04.813312      20 chroot.go:108] Failed to copy /etc/localtime: open /etc/localtime: no such file or directory. UTC timezone will be used.
I0725 14:19:04.813335      20 chroot.go:36] Mounting "/proc" at "/tmp/proc"
W0725 14:19:04.813394      20 util.go:64] FATAL ERROR: error setting up chroot: error remounting chroot in read-only: permission denied
error setting up chroot: error remounting chroot in read-only: permission denied
D0725 14:19:04.814640       7 sandbox.go:1278] Destroying sandbox "dangerzone"
D0725 14:19:04.814726       7 sandbox.go:1287] Killing sandbox "dangerzone"
D0725 14:19:04.814809       7 container.go:776] Destroy container, cid: dangerzone
D0725 14:19:04.814847       7 container.go:1087] Killing gofer for container, cid: dangerzone, PID: 15
W0725 14:19:04.814928       7 util.go:64] FATAL ERROR: running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF
running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF
W0725 14:19:04.815004       7 main.go:227] Failure to execute command, err: 1
gVisor quit with exit code: 128

SELinux denial taken from audit logs

Jul 25 17:51:56 10.0.2.15 audit[8164]: AVC avc: denied { mounton } for pid=8164 comm="exe" path="/proc/fs" dev="proc" ino=4026531847 scontext=system_u:system_r:container_t:s0:c139,c177 tcontext=system_u:object_r:proc_t:s0 tclass=dir permissive=0

Workarounds

We are currently aware of the following two workarounds:

  1. Set SELinux to permissive mode: sudo setenforce Permissive
  2. Disable SELinux labeling for the spawned Podman container with --security-opt label=disable (requires changes to Dangerzone code).
@apyrgio apyrgio added bug Something isn't working P:linux container security labels Jul 26, 2024
@apyrgio apyrgio added this to the 0.8.0 milestone Jul 26, 2024
@apyrgio
Copy link
Contributor Author

apyrgio commented Jul 26, 2024

Proper solution

The workarounds we've mentioned above work, but each has their caveats:

  1. Setting SELinux to permissive undermines the security of the whole system, just to run Dangerzone. Also, it requires a user action. For these two reasons, we don't plan to use it.
  2. Using --security-opt label=disable disables SELinux protections for both the outer Podman container and inner gVisor container. Thankfully, gVisor adds its own, non-SELinux protections, but it would be nice to have the best of both worlds.

Let's try to find an alternative.

gVisor + SELinux

First of all, let's see what's the relationship between gVisor and SELinux.

From the get go, we see that any attempt to set an SELinux flag at the gVisor level will fail (possibly even label=disable edited: this particular option works):

	if len(spec.Process.SelinuxLabel) != 0 {
		return fmt.Errorf("SELinux is not supported: %s", spec.Process.SelinuxLabel)
	}

(from https://github.com/google/gvisor/blob/bbbecc35cc1ef0d2ec82db91a8178be958f3b783/runsc/specutils/specutils.go#L117-L119)

Futhermore, we see the following statement in google/gvisor#8957 (comment), where the original poster is bitten by a similar issue as ours:

As SELinux is not supported by gVisor, it's expected that it will crash on startup. While gVisor does check the runtime spec to see if SELinux is enabled, it does not check the system during runtime.

So, it doesn't seem we have too much leeway here.

Further exploration

We tried to see if we could label the outer container process with a more relaxed type label, that applies to proc_t filetypes. To do so, we listed the SELinux policies that allow the mounton action on proc_t filetypes, and filtered by those that apply to containers:

$ sesearch -t proc_t -p mounton -A | grep container_
allow container_engine_t filesystem_type:dir { mounton write };
allow container_engine_t filesystem_type:file mounton;
allow container_init_t proc_t:dir mounton;
allow container_kvm_t proc_t:dir mounton;
allow container_userns_t proc_t:dir mounton;

The original Podman container is labeled with container_t, which is not present in this list. Running the Podman container with container_engine_t type (i.e., with --security-opt label=type:container_engine_t) seems to do the job 🎉.

Note

Our cue for trying this is a similar workaround we saw in gVisor's issue tracker: google/gvisor#8747 (comment)

Is container_engine_t suitable for this use-case though? We looked into the file where it's defined (see container.te and the container_engine_t extra permissions), and saw the following:

container_engine_t is for running a container engine within a container

This is spot-on the thing we're doing here, running gVisor's runsc container engine within Podman's crun. This looks like a good match, and adds some assurances that things won't break in the future.

Future work

We could slim down the container_engine_t label further, by providing a Type Enforcement file (TE) that extends the container_t label with the bare minimum permissions to make gVisor work. There's prior work on how to do that (see "K3S SELinux" blog post), and it doesn't look too hard. However, this would have to happen in coordination with the gVisor team, to ensure that it doesn't break when something is added in runsc.

apyrgio added a commit that referenced this issue Jul 26, 2024
Set the `container_engine_t` SELinux on the **outer** Podman container,
so that gVisor does not break on systems where SELinux is enforcing.
This label is provided for container engines running within a container,
which fits our `runsc` within `crun` situation.

We have considered using the more permissive `label=disable` option, to
disable SELinux labels altogether, but we want to take advantage of as
many SELinux protections as we can, even for the **outer** container.

Fixes #880
apyrgio added a commit to apyrgio/gvisor that referenced this issue Jul 29, 2024
Mention two SELinux-related errors that we have seen in Fedora systems,
explain how can the user verify that they trigger them, and provide a
workaround.

The first error that we mention will be triggered by every user running
gVisor in a Fedora system, so it's good to have a solution for them. The
second error was found during the development of Dangerzone, which uses
a nested gVisor container within a rootless Podman container. This
use case is niche, but it still may help people who are experimenting
with gVisor within another container.

Refs freedomofpress/dangerzone#880
apyrgio added a commit to apyrgio/gvisor that referenced this issue Jul 29, 2024
Mention two SELinux-related errors that we have seen in Fedora systems,
explain how can the user verify that they trigger them, and provide a
workaround.

The first error that we mention will be triggered by every user running
gVisor in a Fedora system, so it's good to have a solution for them. The
second error was found during the development of Dangerzone, which uses
a nested gVisor container within a rootless Podman container. This
use case is niche, but it still may help people who are experimenting
with gVisor within another container.

Refs freedomofpress/dangerzone#880
@EtiennePerot
Copy link
Contributor

Wish I had seen this over the weekend. Glad it's figured out regardless. Yes, container_engine_t does sound like the right thing to use here.

copybara-service bot pushed a commit to google/gvisor that referenced this issue Jul 29, 2024
Mention two SELinux-related errors that we have seen in Fedora systems, explain how can the user verify that they trigger them, and provide a workaround.

The first error that we mention will be triggered by every user running gVisor in a Fedora system, so it's good to have a solution for them. The second error was found during the development of Dangerzone, which uses a nested gVisor container within a rootless Podman container. This use case is niche, but it still may help people who are experimenting with gVisor within another container.

Refs freedomofpress/dangerzone#880

FUTURE_COPYBARA_INTEGRATE_REVIEW=#10699 from apyrgio:master ebcb60f
PiperOrigin-RevId: 657324569
copybara-service bot pushed a commit to google/gvisor that referenced this issue Jul 29, 2024
Mention two SELinux-related errors that we have seen in Fedora systems, explain how can the user verify that they trigger them, and provide a workaround.

The first error that we mention will be triggered by every user running gVisor in a Fedora system, so it's good to have a solution for them. The second error was found during the development of Dangerzone, which uses a nested gVisor container within a rootless Podman container. This use case is niche, but it still may help people who are experimenting with gVisor within another container.

Refs freedomofpress/dangerzone#880

FUTURE_COPYBARA_INTEGRATE_REVIEW=#10699 from apyrgio:master ebcb60f
PiperOrigin-RevId: 657324569
@apyrgio
Copy link
Contributor Author

apyrgio commented Jul 29, 2024

Thanks for taking a look Etienne, it's good to know it makes sense to you as well.

copybara-service bot pushed a commit to google/gvisor that referenced this issue Jul 29, 2024
Mention two SELinux-related errors that we have seen in Fedora systems, explain how can the user verify that they trigger them, and provide a workaround.

The first error that we mention will be triggered by every user running gVisor in a Fedora system, so it's good to have a solution for them. The second error was found during the development of Dangerzone, which uses a nested gVisor container within a rootless Podman container. This use case is niche, but it still may help people who are experimenting with gVisor within another container.

Refs freedomofpress/dangerzone#880

FUTURE_COPYBARA_INTEGRATE_REVIEW=#10699 from apyrgio:master ebcb60f
PiperOrigin-RevId: 657324569
apyrgio added a commit that referenced this issue Jul 30, 2024
Set the `container_engine_t` SELinux on the **outer** Podman container,
so that gVisor does not break on systems where SELinux is enforcing.
This label is provided for container engines running within a container,
which fits our `runsc` within `crun` situation.

We have considered using the more permissive `label=disable` option, to
disable SELinux labels altogether, but we want to take advantage of as
many SELinux protections as we can, even for the **outer** container.

Cherry-picked from e1e63d1

Fixes #880
apyrgio added a commit to apyrgio/yum-tools-prod that referenced this issue Jul 30, 2024
We have stumbled into an SELinux issue that is present only on Fedora
systems (not Qubes ones). In order to fix this issue, we need to label
our Podman containers with the `container_engine_t` SELinux label.

Since this patch strictly affects Fedora systems, and it's just a
one-line change, we opt to do the following:
* Chery-pick this fix from the main branch into a hotfix-0.7.0 branch.
* Bump the patch level of the dangerzone-0.7.0 RPMs from 1 to 2, so
  that existing installations can be updated.

In this commit, we include just the necessary RPMs to achieve the
above.

Refs freedomofpress/dangerzone#880
@apyrgio
Copy link
Contributor Author

apyrgio commented Jul 30, 2024

Final note on the release plan for this patch. Because it strictly affects Fedora systems, and the change is a one-liner, we don't plan to release a new Dangerzone version (e.g., 0.7.1). Instead, we've decided to do the following:

  • Cherry-pick this fix from the main branch into a hotfix-0.7.0 branch.
  • Bump the patch level of the RPM spec from 1 to 2 in the hotfix branch, so that existing installations can be updated.
  • Build the new Fedora RPMs, and test that they work in a Fedora system with SELinux in enforcing mode.
  • Sign and publish the new Fedora RPMs in our yum-tools-prod repo.

This way, the next time Fedora users perform a dnf update, they will get a working Dangerzone version.

apyrgio added a commit to apyrgio/yum-tools-prod that referenced this issue Jul 30, 2024
We have stumbled into an SELinux issue that is present only on Fedora
systems (not Qubes ones). In order to fix this issue, we need to label
our Podman containers with the `container_engine_t` SELinux label.

Since this patch strictly affects Fedora systems, and it's just a
one-line change, we opt to do the following:
* Cherry-pick this fix from the main branch into a hotfix-0.7.0 branch.
* Bump the patch level of the dangerzone-0.7.0 RPMs from 1 to 2, so
  that existing installations can be updated.

In this commit, we include just the necessary RPMs to achieve the
above.

Refs freedomofpress/dangerzone#880
apyrgio added a commit that referenced this issue Jul 30, 2024
Update our `env.py` script to auto-detect the correct Dangerzone package
name. This is useful when building an end-user environment, i.e., a
container image where we copy the respective Dangerzone .deb/.rpm
package and install it via a package manager.

To achieve this, we replace the hardcoded patch level (`-1`) in the
package name with a glob character (`*`). Then, we check in the
respective build directory if there's exactly one match for this
pattern. If yes, we return the full path. If not, we raise an exception.

Note that this limitation was triggered when we were building RPM
packages for the 0.7.0 hotfix release.

Refs #880
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants