switchroot: Stop making /sysroot mount private #3292

Merged: 2 commits, Sep 6, 2024

dbnicholson
Member

Back in 2b8d586, /sysroot was changed to be a private mount so that
submounts of /var do not propagate back to the stateroot /var. That's
laudable, but it makes /sysroot different from every other shared mount
in the root namespace. In particular, it means that submounts of
/sysroot do not propagate into separate mount namespaces.

Rather than make /sysroot private, make /var a slave+shared mount so
that it receives mount events from /sysroot but not vice versa. That
achieves the same effect of preventing /var submount events from
propagating back to /sysroot while allowing /sysroot mount events to
propagate forward like every other system mount.

The mount propagation flags are applied as options in the generated
var.mount unit. This depends on a mount(8) feature that has been present
since util-linux 2.23. That's available in RHEL 7 and every non-EOL
Debian and Ubuntu release. Applying the propagation from var.mount fixes
a small race, too. Previously, if a /var submount was added before
/sysroot was made private, it would have propagated back into /sysroot.
That was possible since ostree-remount.service orders itself after
var.mount but not before any /var submounts.
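
As a rough illustration of what such a generated unit could look like, the sketch below prints a var.mount whose Options= line carries the propagation flags. The stateroot path is borrowed from the findmnt output further down in this conversation. The helper name `render_var_mount` and the exact option string `bind,slave,shared` are assumptions for illustration, not text copied from the generator.

```shell
#!/bin/sh
# Hypothetical helper: print a var.mount unit that bind-mounts the stateroot
# /var and sets propagation through mount(8) options (util-linux >= 2.23).
# The option string "bind,slave,shared" is an assumption for illustration.
render_var_mount() {
    stateroot_var=$1
    cat <<EOF
[Mount]
What=$stateroot_var
Where=/var
Options=bind,slave,shared
EOF
}

render_var_mount /sysroot/ostree/deploy/fedora-coreos/var
```

Because the flags are part of the mount operation itself, there is no window between mounting /var and fixing its propagation in which a submount could slip in.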

Fixes: #2086


I included the added test separately in the first commit so you can see the difference in behavior.

@github-actions github-actions bot added the area/prepare-root Issue relates to ostree-prepare-root label Aug 30, 2024
@dbnicholson
Member Author

Here's a comparison of mount propagation before and after this change using a cosa-built VM. I mounted a tmpfs on /var/foo and another on /sysroot/boot to compare how each would propagate.

Current:

[core@cosa-devsh ~]$ findmnt -l -o TARGET,PROPAGATION,OPT-FIELDS
TARGET                                   PROPAGATION OPT-FIELDS
/sysroot                                 private     
/                                        shared      shared:1
/etc                                     shared      shared:2
/usr                                     shared      shared:3
/sysroot/ostree/deploy/fedora-coreos/var private     
/dev                                     shared      shared:6
/dev/shm                                 shared      shared:7
/dev/pts                                 shared      shared:8
/sys                                     shared      shared:9
/sys/kernel/security                     shared      shared:10
/sys/fs/cgroup                           shared      shared:11
/sys/fs/pstore                           shared      shared:12
/sys/fs/bpf                              shared      shared:13
/sys/kernel/config                       shared      shared:14
/proc                                    shared      shared:16
/run                                     shared      shared:17
/sys/fs/selinux                          shared      shared:15
/proc/sys/fs/binfmt_misc                 shared      shared:18
/dev/hugepages                           shared      shared:19
/dev/mqueue                              shared      shared:20
/sys/kernel/debug                        shared      shared:21
/sys/kernel/tracing                      shared      shared:22
/tmp                                     shared      shared:23
/sys/fs/fuse/connections                 shared      shared:24
/var                                     shared      shared:5
/boot                                    shared      shared:84
/var/mnt/workdir-tmp                     shared      shared:27
/var/mnt/workdir                         shared      shared:91
/var/lib/nfs/rpc_pipefs                  shared      shared:95
/run/user/1000                           shared      shared:401
/var/foo                                 shared      shared:362
/sysroot/boot                            private     

With this PR:

[core@cosa-devsh ~]$ findmnt -l -o TARGET,PROPAGATION,OPT-FIELDS
TARGET                                   PROPAGATION  OPT-FIELDS
/sysroot                                 shared       shared:4
/                                        shared       shared:1
/etc                                     shared       shared:2
/usr                                     shared       shared:3
/sysroot/ostree/deploy/fedora-coreos/var shared       shared:5
/dev                                     shared       shared:6
/dev/shm                                 shared       shared:7
/dev/pts                                 shared       shared:8
/sys                                     shared       shared:9
/sys/kernel/security                     shared       shared:10
/sys/fs/cgroup                           shared       shared:11
/sys/fs/pstore                           shared       shared:12
/sys/fs/bpf                              shared       shared:13
/sys/kernel/config                       shared       shared:14
/proc                                    shared       shared:16
/run                                     shared       shared:17
/sys/fs/selinux                          shared       shared:15
/proc/sys/fs/binfmt_misc                 shared       shared:18
/dev/hugepages                           shared       shared:19
/dev/mqueue                              shared       shared:20
/sys/kernel/debug                        shared       shared:21
/sys/kernel/tracing                      shared       shared:22
/tmp                                     shared       shared:23
/sys/fs/fuse/connections                 shared       shared:24
/var                                     shared,slave shared:58 master:5
/boot                                    shared       shared:86
/var/lib/nfs/rpc_pipefs                  shared       shared:163
/run/user/1000                           shared       shared:379
/var/foo                                 shared       shared:432
/sysroot/boot                            shared       shared:442
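
For anyone who wants to check this without findmnt, the PROPAGATION column can be approximated from the optional fields in /proc/self/mountinfo (`shared:N`, `master:N`, `unbindable`). The sketch below is a minimal version of that mapping. The function name `propagation_from_mountinfo` is made up, and the labels are chosen to match what findmnt prints in the tables above rather than taken from its source.

```shell
#!/bin/sh
# Print "MOUNTPOINT PROPAGATION" for each mountinfo line read on stdin.
# Optional fields sit between field 6 and the lone "-" separator.
propagation_from_mountinfo() {
    awk '{
        shared = 0; slave = 0; unbindable = 0
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/) shared = 1
            else if ($i ~ /^master:/) slave = 1
            else if ($i == "unbindable") unbindable = 1
        }
        label = shared ? "shared" : "private"
        if (slave) label = label ",slave"
        if (unbindable) label = label ",unbindable"
        print $5, label
    }'
}

# Illustrative input modelled on the /sysroot and /var rows above; the mount
# IDs, device numbers and filesystem details are made up.
propagation_from_mountinfo <<'EOF'
40 1 8:4 / /sysroot rw,relatime shared:4 - xfs /dev/vda4 rw
41 1 8:4 /ostree/deploy/fedora-coreos/var /var rw,relatime shared:58 master:5 - xfs /dev/vda4 rw
EOF
```

On a running system the same function can be fed with `propagation_from_mountinfo < /proc/self/mountinfo`.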

This tests the current behavior of making /sysroot a private mount so
that submounts on /var do not propagate back to /sysroot. It also shows
how submounts of /sysroot do not propagate into separate mount
namespaces for the same reason.

Back in 2b8d586, /sysroot was changed to be a private mount so that
submounts of /var do not propagate back to the stateroot /var. That's
laudable, but it makes /sysroot different from every other shared mount
in the root namespace. In particular, it means that submounts of
/sysroot do not propagate into separate mount namespaces.

Rather than make /sysroot private, make /var a slave+shared mount so
that it receives mount events from /sysroot but not vice versa. That
achieves the same effect of preventing /var submount events from
propagating back to /sysroot while allowing /sysroot mount events to
propagate forward like every other system mount. See
mount_namespaces(7)[1] and the Linux shared subtrees[2] documentation
for details on slave+shared mount propagation.
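
One way to picture slave+shared is as two propagation changes applied in sequence. The sketch below uses the mount(8) command-line flags for that and is not the code from this change. The helper names `run` and `make_slave_shared` and the `APPLY` variable are invented for illustration. It only prints the commands unless APPLY=1 is set, since actually running them needs root.

```shell
#!/bin/sh
# Dry-run by default: print each command, execute only when APPLY=1 (needs root).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    fi
}

# Order matters: first turn the mount into a slave of its current peer group,
# then mark it shared again so it gets a new peer group of its own.
make_slave_shared() {
    run mount --make-slave "$1" &&
    run mount --make-shared "$1"
}

make_slave_shared /var
```

After both steps a mount shows up with a `master:N` field (events it receives) and a fresh `shared:M` field (events it forwards), which is the `shared,slave` state described in the documentation referenced above.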

When /var is mounted in the initramfs, this is accomplished with
mount(2) syscalls. When /var is mounted after switching to the real
root, the mount propagation flags are applied as options in the
generated var.mount unit. This depends on a mount(8) feature that has
been present since util-linux 2.23. That's available in RHEL 7 and every
non-EOL Debian and Ubuntu release. Applying the propagation from
var.mount fixes a small race, too. Previously, if a /var submount was
added before /sysroot was made private, it would have propagated back
into /sysroot. That was possible since ostree-remount.service orders
itself after var.mount but not before any /var submounts.

1. https://man7.org/linux/man-pages/man7/mount_namespaces.7.html
2. https://docs.kernel.org/filesystems/sharedsubtree.html

Fixes: ostreedev#2086

@cgwalters cgwalters left a comment

I think I need some quality time with this, but overall makes sense to me enough that I think it's better to merge and give this some soak time in git main, for the very few people that run from git main (happens to include my workstation though).

Thanks for this!

src/switchroot/ostree-remount.c (review thread resolved)
@cgwalters cgwalters enabled auto-merge September 6, 2024 22:51
@cgwalters cgwalters merged commit 413b0ad into ostreedev:main Sep 6, 2024
24 of 25 checks passed
@dbnicholson dbnicholson deleted the var-slave-shared branch September 7, 2024 16:10
wjt added a commit to endlessm/eos-boot-helper that referenced this pull request Sep 9, 2024
…mespace"

This reverts commit 28e58e8.

The underlying bug in ostree was fixed in
endlessm/ostree#214, a backport of
ostreedev/ostree#3292.

With that change, eos-test-mode's overmounted overlayfses propagate
correctly to NetworkManager and AccountsService, so they no longer need
to be restarted. And restarting NetworkManager triggers a bug in Initial
Setup, which does not correctly handle this case and so never lists
available Wi-Fi networks.

https://phabricator.endlessm.com/T35640
Labels
area/prepare-root Issue relates to ostree-prepare-root
Development

Successfully merging this pull request may close these issues.

/sysroot private mount and /home -> /sysroot/home
2 participants