
document how to edit/set kernel arguments #88

Closed
dustymabe opened this issue Jun 24, 2020 · 11 comments · Fixed by #199

@dustymabe
Member

We have some items in the works that will make setting/editing kargs easier, but for now let's just get a page up that gives people a starting point for configuring kernel arguments persistently.

I'm thinking it should be at the same place in the navigation as https://docs.fedoraproject.org/en-US/fedora-coreos/sysctl/ but for kargs.
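For reference, the underlying mechanism today is rpm-ostree kargs on a running node; a rough sketch of what such a page could start from (the foo=bar argument is just a placeholder):

# Append a kernel argument to the staged deployment, then reboot into it
sudo rpm-ostree kargs --append=foo=bar
sudo systemctl reboot

# After the reboot, verify the argument is present
cat /proc/cmdline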

@jdoss
Contributor

jdoss commented Jun 30, 2020

@jlebon per our conversation on IRC, this seems to work for enabling cgroups v2 on FCOS:

systemd:
  units:
  - name: enable-cgroups-v2.service
    enabled: true
    contents: |
      [Unit]
      Description=Enable cgroups v2 (systemd.unified_cgroup_hierarchy)
      ConditionFirstBoot=true
      Wants=basic.target
      Before=multi-user.target mycool-podman-pod.service

      [Service]
      Type=oneshot
      ExecStart=/usr/bin/rpm-ostree kargs --delete systemd.unified_cgroup_hierarchy=0 --reboot

      [Install]
      WantedBy=basic.target

I think this unit can be cleaned up more. An edge case I found was that this unit would be called and then, before the system could reboot, my unit that sets up my podman pod would start, causing podman to be configured to use cgroups v1. After the system came back up in cgroups v2 mode, podman would fail to start my containers via systemd.

A workaround for that was adding mycool-podman-pod.service to Before=. Maybe there is a better way to block systemd from continuing while it starts the reboot process? Having a generic drop-in service for FCOS that doesn't have to be modified by an end user seems worth figuring out.

@cgwalters
Member

I think we should extend FCOS (or potentially Ignition) to have a standard target that (if enabled) reboots and that other units can be ordered against.

In OpenShift we kind of hack this together by having the MCO inject systemd units that perform an OS upgrade+reboot and are Before=kubelet.service, but this is a generic problem.

Something like ignition-user-complete-reboot.target - runs in the real root. Users would then order units that require "real root configuration" (like kubelet.service) to be After= that generic service.

Another way to look at this is extending the concept of Ignition as "runs at most once configuration" to the real root. One can do this now, but having users invent "run at most once" semantics + reboot handling makes things more likely to clash.
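For illustration only (ignition-user-complete-reboot.target is just the hypothetical name proposed above, and my-workload is a placeholder), a user unit would then be ordered roughly like:

[Unit]
Description=Workload that needs the fully configured real root
# Hypothetical target from the proposal above; it does not exist in FCOS/Ignition today
After=ignition-user-complete-reboot.target
Wants=ignition-user-complete-reboot.target

[Service]
ExecStart=/usr/bin/my-workload

[Install]
WantedBy=multi-user.target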

@bgilbert
Contributor

In general, rebooting from a unit in the real root isn't safe, because ConditionFirstBoot=true units that sequence after that unit will never run. I don't see a way around that other than clearing machine-id (ugh) or handling the reboot from the initramfs.

@cgwalters
Member

Yes, we need to teach people to stop using ConditionFirstBoot basically. Instead, our target will write its own stamp file like /var/lib/ignition-user-complete.stamp and use ConditionPathExists=!/var/lib/ignition-user-complete.stamp.

If we want to handle being interrupted during provisioning, then it's required that services be idempotent.

See e.g. openshift/machine-config-operator#1762
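A rough sketch of that stamp-file pattern (the ordering and paths here are illustrative, and a real unit would also need to be idempotent as noted above):

[Unit]
Description=One-time provisioning step (stamp file instead of ConditionFirstBoot)
ConditionPathExists=!/var/lib/ignition-user-complete.stamp

[Service]
Type=oneshot
# Example provisioning action; a production unit must tolerate being interrupted and re-run
ExecStart=/usr/bin/rpm-ostree kargs --delete systemd.unified_cgroup_hierarchy=0
ExecStart=/usr/bin/touch /var/lib/ignition-user-complete.stamp
ExecStart=/usr/bin/systemctl --no-block reboot

[Install]
WantedBy=multi-user.target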

@jdoss
Contributor

jdoss commented Jun 30, 2020

@cgwalters and @bgilbert, outside of future work that would be more ideal for setting things like this, would there be any improvements to my current example for kicking FCOS into cgroups v2?

@cgwalters
Member

@jdoss First, thanks for publishing that example!

But...I can't come up with easy "minor" changes to it to solve the problems you mentioned without really trying to tackle the general space.

For example, the unit ordering one...well, we could recommend ordering it Before=basic.target rather than after - that would naturally inhibit all services (such as your podman units) that default to starting After=basic.target. That would work for changing kernel arguments.

But...the OpenShift use case wants to apply OS updates before any potentially untrusted containers land, and doing OS updates requires things like networking, time synchronization, etc. And those are often After=basic.target...so it gets into a problem domain that quickly generalizes into defining an explicit provisioning target and which services do and don't run in it.
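Concretely, that "minor" change would only swap the ordering in the [Unit] section of the unit above, something like this untested sketch:

[Unit]
Description=Enable cgroups v2 (systemd.unified_cgroup_hierarchy)
ConditionFirstBoot=true
# Ordering before basic.target holds back most services, which default to After=basic.target
Before=basic.target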

@jlebon
Member

jlebon commented Jun 30, 2020

In general, rebooting from a unit in the real root isn't safe, because ConditionFirstBoot=true units that sequence after that unit will never run. I don't see a way around that other than clearing machine-id (ugh) or handling the reboot from the initramfs.

Ouhh, that's a good point. Given that (most) computers do eventually reboot, and there's no way to order units "after all other units", doesn't that imply that ConditionFirstBoot= is fundamentally broken? Might be worth discussing this with upstream. Offhand, a search for ConditionFirstBoot + reboot there doesn't yield anything about that.

@cgwalters and @bgilbert, outside of future work that would be more ideal for setting things like this, would there be any improvements to my current example for kicking FCOS into cgroups v2?

The idea is that instead of a ConditionFirstBoot=, you'd use your own stamp file, or make sure that your service is idempotent. In this case, you should be able to simply replace ConditionFirstBoot=true with ConditionKernelCommandLine=systemd.unified_cgroup_hierarchy=0. However, I'd still do at least e.g. After=multi-user.target to try to mitigate the issue mentioned above for services that still use ConditionFirstBoot=.
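Putting those suggestions together, the unit might look roughly like this (untested sketch):

[Unit]
Description=Enable cgroups v2 (systemd.unified_cgroup_hierarchy)
# Only runs while the old karg is still present, so the unit is naturally re-runnable
ConditionKernelCommandLine=systemd.unified_cgroup_hierarchy=0
# Ordering after multi-user.target gives remaining ConditionFirstBoot= units a chance to run first
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/rpm-ostree kargs --delete systemd.unified_cgroup_hierarchy=0 --reboot

[Install]
WantedBy=multi-user.target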

@lucab
Contributor

lucab commented Jul 2, 2020

Cross-referencing: coreos/butane#57

@jdoss
Contributor

jdoss commented Jul 2, 2020

@cgwalters no problem! Also, after rereading my reply, it sounded a bit terse, and that was not my intent. I was pretty sure my first pass at this wasn't going to be perfect. I know you, @jlebon, and @bgilbert have a much deeper understanding of the FCOS internals that could make this better. Thanks for taking the time to respond 😄

My initial testing of the original unit worked fine on my qemu FCOS tester VM, but when I tried it out on EC2 with it running Before=basic.target, it cut Ignition off from fully finishing. My Ignition config is too big to fit in cloud-init, so I have to download it from S3 and replace... maybe that has something to do with it.

Anyways, I ended up with this, which seems to work for now:

systemd:
  units:
  - name: enable-cgroups-v2.service
    enabled: true
    contents: |
      [Unit]
      Description=Enable cgroups v2 (systemd.unified_cgroup_hierarchy)
      ConditionFirstBoot=true
      After=ignition-complete.target
      Before=default.target

      [Service]
      Type=oneshot
      ExecStart=/usr/bin/rpm-ostree kargs --replace systemd.unified_cgroup_hierarchy=1 --reboot

      [Install]
      WantedBy=basic.target

Note that ExecStart=/usr/bin/rpm-ostree kargs --replace systemd.unified_cgroup_hierarchy=1 --reboot has been changed above. To truly boot FCOS into cgroups v2 you need to do this instead; otherwise it ends up in the cgroup v1/v2 hybrid mode:

[core@mycool-fcos ~]$ sudo /usr/bin/rpm-ostree kargs --delete systemd.unified_cgroup_hierarchy=0 
Staging deployment... done
Kernel arguments updated.
Run "systemctl reboot" to start a reboot

[core@mycool-fcos ~]$ sudo systemctl reboot
*reboot*

[core@mycool-fcos ~]$ mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpu,cpuacct)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)


[core@mycool-fcos ~]$ sudo /usr/bin/rpm-ostree kargs --replace systemd.unified_cgroup_hierarchy=1 
Staging deployment... done
Kernel arguments updated.
Run "systemctl reboot" to start a reboot

[core@mycool-fcos ~]$ sudo systemctl reboot
*reboot*

[core@mycool-fcos ~]$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)

Then all of my units also needed After=network-online.target enable-cgroups-v2.service added, because in further testing some units would still start in the middle of the reboot process, which can lead to not-great things.
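For example, one of those dependent units (the pod name is just a placeholder) ends up with something like:

[Unit]
Description=My cool podman pod (placeholder)
Wants=network-online.target
# Don't start until the cgroups v2 switch, and the reboot it triggers, is out of the way
After=network-online.target enable-cgroups-v2.service

[Service]
ExecStart=/usr/bin/podman pod start mycool-pod

[Install]
WantedBy=multi-user.target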

@jlebon
Member

jlebon commented Oct 15, 2020

Re. ConditionFirstBoot, see discussions in systemd/systemd#4511

@jlebon
Member

jlebon commented Oct 23, 2020

Opened #199 for this, which includes feedback from the discussions in that systemd ticket.
