
document how to edit/set kernel arguments #88

Closed
dustymabe opened this issue Jun 24, 2020 · 11 comments · Fixed by #199

@dustymabe
Member

We have some items in the works that will make setting/editing kargs easier, but for now let's just get a page up that gives people a starting point for configuring kernel arguments persistently.

I'm thinking it should be at the same place in the navigation as https://docs.fedoraproject.org/en-US/fedora-coreos/sysctl/ but for kargs.
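For reference, the underlying mechanism today is rpm-ostree kargs on a running node; a rough sketch of what such a page could start from (the foo=bar argument is just a placeholder):

# Append a kernel argument to the staged deployment, then reboot into it
sudo rpm-ostree kargs --append=foo=bar
sudo systemctl reboot

# After the reboot, verify the argument is present
cat /proc/cmdline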

@jdoss
Contributor

jdoss commented Jun 30, 2020

@jlebon per our conversation on IRC, this seems to work for enabling cgroups v2 on FCOS:

systemd:
  units:
  - name: enable-cgroups-v2.service
    enabled: true
    contents: |
      [Unit]
      Description=Enable cgroups v2 (systemd.unified_cgroup_hierarchy)
      ConditionFirstBoot=true
      Wants=basic.target
      Before=multi-user.target mycool-podman-pod.service

      [Service]
      Type=oneshot
      ExecStart=/usr/bin/rpm-ostree kargs --delete systemd.unified_cgroup_hierarchy=0 --reboot

      [Install]
      WantedBy=basic.target

I think this unit can be cleaned up more. An edge case I found was that this unit would be called and then, before the system could reboot, my unit that sets up my podman pod would start, causing podman to be configured to use cgroups v1. After the system came back up in cgroups v2 mode, podman would fail to start my containers via systemd.

A workaround for that was adding mycool-podman-pod.service to Before=. Maybe there is a better way to block systemd from continuing while it starts the reboot process? Having a generic drop-in service for FCOS that doesn't have to be modified by an end user seems worth figuring out.

@cgwalters
Member

I think we should extend FCOS (or potentially Ignition) to have a standard target that (if enabled) reboots and that other units can be ordered against.

In OpenShift we kind of hack this together by having the MCO inject systemd units that perform an OS upgrade+reboot and are Before=kubelet.service, but this is a generic problem.

Something like ignition-user-complete-reboot.target - runs in the real root. Users would then order units that require "real root configuration" (like kubelet.service) to be After= that generic service.

Another way to look at this is extending the concept of Ignition as "runs at most once configuration" to the real root. One can do this now, but having users invent "run at most once" semantics + reboot handling makes things more likely to clash.
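For illustration only (ignition-user-complete-reboot.target is just the hypothetical name proposed above, and my-workload is a placeholder), a user unit would then be ordered roughly like:

[Unit]
Description=Workload that needs the fully configured real root
# Hypothetical target from the proposal above; it does not exist in FCOS/Ignition today
After=ignition-user-complete-reboot.target
Wants=ignition-user-complete-reboot.target

[Service]
ExecStart=/usr/bin/my-workload

[Install]
WantedBy=multi-user.target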

@bgilbert
Contributor

In general, rebooting from a unit in the real root isn't safe, because ConditionFirstBoot=true units that sequence after that unit will never run. I don't see a way around that other than clearing machine-id (ugh) or handling the reboot from the initramfs.

@cgwalters
Member

Yes, we need to teach people to stop using ConditionFirstBoot basically. Instead, our target will write its own stamp file like /var/lib/ignition-user-complete.stamp and use ConditionPathExists=!/var/lib/ignition-user-complete.stamp.

If we want to handle being interrupted during provisioning, then it's required that services be idempotent.

See e.g. openshift/machine-config-operator#1762
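A rough sketch of that stamp-file pattern (the ordering and paths here are illustrative, and a real unit would also need to be idempotent as noted above):

[Unit]
Description=One-time provisioning step (stamp file instead of ConditionFirstBoot)
ConditionPathExists=!/var/lib/ignition-user-complete.stamp

[Service]
Type=oneshot
# Example provisioning action; a production unit must tolerate being interrupted and re-run
ExecStart=/usr/bin/rpm-ostree kargs --delete systemd.unified_cgroup_hierarchy=0
ExecStart=/usr/bin/touch /var/lib/ignition-user-complete.stamp
ExecStart=/usr/bin/systemctl --no-block reboot

[Install]
WantedBy=multi-user.target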

@jdoss
Contributor

jdoss commented Jun 30, 2020

@cgwalters and @bgilbert, outside of future work that would be more ideal for setting things like this, would there be any improvements to my current example for kicking FCOS into cgroups v2?

@cgwalters
Member

@jdoss First, thanks for publishing that example!

But...I can't come up with easy "minor" changes to it to solve the problems you mentioned without really trying to tackle the general space.

For example, the unit ordering one...well, we could recommend ordering it Before=basic.target rather than after - that would naturally inhibit all services (such as your podman units) that default to starting After=basic.target. That would work for changing kernel arguments.

But...the OpenShift use case wants to apply OS updates before any potentially untrusted containers land, and doing OS updates requires things like networking, time synchronization, etc. And those are often After=basic.target...so it gets into a problem domain that quickly generalizes into defining an explicit provisioning target and which services do and don't run in it.
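Concretely, that "minor" change would only swap the ordering in the [Unit] section of the unit above, something like this untested sketch:

[Unit]
Description=Enable cgroups v2 (systemd.unified_cgroup_hierarchy)
ConditionFirstBoot=true
# Ordering before basic.target holds back most services, which default to After=basic.target
Before=basic.target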

@jlebon
Member

jlebon commented Jun 30, 2020

In general, rebooting from a unit in the real root isn't safe, because ConditionFirstBoot=true units that sequence after that unit will never run. I don't see a way around that other than clearing machine-id (ugh) or handling the reboot from the initramfs.

Ouhh, that's a good point. Given that (most) computers do eventually reboot, and there's no way to order units "after all other units", doesn't that imply that ConditionFirstBoot= is fundamentally broken? Might be worth discussing this with upstream. Offhand, a search for ConditionFirstBoot + reboot there doesn't yield anything about that.

@cgwalters and @bgilbert, outside of future work that would be more ideal for setting things like this, would there be any improvements to my current example for kicking FCOS into cgroups v2?

The idea is that instead of a ConditionFirstBoot=, you'd use your own stamp file, or make sure that your service is idempotent. In this case, you should be able to simply replace ConditionFirstBoot=true with ConditionKernelCommandLine=systemd.unified_cgroup_hierarchy=0. However, I'd still do at least e.g. After=multi-user.target to try to mitigate the issue mentioned above for services that still use ConditionFirstBoot=.
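Putting those suggestions together, the unit might look roughly like this (untested sketch):

[Unit]
Description=Enable cgroups v2 (systemd.unified_cgroup_hierarchy)
# Only runs while the old karg is still present, so the unit is naturally re-runnable
ConditionKernelCommandLine=systemd.unified_cgroup_hierarchy=0
# Ordering after multi-user.target gives remaining ConditionFirstBoot= units a chance to run first
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/rpm-ostree kargs --delete systemd.unified_cgroup_hierarchy=0 --reboot

[Install]
WantedBy=multi-user.target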

@lucab
Contributor

lucab commented Jul 2, 2020

Cross-referencing: coreos/butane#57

@jdoss
Contributor

jdoss commented Jul 2, 2020

@cgwalters no problem! Also, after rereading my reply, it sounded a bit terse, and that was not my intent. I was pretty sure my first pass at this wasn't going to be perfect. I know you, @jlebon, and @bgilbert have a much deeper understanding of the FCOS internals that could make this better. Thanks for taking the time to respond 😄

My initial testing of the original unit worked fine on my qemu FCOS tester VM, but when I tried it out on EC2 with it running Before=basic.target, it cut Ignition off from fully finishing. My Ignition config is too big to fit in cloud-init, so I have to download it from S3 and replace... maybe that has something to do with it.

Anyways, I ended up with this, which seems to work for now:

systemd:
  units:
  - name: enable-cgroups-v2.service
    enabled: true
    contents: |
      [Unit]
      Description=Enable cgroups v2 (systemd.unified_cgroup_hierarchy)
      ConditionFirstBoot=true
      After=ignition-complete.target
      Before=default.target

      [Service]
      Type=oneshot
      ExecStart=/usr/bin/rpm-ostree kargs --replace systemd.unified_cgroup_hierarchy=1 --reboot

      [Install]
      WantedBy=basic.target

Note that ExecStart=/usr/bin/rpm-ostree kargs --replace systemd.unified_cgroup_hierarchy=1 --reboot has been changed above. To truly boot FCOS into cgroups v2 you need to do this instead; otherwise it ends up in the cgroup v1/v2 hybrid mode:

[core@mycool-fcos ~]$ sudo /usr/bin/rpm-ostree kargs --delete systemd.unified_cgroup_hierarchy=0 
Staging deployment... done
Kernel arguments updated.
Run "systemctl reboot" to start a reboot

[core@mycool-fcos ~]$ sudo systemctl reboot
*reboot*

[core@mycool-fcos ~]$ mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpu,cpuacct)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)


[core@mycool-fcos ~]$ sudo /usr/bin/rpm-ostree kargs --replace systemd.unified_cgroup_hierarchy=1 
Staging deployment... done
Kernel arguments updated.
Run "systemctl reboot" to start a reboot

[core@mycool-fcos ~]$ sudo systemctl reboot
*reboot*

[core@mycool-fcos ~]$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)

Then all of my units also needed After=network-online.target enable-cgroups-v2.service added, because in further testing some units would still start in the middle of the reboot process, which can lead to not-great things.
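For example, one of those dependent units (the pod name is just a placeholder) ends up with something like:

[Unit]
Description=My cool podman pod (placeholder)
Wants=network-online.target
# Don't start until the cgroups v2 switch, and the reboot it triggers, is out of the way
After=network-online.target enable-cgroups-v2.service

[Service]
ExecStart=/usr/bin/podman pod start mycool-pod

[Install]
WantedBy=multi-user.target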

@jlebon
Member

jlebon commented Oct 15, 2020

Re. ConditionFirstBoot, see discussions in systemd/systemd#4511

@jlebon
Member

jlebon commented Oct 23, 2020

Opened #199 for this, which includes feedback from the discussions in that systemd ticket.
