
cgroups inheritance when using k0s in docker #4234

Open
turdusmerula opened this issue Apr 3, 2024 · 15 comments · May be fixed by #5059
Labels: bug (Something isn't working)

Comments

turdusmerula commented Apr 3, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 6.5.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 12 10:22:43 UTC 2 x86_64 GNU/Linux
NAME="Linux Mint"
VERSION="21.3 (Virginia)"
ID=linuxmint
ID_LIKE="ubuntu debian"
PRETTY_NAME="Linux Mint 21.3"
VERSION_ID="21.3"
HOME_URL="https://www.linuxmint.com/"
SUPPORT_URL="https://forums.linuxmint.com/"
BUG_REPORT_URL="http://linuxmint-troubleshooting-guide.readthedocs.io/en/latest/"
PRIVACY_POLICY_URL="https://www.linuxmint.com/"
VERSION_CODENAME=virginia
UBUNTU_CODENAME=jammy

Version

v1.29.2+k0s.0

Sysinfo

`k0s sysinfo`
Total memory: 62.5 GiB (pass)
Disk space available for /var/lib/k0s: 188.3 GiB (pass)
Name resolution: localhost: [127.0.0.1 ::1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.5.0-26-generic (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /sbin/modprobe (pass)
  Executable in PATH: mount: /bin/mount (pass)
  Executable in PATH: umount: /bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: no kernel config found (warning)
  CONFIG_NAMESPACES: Namespaces support: no kernel config found (warning)
  CONFIG_NET: Networking support: no kernel config found (warning)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: no kernel config found (warning)
  CONFIG_PROC_FS: /proc file system support: no kernel config found (warning)

What happened?

I use the k0sproject/k0s:v1.29.2-k0s.0 docker image to run k0s with the following command:

export n=1
docker run -d --privileged --name="test$n-k0s" --memory=4G --cgroupns="host" --cgroup-parent="test$n-k0s.slice" -v=/var/lib/k0s k0sproject/k0s:v1.29.2-k0s.0 k0s controller --enable-worker --no-taints

The goal is to be able to launch several instances in parallel, this works fine.

The problem I'm facing is with cgroups. k0s runs correctly inside the container's cgroup scope, so the 4 GB memory limit is enforced. But if I look at the processes spawned by containerd-shim, they are launched in /kubepods, so they are not constrained.
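One way to check where a given process actually landed (a generic sketch, not specific to k0s; substitute the PID of a containerd-shim child for `$$`) is to read its cgroup file under /proc:

```shell
# Print the cgroup membership of a process; $$ is the current shell. Replace
# it with the PID of a containerd-shim child to see whether that process sits
# under the container's slice or under /kubepods.
cat "/proc/$$/cgroup"
# On cgroup v2 this is a single line such as:
#   0::/test1-k0s.slice/docker-<id>.scope
```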

[Screenshot at 2024-04-03 19-58-53]

Is there a way to have the `/kubepods` cgroup created inside my container's cgroup?
I don't quite know whether this is a bug, a configuration gap on my side, or a feature request; any help would be appreciated :)

Steps to reproduce

Expected behavior

No response

Actual behavior

No response

Screenshots and logs

No response

Additional context

No response

@turdusmerula turdusmerula added the bug label Apr 3, 2024
twz123 (Member) commented Apr 8, 2024

Your observations are indeed correct. The "k0s in Docker" docs are currently not written with multiple workers on the same Docker host in mind. In particular, the steps for cgroups v2 weaken the isolation between the host and the k0s container quite a bit.

The culprit is that certain cgroup-related things need to be in place for kubelet and the container runtime to be happy, such as a writable cgroup root filesystem with all the necessary controllers enabled. While this can be achieved with some shenanigans, like a clever Docker container entrypoint script, k0s doesn't have that support right now. You can try to work around it by giving each k0s worker container different values for the various cgroup-related kubelet configuration options. Try adding these args to each worker's kubelet extra args and experiment with the outcome:

--cgroup-root=/test.slice/test$n-k0s.slice
--kubelet-cgroups=/test.slice/test$n-k0s.slice/kubelet.slice
--runtime-cgroups=/test.slice/test$n-k0s.slice/containerd.slice
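As a rough sketch (untested; the `/test.slice/test$n-k0s.slice` layout is the illustrative one from above, not something k0s creates), the flags could be assembled per instance and handed over through k0s's `--kubelet-extra-args` option:

```shell
# Sketch: assemble per-instance kubelet cgroup flags. The slice names are
# illustrative assumptions; only the flag names come from kubelet itself.
n=1
slice="/test.slice/test$n-k0s.slice"
kubelet_args="--cgroup-root=$slice --kubelet-cgroups=$slice/kubelet.slice --runtime-cgroups=$slice/containerd.slice"
echo "$kubelet_args"
# These would then be passed along when starting the container, e.g.:
#   docker run ... k0sproject/k0s:v1.29.2-k0s.0 k0s controller \
#     --enable-worker --no-taints --kubelet-extra-args="$kubelet_args"
```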

I took a stab at the Docker entrypoint script a few months ago, but haven't polished it up for a PR yet. That might provide some additional insight.

turdusmerula (Author) commented:

Thank you for this answer @twz123; this is pretty much what I managed to implement, although it still feels kind of hacky.

I start by running the container and having it wait for its configuration file. During this step, Docker creates the test$n.slice cgroup:

docker run -it --name test$n -d --cgroupns=host --cgroup-parent=test$n.slice --hostname k0s --privileged -v /var/lib/k0s -v "/sys/fs/cgroup:/sys/fs/cgroup:rw" k0sproject/k0s:v1.29.2-k0s.0 bash -c 'while [[ ! -f /var/lib/k0s/config.yaml ]]; do sleep 1; done; k0s controller --enable-worker --no-taints --config /var/lib/k0s/config.yaml --profile=cgroup --enable-metrics-scraper'

While the container is waiting, I set the limits inside the cgroup (doing it through docker lets me avoid sudo):

docker exec -it test$n bash -c "echo 6000M > /sys/fs/cgroup/test$n.slice/memory.max"
docker exec -it test$n bash -c "echo 5500M > /sys/fs/cgroup/test$n.slice/memory.high"
docker exec -it test$n bash -c "echo 0 > /sys/fs/cgroup/test$n.slice/memory.swap.max"
docker exec -it test$n bash -c "echo 0 > /sys/fs/cgroup/test$n.slice/memory.swap.high"
docker exec -it test$n bash -c "echo '0-4' > /sys/fs/cgroup/test$n.slice/cpuset.cpus"
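The five docker exec calls boil down to writing plain strings into cgroup v2 control files. The same writes, sketched against a scratch directory so the snippet can run anywhere (inside the container the directory would be /sys/fs/cgroup/test$n.slice):

```shell
# Sketch: cgroup v2 limits are set by writing values into control files.
# $cgdir is a throwaway stand-in for /sys/fs/cgroup/test$n.slice.
cgdir=$(mktemp -d)
printf '6000M' > "$cgdir/memory.max"       # hard memory limit
printf '5500M' > "$cgdir/memory.high"      # throttling threshold
printf '0'     > "$cgdir/memory.swap.max"  # forbid swap
printf '0'     > "$cgdir/memory.swap.high"
printf '0-4'   > "$cgdir/cpuset.cpus"      # restrict to CPUs 0-4
cat "$cgdir/memory.max"
```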

I then construct the config.yaml and push it to the container:

apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  api:
    extraArgs:
      # allow the cluster to expose on localhost
      service-node-port-range: 80-32767
  telemetry:
    enabled: false

  workerProfiles:
  # https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
  # https://github.com/k0sproject/k0s/blob/main/docs/configuration.md
  - name: cgroup
    values:
      cgroupRoot: test$n.slice
      systemCgroups: test$n.slice
      kubeletCgroups: test$n.slice

docker cp config.yaml test$n:/var/lib/k0s/config.yaml
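Since the profile embeds the instance number $n, the file has to be rendered per instance before the docker cp; one hypothetical way is sed over a template file (config.yaml.tmpl is an assumed name, not part of the setup above):

```shell
# Sketch: substitute a literal "$n" placeholder in a hypothetical template.
# The bracket expression [$] makes sed match the dollar sign literally.
n=1
sed "s/[\$]n/$n/g" config.yaml.tmpl > config.yaml
```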

Processes are now in the correct cgroup (instead of /kubepods):
[Screenshot at 2024-04-09 09-56-13]

Memory and CPU limits work: if I constrain the cluster too tightly it won't start, and it doesn't swap, as expected.
The remaining limitation of this approach is that the OOM killer does not work. I suspect this is because the kubelet sits outside the cgroup, so it is not aware of the memory limits.
The stranger part is that the kernel OOM killer does not work either: when my cluster saturates its memory, it enters a strange state where it loads its CPUs without any process being killed. I still have to investigate this.

twz123 (Member) commented Apr 9, 2024

Thanks for experimenting and sharing the results, @turdusmerula! For historical reasons, k0s disregards the kubeletCgroups field in the worker profile. This is probably something that should be fixed. However, it should work if you pass it as a kubelet argument via k0s worker --kubelet-extra-args=--kubelet-cgroups=/test.slice/test$n-k0s.slice/kubelet.slice.

turdusmerula (Author) commented:

Kubelet tells me that --kubelet-cgroups is deprecated when I dig into its help:

      --kubelet-cgroups string                                   Optional absolute name of cgroups to create and run the Kubelet in. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

That's why I used the kubeletCgroups field. Do you confirm that I should not set it through the profile?

twz123 (Member) commented Apr 9, 2024

do you confirm I should not set it through the profile?

Yes, the flags are deprecated, but k0s currently ignores the kubeletCgroups field in the worker profile, of all things. Until this is fixed, the deprecated kubelet flag is a backdoor.

turdusmerula (Author) commented Apr 9, 2024

Have you ever managed to use the --kubelet-extra-args parameter?
No matter what I pass, even junk parameters, nothing seems to reach the kubelet in the end; I find no trace of kubelet evaluating any extra arg in the container logs.

turdusmerula (Author) commented Apr 10, 2024

Everything is working now; I had been quite unlucky. I chose to override the kubelet configuration by passing it in a file called /var/lib/k0s/kubelet-config.yaml, and it took me a while to figure out that k0s already uses this path for the config it generates and passes to kubelet, so my file was overwritten and had no effect.

However, I confirm that passing parameters through --kubelet-extra-args has no effect, as they are overridden by the kubelet-config.yaml generated by k0s.
The only way I could overcome this was by setting --kubelet-extra-args=--config=/var/lib/k0s/kubelet-ext-config.yaml and putting my config in that file:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration

# default values coming from /var/lib/k0s/kubelet-config.yaml and created by k0s
authentication:
  anonymous: {}
  webhook:
    cacheTTL: 0s
  x509:
    clientCAFile: /var/lib/k0s/pki/ca.crt
authorization:
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupsPerQOS: true
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: unix:///run/k0s/containerd.sock
cpuManagerReconcilePeriod: 0s
eventRecordQPS: 0
evictionPressureTransitionPeriod: 0s
failSwapOn: false
fileCheckFrequency: 0s
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kubeReservedCgroup: system.slice
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /etc/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
serverTLSBootstrap: true
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
tlsMinVersion: VersionTLS12
volumePluginDir: /usr/libexec/k0s/kubelet-plugins/volume/exec
volumeStatsAggPeriod: 0s

# cgroups configuration
kubeletCgroups: "/test.slice/kubelet"
systemCgroups: "/test.slice/system"
cgroupRoot: "/test.slice"
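For completeness, a sketch (untested against a live cluster) of how the pieces then fit together: the file is copied in under a name k0s does not manage, and kubelet is pointed at it via the extra-args backdoor. The paths are the ones chosen in this thread:

```shell
# Sketch: wire up the extra kubelet config file. k0s only manages
# /var/lib/k0s/kubelet-config.yaml, so this second file survives.
n=1
extra_cfg="/var/lib/k0s/kubelet-ext-config.yaml"
launch="k0s controller --enable-worker --no-taints --profile=cgroup --kubelet-extra-args=--config=$extra_cfg"
echo "$launch"
# Beforehand, copy the file into the waiting container:
#   docker cp kubelet-ext-config.yaml test$n:$extra_cfg
```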

I think there is room for improvement here; the way I have to do this still feels way too hacky.

github-actions (bot) commented:

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label May 10, 2024
@twz123 twz123 removed the Stale label May 12, 2024
@github-actions github-actions bot added the Stale label Jul 12, 2024
@twz123 twz123 removed the Stale label Jul 13, 2024
@github-actions github-actions bot added the Stale label Aug 12, 2024
@github-actions github-actions bot closed this as not planned (stale) Aug 19, 2024
@twz123 twz123 reopened this Aug 20, 2024
@github-actions github-actions bot removed the Stale label Aug 20, 2024
@github-actions github-actions bot added the Stale label Sep 20, 2024
@twz123 twz123 removed the Stale label Sep 21, 2024
turdusmerula pushed a commit to turdusmerula/k0s that referenced this issue Sep 30, 2024
@github-actions github-actions bot added the Stale label Oct 21, 2024
@twz123 twz123 removed the Stale label Oct 23, 2024
@github-actions github-actions bot added the Stale label Nov 22, 2024
@twz123 twz123 removed the Stale label Nov 23, 2024
twz123 (Member) commented Nov 23, 2024

The issues with cgroups in the docker docs and entrypoint have been addressed in #5263.
