nvidia-k8s-device-plugin: wait for kubelet.sock before starting #228
Issue number:
Closes bottlerocket-os/bottlerocket#4250
Description of changes:
It is possible for `nvidia-k8s-device-plugin` and `kubelet` to race, causing graphics nodes to fail to expose GPUs via kubelet. Specifically, `nvidia-k8s-device-plugin` starts after `kubelet`; however, it depends on the kubelet device-plugin management socket being available in order to register itself. The `kubelet` service does not synchronize creation of the device-plugin management socket with its systemd "notify" signal, which means that `kubelet` may report itself started before the socket is ready. If the socket is created after `nvidia-k8s-device-plugin` starts watching the socket for inotify events, the creation event may trigger the device plugin's restart logic (the device plugin assumes that kubelet has restarted in this case). Unfortunately, device plugin restarts seem to be somewhat flaky due to the issues discussed in bottlerocket-os/bottlerocket#4250.
This change causes `nvidia-k8s-device-plugin` to require `kubelet.sock` to exist as a socket. The unit will fail to start, and subsequently retry every 2 seconds, until the socket is available. We perform an initial sleep because it turns out that `kubelet.sock` usually does not exist by the time that systemd first tries to start `nvidia-k8s-device-plugin`.
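A minimal sketch of the kind of unit change this describes (illustrative only; the directives, ordering, and socket path below are assumptions, not the exact contents of this PR — `/var/lib/kubelet/device-plugins/kubelet.sock` is the conventional kubelet device-plugin socket location):

```ini
# nvidia-k8s-device-plugin.service (sketch of the wait-and-retry behavior)
[Unit]
# Ordering after kubelet is not sufficient on its own: kubelet's notify
# signal does not guarantee the device-plugin socket exists yet.
After=kubelet.service

[Service]
# Initial sleep: the socket usually is not present the first time systemd
# tries to start this unit.
ExecStartPre=/usr/bin/sleep 2
# Fail the start unless kubelet.sock exists as a socket (test -S).
ExecStartPre=/usr/bin/test -S /var/lib/kubelet/device-plugins/kubelet.sock
# Retry every 2 seconds until the socket is available.
Restart=on-failure
RestartSec=2
```

Using an `ExecStartPre=` check rather than a `ConditionPathIsSocket=` directive matters here: a failed systemd condition skips the unit without marking it failed, so `Restart=on-failure` would never fire.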
Testing done:
I created this patch, which forces the inotify race to always occur and massively increased the incidence of the failure case.
After hundreds of instance launches, I have not witnessed a single instance with missing GPU resources (whereas the failure incidence was ~40% on Bottlerocket 1.25.0 with my faulty patch applied).
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.