Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: drop nvidia-device-plugin feature #60

Merged

Conversation

arnaldo2792
Copy link
Contributor

@arnaldo2792 arnaldo2792 commented Sep 6, 2024

Issue #, if available:
N / A

Description of changes:

Use a setings extension instead of a cargo feature for the kubelet device plugins API. See #57 (comment) for details.

Testing:

As part of bottlerocket-os/bottlerocket-core-kit#132 and bottlerocket-os/bottlerocket#4182

  1. Instance joined the cluster:
NAME                                           STATUS   ROLES    AGE    VERSION
ip-XXXX.us-west-2.compute.internal   Ready    <none>   22s    v1.30.1-eks-e564799
  1. Files were generated using the new values
bash-5.1# apiclient get settings.kubelet-device-plugin
{
  "settings": {
    "kubelet-device-plugin": {
      "nvidia": {
        "device-id-strategy": "index",
        "device-list-strategy": "volume-mounts",
        "pass-device-specs": true
      }
    }
  }
}

bash-5.1# cat /etc/systemd/system/nvidia-k8s-device-plugin.service.d/exec-start.conf
[Service]
ExecStart=
ExecStart=/usr/bin/nvidia-device-plugin --config-file=/etc/nvidia-k8s-device-plugin/settings.yaml
bash-5.1# cat /etc/nvidia-container-runtime/config.toml
### generated from the template file ###
accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false

[nvidia-container-cli]
root = "/"
path = "/usr/bin/nvidia-container-cli"
environment = []
ldconfig = "@/sbin/ldconfig"
bash-5.1#

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link
Contributor

@ginglis13 ginglis13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits. Quick question - I noticed in reading your testing in the description, why are we setting two ExecStart=, the first of which is empty?

https://github.com/bottlerocket-os/bottlerocket-core-kit/blob/4b528d3ac4c9574befc053dc3f64c095234819c4/packages/nvidia-k8s-device-plugin/nvidia-k8s-device-plugin-exec-start-conf#L6-L7

@@ -0,0 +1,75 @@
//! Settings related to Amazon ECS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this doc comment is not accurate

nvidia: NvidiaDevicePluginSettings,
}

//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should this just be a blank line?

@arnaldo2792
Copy link
Contributor Author

The first line is empty to reset the ExecStart commands. Otherwise, systemd will attempt to use the two lines and fail to start nvidia-k8s-device-plugin.service because the service type is simple which only accepts one single ExecStart.

use std::convert::Infallible;

#[model(impl_default = true)]
pub struct KubeletDevicePluginV1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to expose this as settings.kubelet-device-plugins for consistency with our other plural settings clusters. So the type name should also be plural.

Suggested change
pub struct KubeletDevicePluginV1 {
pub struct KubeletDevicePluginsV1 {

@arnaldo2792
Copy link
Contributor Author

Forced push includes:

  • Fix nits
  • Rename KubeletDevicePluginV1 to KubeletDevicePluginsV1

@@ -0,0 +1,18 @@
use bottlerocket_settings_sdk::{BottlerocketSetting, NullMigratorExtensionBuilder};
use settings_extension_kubelet_device_plugin::KubeletDevicePluginsV1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
use settings_extension_kubelet_device_plugin::KubeletDevicePluginsV1;
use settings_extension_kubelet_device_plugins::KubeletDevicePluginsV1;

Use a setings extension instead of a cargo feature for the kubelet
device plugins API

Signed-off-by: Arnaldo Garcia Rincon <[email protected]>
@arnaldo2792
Copy link
Contributor Author

(forced push renames kubelet-device-plugin to kubelet-device-plugins and all the affected places)

@arnaldo2792
Copy link
Contributor Author

I confirmed in a local build that kubelet-device-plugins is working as expected:

bash-5.1# apiclient get settings.kubelet-device-plugins
{
  "settings": {
    "kubelet-device-plugins": {
      "nvidia": {
        "device-id-strategy": "index",
        "device-list-strategy": "envvar",
        "pass-device-specs": true
      }
    }
  }
}

@arnaldo2792 arnaldo2792 merged commit 4ad8f9e into bottlerocket-os:develop Sep 10, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants