/demo/clusters/kind/create-cluster.sh fails with umount: /proc/driver/nvidia: not mounted
#811
Closed
Labels
lifecycle/stale
1. Quick Debug Information
$ uname -a
Linux mbana-1 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

$ nvidia-container-runtime --version
NVIDIA Container Runtime version 1.15.0
commit: ddeeca392c7bd8b33d0a66400b77af7a97e16cef
spec: 1.2.0

runc version 1.1.12
commit: v1.1.12-0-g51d5e94
spec: 1.0.2-dev
go: go1.21.11
libseccomp: 2.5.3

$ docker version
Client: Docker Engine - Community
 Version: 26.1.4
 API version: 1.45
 Go version: go1.21.11
 Git commit: 5650f9b
 Built: Wed Jun 5 11:28:57 2024
 OS/Arch: linux/amd64
 Context: default

Server: Docker Engine - Community
 Engine:
  Version: 26.1.4
  API version: 1.45 (minimum version 1.24)
  Go version: go1.21.11
  Git commit: de5c9cf
  Built: Wed Jun 5 11:28:57 2024
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.33
  GitCommit: d2d58213f83a351ca8f528a95fbd145f5654e957
 nvidia:
  Version: 1.1.12
  GitCommit: v1.1.12-0-g51d5e94
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

$ nvidia-container-cli -V
cli-version: 1.15.0
lib-version: 1.15.0
build date: 2024-04-15T13:36+00:00
build revision: 6c8f1df7fd32cea3280cf2a2c6e931c9b3132465
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
2. Issue or feature description
The ./demo/clusters/kind/create-cluster.sh script seems to fail with umount: /proc/driver/nvidia: not mounted.
3. Information to attach (optional if deemed irrelevant)
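Common error checking:
The output of nvidia-smi -a on your host
Your Docker configuration file (e.g. /etc/docker/daemon.json)
The kubelet logs on the node (e.g. sudo journalctl -r -u kubelet)
Additional information that might help better understand your environment and reproduce the bug:
Docker version, from docker version
Kernel version, from uname -a
Any relevant kernel output lines, from dmesg
NVIDIA package versions, from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'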
I am not sure what the script is trying to do, but when I exec into the worker, it reports the correct GPU information.
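For context, "not mounted" is the message umount prints when its target is not a mount point. A minimal sketch of a guarded unmount follows, assuming the failing step is an unconditional umount of /proc/driver/nvidia inside the worker node. The helper name safe_umount and the WORKER_NODE variable are hypothetical and not taken from create-cluster.sh; it also assumes the mountpoint utility from util-linux is available where the check runs.

```shell
#!/usr/bin/env bash
# Sketch only: unmount a path if, and only if, it is currently a mount point.
# safe_umount is a hypothetical helper, not part of create-cluster.sh.
safe_umount() {
  local target="$1"
  # mountpoint -q exits 0 only when the path is a mount point.
  if mountpoint -q "$target" 2>/dev/null; then
    umount -R "$target"
  else
    echo "skip: ${target} is not mounted"
  fi
}

# Assumed usage against a kind worker (WORKER_NODE is a placeholder for the
# node container name reported by `docker ps`):
#   docker exec "${WORKER_NODE}" sh -c \
#     'mountpoint -q /proc/driver/nvidia && umount -R /proc/driver/nvidia || true'
```

With a guard like this, a worker where /proc/driver/nvidia was never mounted would be skipped rather than aborting the script. Whether skipping is the right behaviour depends on why the script unmounts that path in the first place.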