doc: Add documentation on how to add a GPU with CDI mode
Signed-off-by: Gabriel Mougard <[email protected]>
gabrielmougard committed Jun 21, 2024
1 parent 1ae137e commit d4c2b57
Showing 2 changed files with 101 additions and 0 deletions.
13 changes: 13 additions & 0 deletions doc/.custom_wordlist.txt
@@ -3,6 +3,7 @@ ABI
 ACL
 ACLs
 AGPL
+AGX
 AIO
 allocator
 AMD
@@ -23,6 +24,7 @@ BPF
 Btrfs
 bugfix
 bugfixes
+CDI
 CentOS
 Ceph
 CephFS
@@ -43,6 +45,7 @@ cron
 CSV
 CUDA
 dataset
+dGPU
 DCO
 dereferenced
 DHCP
@@ -92,6 +95,8 @@ IdP
 idmap
 idmapped
 idmaps
+iGPU
+IGX
 incrementing
 InfiniBand
 init
@@ -102,6 +107,7 @@ IPAM
 IPs
 IPv
 IPVLAN
+Jetson
 JIT
 jq
 kB
@@ -147,6 +153,8 @@ NICs
 NUMA
 NVMe
 NVRAM
+NVIDIA
+OCI
 OData
 OIDC
 OpenFGA
@@ -160,6 +168,7 @@ OverlayFS
 OVMF
 OVN
 OVS
+passthrough
 Pbit
 PCI
 PCIe
@@ -199,6 +208,7 @@ runtime
 SATA
 scalable
 scriptlet
+SDK
 SDN
 SDS
 SDT
@@ -214,6 +224,7 @@ SKBPRIO
 SLAAC
 SMTP
 Snapcraft
+SoC
 Solaris
 SPAs
 SPL
@@ -247,6 +258,8 @@ sysfs
 syslog
 Tbit
 TCP
+TensorRT
+Tegra
 TiB
 Tibit
 TinyPNG
88 changes: 88 additions & 0 deletions doc/reference/devices_gpu.md
@@ -51,8 +51,96 @@ Add a specific GPU from the host system as a `physical` GPU device to an instance

    lxc config device add <instance_name> <device_name> gpu gputype=physical pci=<pci_address>
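
If you do not already know the GPU's PCI address, you can first list the host's GPUs and their addresses. This is an optional check; look for the PCI address reported for each GPU in the output:

    lxc info --resources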

Add a specific GPU from the host system as a `physical` GPU device to an instance using the [Container Device Interface](https://github.com/cncf-tags/container-device-interface) (CDI) notation:

    lxc config device add <instance_name> <device_name> gpu gputype=physical id=<fq_CDI_name>

See {ref}`instances-configure-devices` for more information.
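
To discover the fully qualified CDI names available on a host, you can use the `nvidia-ctk` CLI that ships with the NVIDIA Container Toolkit. This is an optional check that assumes the toolkit is installed on the host; the reported names depend on your hardware:

    nvidia-ctk cdi list

Typical names look like `nvidia.com/gpu=0` or `nvidia.com/gpu=all`, and on Tegra-based systems like `nvidia.com/gpu=igpu0`; such a name is what you pass as the `id` value.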

#### Passing an NVIDIA iGPU to a container

Adding a device with the CDI notation is particularly useful if you have NVIDIA runtime libraries and configuration files installed on your host and want to pass them to your container. Let's take the example of iGPU passthrough:

Your host is an NVIDIA single-board computer with a Tegra SoC that includes an iGPU. You also have an SDK installed on the host, giving you access to plenty of useful libraries for handling AI workloads. You want to create a LXD container and run an inference job inside it, using the iGPU as a backend, and you would like the inference job to run inside a Docker container (or any other OCI-compliant runtime). You could do something like this:

Initialize a LXD container:

    lxc init ubuntu:24.04 t1 --config security.nested=true --config security.privileged=true

Add an iGPU device to your container:

    lxc config device add t1 igpu0 gpu gputype=physical id=nvidia.com/gpu=igpu0
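
You can confirm that the device was added by showing the instance's devices:

    lxc config device show t1

For this example, the output should look similar to the following:

    igpu0:
      gputype: physical
      id: nvidia.com/gpu=igpu0
      type: gpu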

Prepare a `cloud-init` configuration that installs the Docker runtime, the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) and a script to run a test [TensorRT](https://github.com/NVIDIA/TensorRT) workload, and save it as `cloud-init.yml`:

```yaml
#cloud-config
package_update: true
packages:
  - docker.io
write_files:
  # Register the NVIDIA runtime with the Docker daemon.
  - path: /etc/docker/daemon.json
    permissions: '0644'
    owner: root:root
    content: |
      {
        "max-concurrent-downloads": 12,
        "max-concurrent-uploads": 12,
        "runtimes": {
          "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
          }
        }
      }
  # Test script that builds the TensorRT samples and runs the MNIST ONNX sample.
  - path: /root/run_tensorrt.sh
    permissions: '0755'
    owner: root:root
    content: |
      #!/bin/bash
      echo "OS release,Kernel version"
      (. /etc/os-release; echo "${PRETTY_NAME}"; uname -r) | paste -s -d,
      echo
      nvidia-smi -q
      echo
      exec bash -o pipefail -c "
      cd /workspace/tensorrt/samples
      make -j4
      cd /workspace/tensorrt/bin
      ./sample_onnx_mnist
      retstatus=\${PIPESTATUS[0]}
      echo \"Test exited with status code: \${retstatus}\" >&2
      exit \${retstatus}
      "
runcmd:
  - systemctl start docker
  - systemctl enable docker
  - usermod -aG docker root
  # Install the NVIDIA Container Toolkit from the NVIDIA APT repository.
  - curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
  - curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  - apt-get update
  - DEBIAN_FRONTEND=noninteractive apt-get install -y nvidia-container-toolkit
  - nvidia-ctk runtime configure
  - systemctl restart docker
```

Apply this `cloud-init` setup to your instance:

    lxc config set t1 cloud-init.user-data - < cloud-init.yml

Now you can start the instance:

    lxc start t1

Wait for the `cloud-init` process to finish:

    lxc exec t1 -- cloud-init status --wait
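
Once the boot-time setup has completed, the command returns; with a recent `cloud-init`, a successful run ends with output like:

    status: done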

Finally, you can run your inference job inside the LXD container. Note: before doing so, edit `/etc/nvidia-container-runtime/config.toml` inside the LXD container and set the `mode` of the NVIDIA Container Runtime to `csv` instead of `auto`, so that Docker enables the NVIDIA runtime in CSV mode.
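
The relevant part of that file should then look similar to this (a minimal sketch; all other options in the file keep their defaults):

    [nvidia-container-runtime]
    mode = "csv"

With CSV mode set, open a shell in the container and run the TensorRT workload through Docker: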

    lxc shell t1
    root@t1 # docker run --gpus all --runtime nvidia --rm -v $(pwd):/sh_input nvcr.io/nvidia/tensorrt:24.02-py3-igpu bash /sh_input/run_tensorrt.sh

(gpu-mdev)=
## `gputype`: `mdev`
