Prepare a container for running InstructLab with an NVIDIA GPU. This does not require the CUDA Toolkit to be installed on the host, which is useful because CUDA Toolkit packages are not widely available across distributions.

The build is a two-stage process: a builder image carries the compile-time dependencies, and the final image carries only the runtime dependencies. The builder image is around 17GB, and the final image is around 9GB.
- You will need to install the NVIDIA Container Toolkit (`nvidia-container-toolkit`) and the NVIDIA proprietary driver on your host. For Fedora 40, a script is available in `host-prep/fc40.sh`. It may work on other systems, but it is not widely tested.
- You will need to verify that the GPU is detected on both the host and inside a container. A test script is available in `scripts/host-verify.sh`. A rough sketch of the kind of check it performs is shown below.
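  This sketch is an approximation, not a copy of the script; the test image is only an example, and any CDI-enabled image should work:

  ```sh
  # GPU visible on the host?
  nvidia-smi

  # GPU visible inside a container? With CDI, podman injects the driver
  # libraries and the nvidia-smi binary from the host into the container.
  podman run --rm --device nvidia.com/gpu=all \
    registry.access.redhat.com/ubi9/ubi-minimal nvidia-smi
  ```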
- Prepare your environment with arguments for the container build (a minimal example follows this list):
  - (Required) `INSTRUCT_LAB_TAXONOMY_PATH`: The path to the git repo for the taxonomy.
  - (Optional) `CONTAINER_BACKEND`: The container backend to use. The default is `podman`.
  - (Optional) `CONTAINER_NAME`: The name used for the container tag and the deployed container name. The default is `instruct-lab-nvidia-container`.
  - (Optional) `NVIDIA_DEVICE`: The NVIDIA devices passed to the container. The default is `nvidia.com/gpu=all`. Specific identifiers can be found in the output of `nvidia-ctk cdi list`.
  - (Optional) `CONTAINER_BASE_IMAGE`: The base image to use for the container build. The default is `docker.io/rockylinux/rockylinux:9-minimal`.
  - (Optional) `CONTAINER_DIR`: The directory used inside the container. The default is `/work`.
  - (Optional) `CUDA_MAJOR_VERSION`: The major version of CUDA. The default is `12`.
  - (Optional) `CUDA_MINOR_VERSION`: The minor version of CUDA. The default is `4`.
  - (Optional) `HUGGINGFACE_CACHE_DIR`: The directory used to cache model downloads. The default is `${HOME}/.cache/huggingface`.
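  For example, setting only the required variable and leaving the rest at their defaults (the taxonomy path here is a placeholder):

  ```sh
  # Required: point the build at your local taxonomy checkout.
  export INSTRUCT_LAB_TAXONOMY_PATH=${HOME}/src/taxonomy

  # Optional overrides, shown with their default values.
  export CUDA_MAJOR_VERSION=12
  export CUDA_MINOR_VERSION=4
  ```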
- Run the container build:

  ```sh
  make container
  ```
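  The build arguments can also be passed inline, which is standard `make` behavior and overrides anything set in the environment:

  ```sh
  make container INSTRUCT_LAB_TAXONOMY_PATH=${HOME}/src/taxonomy
  ```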
- Connect to the detached container:

  ```sh
  make exec-container
  ```
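  Under the default settings this is presumably roughly equivalent to the following; check the Makefile for the exact command:

  ```sh
  podman exec -it instruct-lab-nvidia-container /bin/bash
  ```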
- Download an existing model to get started:

  ```sh
  ilab download
  ```
- Start by serving the model:

  ```sh
  ilab serve
  ```
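  To sanity-check that the server is up, you can query its OpenAI-compatible endpoint from inside the container; this assumes the default `ilab serve` listen address of `127.0.0.1:8000`:

  ```sh
  curl http://127.0.0.1:8000/v1/models
  ```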
- Back on the host, open another terminal into the existing container:

  ```sh
  make exec-container
  ```
- Begin a chat:

  ```sh
  ilab chat
  ```
- Back on the host in a third terminal, check that the NVIDIA GPU is in use while the chat generates a response:

  ```sh
  nvidia-smi
  ```
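  A single `nvidia-smi` snapshot can miss a short burst of activity, so it can help to poll instead:

  ```sh
  # Refresh the GPU status every second while the model generates.
  watch -n 1 nvidia-smi
  ```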
If you get an error like this when trying to launch the container:

```
Error: crun: cannot stat `/lib64/libEGL_nvidia.so.550.54.14`: No such file or directory: OCI runtime attempted to invoke a command that was not found
```

then you probably need to re-generate the CDI specification:

```sh
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

This happens when the NVIDIA driver was updated after the spec was installed or last generated, leaving the spec pointing at library paths that no longer exist.
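After regenerating, you can confirm that the spec matches the installed driver with the same command used earlier to enumerate device identifiers:

```sh
nvidia-ctk cdi list
```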