Check pynvml works after move to nvidia-ml-py #490

ncclementi · 2025-01-09T22:27:25Z

Closes #151

These are the places where we import, link, or mention pynvml: https://github.com/search?q=repo%3Arapidsai%2Fdeployment+pynvml&type=code

As part of this PR, I manually went over these with a RAPIDS 25.02 nightly installation and checked that they worked as expected.

Questions remaining:

Here we install pynvml explicitly, but we also install rapids, which already comes with pynvml. Do we want to remove the explicit package, or replace it with nvidia-ml-py?

deployment/source/guides/azure/infiniband.md

Line 261 in 370c722

    
           mamba create -n ucxpy {{ rapids_conda_channels }} {{ rapids_conda_packages }} ipython ucx-proc=*=gpu ucx ucx-py dask distributed numpy cupy pytest pynvml -y

Do we want to try this on Azure itself, or is it sufficient to confirm that a similar installation works locally?

deployment/source/cloud/azure/azure-vm-multi.md

Line 57 in 370c722

import pynvml

jacobtomlinson · 2025-01-10T10:29:31Z

@jakirkham might be able to answer the question about the metapackage. Do we need to explicitly install this at all?

I don't think we need to try things on Azure directly, although you can if you want to. The main thing is checking the code snippet works on a machine with nvidia-ml-py installed.
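For reference, a minimal sketch of what such a check could look like. The helper name `gpu_smoke_test` is made up for illustration; it only uses `nvmlInit`, `nvmlDeviceGetCount`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetName`, and `nvmlShutdown`, and it assumes nothing about which distribution provides the `pynvml` module. On a machine without the package or without a driver it returns a message instead of raising.

```python
def gpu_smoke_test():
    """Import pynvml, initialise NVML, and list the visible GPUs.

    Hypothetical helper: returns a short status string rather than raising,
    so it can also run on machines without a GPU or driver.
    """
    try:
        import pynvml
    except ImportError:
        return "pynvml module is not importable"

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as err:
        # Covers a missing driver or a missing NVML shared library.
        return f"NVML init failed: {err}"

    try:
        count = pynvml.nvmlDeviceGetCount()
        names = [
            pynvml.nvmlDeviceGetName(pynvml.nvmlDeviceGetHandleByIndex(i))
            for i in range(count)
        ]
    finally:
        # Always release NVML once it has been initialised.
        pynvml.nvmlShutdown()

    return f"{count} GPU(s) visible: {names}"


if __name__ == "__main__":
    print(gpu_smoke_test())
```

If this prints a GPU list on a machine where only nvidia-ml-py is installed, the doc snippets that do `import pynvml` should be fine as well.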

ncclementi · 2025-01-10T23:33:55Z

I checked this on an Azure VM and noticed the following when running the rapids-notebook nightly via Docker:

docker run --gpus all --pull always --rm -it \
    --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/notebooks:25.02a-cuda12.5-py3.12

It comes with an older pynvml, meaning there are still some libraries that might be pinned to <12.0.

In the VM, via Docker, we get:

>>> import pynvml
>>> pynvml.__version__
'11.5.3'

That being said, I upgraded it once in the VM and pynvml worked as expected.
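To tell a stale install apart from an upgraded one, it can help to look at the installed distributions rather than just the module version. Below is a small stdlib-only sketch; the helper names `dist_version` and `nvml_dists` are illustrative. My understanding is that a newer `pynvml` distribution pulls in `nvidia-ml-py` as a dependency, so seeing both listed would be expected after the upgrade, but treat that as an assumption rather than something verified in this PR.

```python
from importlib import metadata


def dist_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def nvml_dists():
    """Map each NVML-related distribution name to its installed version."""
    return {name: dist_version(name) for name in ("pynvml", "nvidia-ml-py")}


if __name__ == "__main__":
    for name, version in nvml_dists().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the container before and after the upgrade shows which of the two distributions is present in each case.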

jakirkham · 2025-01-11T01:09:56Z

It looks like the Docker nightly image builds have been failing. Appears there is some issue downloading datasets for the cuVS benchmark step. Have filed an upstream issue: rapidsai/docker#724

So I think this just comes down to the Docker images being a bit stale, as it has been a while since we have had a successful nightly build there.

Will take a note to mention this to the OPS and cuVS teams on Monday. Maybe someone can provide some guidance on how we can address this to get the Docker builds up and running again.

update pynvml link to point to nvidia-ml-py

998eba8

ncclementi requested a review from a team as a code owner January 9, 2025 22:27
