Check pynvml works after move to nvidia-ml-py #490

Open: wants to merge 1 commit into main


ncclementi
Contributor

Closes #151

These are the places where we import, link to, or mention pynvml: https://github.com/search?q=repo%3Arapidsai%2Fdeployment+pynvml&type=code

As part of this PR, I manually went through each of these with a RAPIDS 25.02 nightly installation and checked that they worked as expected.

Questions remaining:

  • Here we install pynvml explicitly, but we also install rapids, which already comes with pynvml. Do we want to remove this specific package, or replace it with nvidia-ml-py?

    mamba create -n ucxpy {{ rapids_conda_channels }} {{ rapids_conda_packages }} ipython ucx-proc=*=gpu ucx ucx-py dask distributed numpy cupy pytest pynvml -y

  • Do we want to try this on Azure itself, or is confirming that a similar installation works locally sufficient?

ncclementi requested a review from a team as a code owner on January 9, 2025 22:27
@jacobtomlinson
Member

@jakirkham might be able to answer the question about the metapackage. Do we need to explicitly install this at all?

I don't think we need to try things on Azure directly, although you can if you want to. The main thing is checking the code snippet works on a machine with nvidia-ml-py installed.
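A minimal sketch of such a check is below, assuming nvidia-ml-py (or a pynvml release that depends on it) is installed. It only uses long-standing NVML bindings exposed by the `pynvml` module (`nvmlInit`, `nvmlDeviceGetCount`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetName`, `nvmlDeviceGetMemoryInfo`, `nvmlShutdown`); the `nvml_smoke_test` helper name is hypothetical.

```python
# Sketch of an NVML smoke test; `nvml_smoke_test` is a hypothetical helper name.
from typing import Tuple


def nvml_smoke_test() -> Tuple[bool, str]:
    """Return (ok, details) after trying to enumerate GPUs through NVML."""
    try:
        import pynvml  # module provided by nvidia-ml-py after the move
    except ImportError as exc:
        return False, f"pynvml import failed: {exc}"

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as exc:
        # Expected on machines without an NVIDIA driver or GPU.
        return False, f"nvmlInit failed: {exc}"

    try:
        lines = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # some older bindings return bytes
                name = name.decode()
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            lines.append(f"GPU {i}: {name}, {mem.total / 2**30:.1f} GiB total")
        return True, "\n".join(lines) or "NVML initialised, but no GPUs found"
    finally:
        # Always release NVML, even if a per-device query raised.
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    ok, details = nvml_smoke_test()
    print("OK" if ok else "FAILED", details, sep="\n")
```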

@ncclementi
Contributor Author

I tried to check this on a VM, in particular an Azure VM, and noticed something when running the rapids-notebook nightly via Docker:

docker run --gpus all --pull always --rm -it \
    --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/notebooks:25.02a-cuda12.5-py3.12

It comes with an older pynvml, meaning some libraries may still be pinning it to <12.0.
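To find which libraries might be holding pynvml below 12.0, one option is to scan the installed package metadata for requirements on pynvml. This is a sketch, assuming the packages ship standard `.dist-info` metadata (pip wheels do; conda packages usually do, but not always). It prints the raw requirement strings rather than trying to evaluate version specifiers, so pins like `<12.0` are spotted by eye. The helper names are hypothetical.

```python
# Sketch: list installed distributions that declare a dependency on pynvml.
import re
from importlib.metadata import distributions
from typing import List, Tuple

# Project name at the start of a PEP 508 requirement string.
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(req: str) -> str:
    """Return the normalised project name of a requirement, or "" if none."""
    match = _NAME.match(req)
    if match is None:
        return ""
    # Normalise like packaging tools do: runs of -, _ and . become "-".
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def pynvml_dependents() -> List[Tuple[str, str]]:
    """Return sorted (distribution, requirement) pairs that mention pynvml."""
    found = []
    for dist in distributions():
        for req in dist.requires or []:
            if requirement_name(req) == "pynvml":
                name = dist.metadata.get("Name") or "<unknown>"
                found.append((str(name), req))
    return sorted(found)


if __name__ == "__main__":
    for name, req in pynvml_dependents():
        print(f"{name}: {req}")
```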

In the VM, via Docker, we get:

>>> import pynvml
>>> pynvml.__version__
'11.5.3'
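A version check that does not depend on the module exposing `__version__` can read the package metadata for both distributions instead. This is a sketch; the rule that pynvml 12 and later defers to nvidia-ml-py reflects the move discussed in this PR and the <12.0 pin mentioned above, and is an assumption here. The helper names are hypothetical.

```python
# Sketch: read versions from package metadata instead of `pynvml.__version__`.
# Assumption: pynvml >= 12 is the release line that defers to nvidia-ml-py,
# so a pynvml < 12 suggests something is still pinning the old bindings.
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def major_version(ver: str) -> Optional[int]:
    """Leading integer of a version string, e.g. "11.5.3" -> 11."""
    match = re.match(r"\d+", ver)
    return int(match.group()) if match else None


def pynvml_is_legacy(ver: Optional[str]) -> bool:
    """True when pynvml is installed at a major version below 12."""
    if ver is None:
        return False
    major = major_version(ver)
    return major is not None and major < 12


if __name__ == "__main__":
    for name in ("pynvml", "nvidia-ml-py"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
    if pynvml_is_legacy(installed_version("pynvml")):
        print("pynvml < 12 is installed; an older pin may still be in effect")
```

In the Docker image described above, this would be expected to flag pynvml 11.5.3 as a pre-12 release.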

That said, once I upgraded it in the VM, pynvml worked as expected.

@jakirkham
Member

It looks like the Docker nightly image builds have been failing. There appears to be an issue downloading datasets for the cuVS benchmark step. I have filed an upstream issue: rapidsai/docker#724

So I think this just comes down to the Docker images being a bit stale, as it has been a while since we had a successful nightly build there.

I will make a note to mention this to the OPS and cuVS teams on Monday. Maybe someone can provide some guidance on how to address this and get the Docker builds up and running again.

Development

Successfully merging this pull request may close these issues.

Moving from pynvml to nvidia-ml-py
3 participants