Moving from pynvml to nvidia-ml-py #245

skirui-source · 2023-06-09T22:35:32Z

Fixes #151

source/cloud/azure/azure-vm-multi.md

source/tools/dask-cuda.md

source/guides/mig.md

skirui-source · 2023-06-13T04:50:50Z

While following the instructions to set up InfiniBand on Azure (to test whether installing nvidia-ml-py works), I am blocked by insufficient quota for the ND40rs_v2 VM size and have requested an increase...

skirui-source · 2023-06-13T21:48:33Z

Seeing the following error when I mamba install these packages, including nvidia-ml-py:

Looking for: ['rapids=23.04', 'python=3.10', 'cudatoolkit=11.8', 'ipython', 'ucx-proc=[build=gpu]', 'ucx',  \
'ucx-py', 'dask', 'distributed', 'numpy', 'cupy', 'pytest', 'nvidia-ml-py']

nvidia/linux-64                                    120.3kB @ 484.5kB/s  0.2s
nvidia/noarch                                        3.9kB @  14.2kB/s  0.0s
rapidsai/noarch                                      6.6kB @  20.4kB/s  0.3s
rapidsai/linux-64                                  398.1kB @ 969.9kB/s  0.4s
conda-forge/noarch                                  12.6MB @   4.6MB/s  2.9s
conda-forge/linux-64                                32.1MB @   4.4MB/s  7.8s
Could not solve for environment specs
The following packages are incompatible
├─ nvidia-ml-py   is installable and it requires
│  └─ pynvml 9999999999 , which can be installed;
├─ python 3.10**  is installable and it requires
│  └─ python_abi 3.10.* *_cp310, which can be installed;
└─ rapids 23.04**  is uninstallable because there are no viable options
   ├─ rapids [23.04.00|23.04.01] would require
   │  └─ python_abi 3.8.* *_cp38, which conflicts with any installable versions previously reported;
   └─ rapids [23.04.00|23.04.01] would require
      ├─ cugraph 23.04.*  but there are no viable options
      │  ├─ cugraph [23.04.00|23.04.01] would require
      │  │  └─ python_abi 3.8.* *_cp38, which conflicts with any installable versions previously reported;
      │  └─ cugraph [23.04.00|23.04.01] would require
      │     └─ ucx-py 0.31.*  but there are no viable options
      │        ├─ ucx-py [0.31.00|0.31.01] would require
      │        │  └─ python_abi 3.8.* *_cp38, which conflicts with any installable versions previously reported;
      │        └─ ucx-py [0.31.00|0.31.01] would require
      │           └─ pynvml >=11.4.1 , which conflicts with any installable versions previously reported;
      └─ python >=3.10,<3.11.0a0 , which can be installed (as previously explained).

jacobtomlinson · 2023-06-20T15:43:41Z

Ping @jakirkham in case you have thoughts about this error

jakirkham · 2023-06-20T20:27:20Z

Sounds like we need to start with making this change to ucx-py first

cc @pentschev

pentschev · 2023-06-21T07:59:13Z

How can we handle this kind of thing for a single package? Moving only the requirement from pynvml to nvidia-ml-py will make other packages uninstallable. Can we do an "either pynvml or nvidia-ml-py" requirement for both wheels and conda packages? I think not, and in that case I think we can:

1. Temporarily make pynvml optional; or
2. Make the change in the package, say ucx-py in this case, but have other packages pin to a previous version of ucx-py until all packages make the same change, then unpin everyone.

The problem with 1 is that we will need to ensure pynvml gets installed somehow, and the problem with 2 is that it will prevent new changes to packages from being properly tested (as ucx-py is not the only package that will need to be pinned). Any other ideas?
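For illustration only, option 1 (treating pynvml as optional at install time) could be paired with a guarded import at runtime, so that a missing module fails with an actionable message. This is a minimal sketch, not code from ucx-py or any RAPIDS library: the helper name `load_nvml` is hypothetical. It relies only on the fact that both the pynvml and nvidia-ml-py distributions provide an importable module named `pynvml`.

```python
import importlib


def load_nvml(module_name="pynvml"):
    """Import the NVML bindings lazily (hypothetical helper, for illustration).

    Both the 'pynvml' and 'nvidia-ml-py' distributions ship a module named
    'pynvml', so the import itself does not care which one is installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user how to fix it.
        raise ImportError(
            f"The '{module_name}' module is not available. Install either "
            "'nvidia-ml-py' or 'pynvml'; both provide it."
        ) from exc
```

A caller would invoke `load_nvml()` at the point where NVML is first needed rather than at import time. This does not solve the installation concern raised above; it only makes the failure mode clearer when neither distribution is present.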

source/guides/mig.md

Co-authored-by: Peter Andreas Entschev <[email protected]>

jakirkham · 2023-06-21T19:08:49Z

Interesting, was thinking we would make the change across all RAPIDS libraries as part of a release

IOW there wouldn't be a need for a mixed state from a final release perspective. Rolling out the change might be similar to how CUDA 12 is being rolled out (though that probably needs more discussion)

In terms of the 2 libraries, the conda-forge packages are designed to conflict with one another. So a user can only install one library or the other. Not aware of an equivalent solution for wheels
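Since wheels have no built-in way to declare that two distributions conflict, one possible stopgap is to detect the mixed state after installation. The sketch below is an assumption-laden illustration, not an existing RAPIDS utility: the function names are hypothetical, and it uses only the standard-library `importlib.metadata.version` lookup to see which of the two distributions are present.

```python
from importlib import metadata


def installed_nvml_providers(
    candidates=("pynvml", "nvidia-ml-py"), version_of=metadata.version
):
    """Return {distribution name: version} for each candidate that is installed.

    `version_of` is injectable so the check can be exercised without
    touching the real environment (hypothetical design choice).
    """
    found = {}
    for name in candidates:
        try:
            found[name] = version_of(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed; skip it.
            continue
    return found


def has_nvml_conflict(found):
    """True when more than one provider of the 'pynvml' module is installed."""
    return len(found) > 1
```

Running `has_nvml_conflict(installed_nvml_providers())` in a pip-managed environment would flag the case where both distributions were installed side by side, which the conda-forge packages prevent by design.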

pentschev · 2023-06-21T20:52:53Z

> Interesting, was thinking we would make the change across all RAPIDS libraries as part of a release
>
> IOW there wouldn't be a need for a mixed state from a final release perspective. Rolling out the change might be similar to how CUDA 12 is being rolled out (though that probably needs more discussion)

Yes, I agree. I'm thinking more of how to address this during development, so that we can still build packages properly. But given your comment on how to roll out the change, is your suggestion that we build using CI artifacts for the time being? And if so, how do we keep non-CI builds working in the meantime?

> In terms of the 2 libraries, the conda-forge packages are designed to conflict with one another. So a user can only install one library or the other. Not aware of an equivalent solution for wheels

Yeah, that makes sense. But my suggestion of removing the requirement for pynvml (and thus making it optional from the installer's perspective) will still require that it gets installed somehow, and this is something I'm not sure how we would deal with in the meantime.

replace pynvml with nvidia-ml-py in codebase

d96155d

skirui-source requested a review from jacobtomlinson as a code owner June 9, 2023 22:35

skirui-source self-assigned this Jun 9, 2023

jacobtomlinson reviewed Jun 12, 2023

source/cloud/azure/azure-vm-multi.md (outdated)

source/tools/dask-cuda.md (outdated)

source/guides/mig.md (outdated)

revert changes, link to PyPI for nvidia-ml-py

66b1127

skirui-source requested review from jacobtomlinson and rjzamora June 12, 2023 21:29

pentschev reviewed Jun 21, 2023

source/guides/mig.md (outdated)

Update source/guides/mig.md

6ec0c70

Co-authored-by: Peter Andreas Entschev <[email protected]>

skirui-source marked this pull request as draft July 10, 2023 13:48

skirui-source removed their assignment Jan 18, 2024
