Moving from pynvml to nvidia-ml-py #245

Draft
wants to merge 3 commits into
base: main

Conversation

skirui-source
Contributor

@skirui-source skirui-source commented Jun 9, 2023

Fixes #151

@skirui-source skirui-source self-assigned this Jun 9, 2023
source/cloud/azure/azure-vm-multi.md (outdated, resolved)
source/tools/dask-cuda.md (outdated, resolved)
source/guides/mig.md (outdated, resolved)
@skirui-source
Contributor Author

Following the instructions to set up InfiniBand on Azure (in testing whether installing nvidia-ml-py works), I am blocked by insufficient quota for the ND40rs_v2 VM size and have requested an increase...

@skirui-source
Contributor Author

skirui-source commented Jun 13, 2023

Seeing the following error when I mamba install these packages including nvidia-ml-py:

Looking for: ['rapids=23.04', 'python=3.10', 'cudatoolkit=11.8', 'ipython', 'ucx-proc=[build=gpu]', 'ucx',  \
'ucx-py', 'dask', 'distributed', 'numpy', 'cupy', 'pytest', 'nvidia-ml-py']

nvidia/linux-64                                    120.3kB @ 484.5kB/s  0.2s
nvidia/noarch                                        3.9kB @  14.2kB/s  0.0s
rapidsai/noarch                                      6.6kB @  20.4kB/s  0.3s
rapidsai/linux-64                                  398.1kB @ 969.9kB/s  0.4s
conda-forge/noarch                                  12.6MB @   4.6MB/s  2.9s
conda-forge/linux-64                                32.1MB @   4.4MB/s  7.8s
Could not solve for environment specs
The following packages are incompatible
├─ nvidia-ml-py   is installable and it requires
│  └─ pynvml 9999999999 , which can be installed;
├─ python 3.10**  is installable and it requires
│  └─ python_abi 3.10.* *_cp310, which can be installed;
└─ rapids 23.04**  is uninstallable because there are no viable options
   ├─ rapids [23.04.00|23.04.01] would require
   │  └─ python_abi 3.8.* *_cp38, which conflicts with any installable versions previously reported;
   └─ rapids [23.04.00|23.04.01] would require
      ├─ cugraph 23.04.*  but there are no viable options
      │  ├─ cugraph [23.04.00|23.04.01] would require
      │  │  └─ python_abi 3.8.* *_cp38, which conflicts with any installable versions previously reported;
      │  └─ cugraph [23.04.00|23.04.01] would require
      │     └─ ucx-py 0.31.*  but there are no viable options
      │        ├─ ucx-py [0.31.00|0.31.01] would require
      │        │  └─ python_abi 3.8.* *_cp38, which conflicts with any installable versions previously reported;
      │        └─ ucx-py [0.31.00|0.31.01] would require
      │           └─ pynvml >=11.4.1 , which conflicts with any installable versions previously reported;
      └─ python >=3.10,<3.11.0a0 , which can be installed (as previously explained).

@jacobtomlinson
Member

Ping @jakirkham in case you have thoughts about this error

@jakirkham
Member

Sounds like we need to start with making this change to ucx-py first

cc @pentschev

@pentschev
Member

How can we handle this kind of thing for a single package? Moving the requirement from pynvml to nvidia-ml-py in only one package will make other packages uninstallable. Can we do an "either pynvml or nvidia-ml-py" requirement for both wheels and conda packages? I think not, and in that case I think we can:

  1. Temporarily make pynvml optional; or
  2. Make the change in the package, say ucx-py in this case, but have other packages pin to a previous version of ucx-py until all packages make the same change, then unpin everything.

The problem with 1 is that we will need to ensure pynvml gets installed somehow, and the problem with 2 is that it will prevent new changes to packages from being properly tested (as ucx-py is not the only package that will need to be pinned). Any other ideas?
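
For illustration, option 1 could look roughly like the sketch below: the package stops declaring a hard dependency and instead imports the NVML bindings lazily, failing with an actionable message when they are missing. The helper names `load_nvml` and `require_nvml` are hypothetical and are not part of ucx-py or any RAPIDS library; the only assumption about the real packages is that the bindings are importable as `pynvml`.

```python
import importlib


def load_nvml(module_name="pynvml"):
    """Return the NVML bindings module, or None if it is not installed.

    Hypothetical helper: the default module name assumes the bindings are
    importable as `pynvml`, whichever distribution provides them.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require_nvml(module_name="pynvml"):
    """Return the NVML bindings module or raise a clear error (hypothetical helper)."""
    nvml = load_nvml(module_name)
    if nvml is None:
        raise RuntimeError(
            f"The '{module_name}' module is required for GPU queries but is not "
            "installed. Install either pynvml or nvidia-ml-py."
        )
    return nvml
```

This keeps the package importable without the bindings, but it does not solve the concern raised above: something outside the package metadata still has to make sure one of the two distributions is actually installed.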

source/guides/mig.md Outdated Show resolved Hide resolved
Co-authored-by: Peter Andreas Entschev
@jakirkham
Member

Interesting, was thinking we would make the change across all RAPIDS libraries as part of a release

IOW there wouldn't be a need for a mixed state from a final release perspective. Rolling out the change might be similar to how CUDA 12 is being rolled out (though that probably needs more discussion)

In terms of the 2 libraries, the conda-forge packages are designed to conflict with one another. So a user can only install one library or the other. Not aware of an equivalent solution for wheels
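
Since wheels have no built-in way to declare such a conflict, one possible workaround is a runtime check for which distribution is present. The sketch below is only an assumption about how that might be done, using the standard library's `importlib.metadata`. The function names are hypothetical, and the `lookup` parameter exists only so the logic can be exercised without installing either package.

```python
from importlib import metadata

# Distribution names taken from this thread. As far as I know, both install
# an importable module named `pynvml`, which is why they can clash.
CANDIDATES = ("pynvml", "nvidia-ml-py")


def installed_providers(candidates=CANDIDATES, lookup=metadata.version):
    """Map each installed candidate distribution to its version string."""
    found = {}
    for name in candidates:
        try:
            found[name] = lookup(name)
        except metadata.PackageNotFoundError:
            # Not installed: skip it rather than fail.
            continue
    return found


def check_single_provider(found):
    """Raise if more than one NVML bindings distribution is installed."""
    if len(found) > 1:
        names = ", ".join(sorted(found))
        raise RuntimeError(f"Conflicting NVML bindings installed: {names}")
    return found
```

Calling `check_single_provider(installed_providers())` at import time would surface a mixed installation early, though it only detects the problem and does not prevent pip from creating it.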

@pentschev
Member

> Interesting, was thinking we would make the change across all RAPIDS libraries as part of a release
>
> IOW there wouldn't be a need for a mixed state from a final release perspective. Rolling out the change might be similar to how CUDA 12 is being rolled out (though that probably needs more discussion)

Yes, I agree. I'm thinking more about how to address this during development, so that we can still build packages properly. But given your comment on how to roll out the change, is your suggestion that we build using CI artifacts for the time being? And if so, how do we keep non-CI builds working in the meantime?

> In terms of the 2 libraries, the conda-forge packages are designed to conflict with one another. So a user can only install one library or the other. Not aware of an equivalent solution for wheels

Yeah, that makes sense. But my suggestion of removing the requirement for pynvml (and thus making it optional from the installer's perspective) would still require that it gets installed somehow, and I'm not sure how we would deal with that in the meantime.

@skirui-source skirui-source marked this pull request as draft July 10, 2023 13:48
@skirui-source skirui-source removed their assignment Jan 18, 2024