
Retrieve CUDA available memory via torch.cuda.mem_get_info() #4847

Open · wants to merge 4 commits into master
Conversation

@XuehaiPan (Contributor) commented Dec 20, 2023

This PR refactors the available_memory() method of the CUDA accelerator to use free, total = torch.cuda.mem_get_info(). It also removes the hard dependency on pynvml.
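For concreteness, a minimal sketch of the refactored method (the exact signature in the accelerator class may differ; the device_index parameter here is an assumption):

```python
import torch

def available_memory(self, device_index=None):
    # torch.cuda.mem_get_info() wraps cudaMemGetInfo and returns
    # (free_bytes, total_bytes) for the given device; it honors
    # CUDA_VISIBLE_DEVICES without any index remapping.
    free, _total = torch.cuda.mem_get_info(device_index)
    return free
```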

Related PR:

The torch.cuda.mem_get_info() function was added two years ago (May 26th, 2021). We already rely on torch.cuda.is_bf16_supported(), which was added later (August 26th, 2021), without any torch version check in the adjacent method. So we can assume torch.cuda.mem_get_info() is always available in the torch versions we support.
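Should a defensive check ever be wanted, it is a one-line guard (a hedged sketch; by the argument above it should never trigger):

```python
import torch

# torch.cuda.mem_get_info was added in May 2021 (per the dates above), so
# this guard only matters for torch builds older than anything supported.
if hasattr(torch.cuda, "mem_get_info"):
    free, total = torch.cuda.mem_get_info()
else:
    raise RuntimeError("this torch build lacks torch.cuda.mem_get_info()")
```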

Rationale

  1. The official NVML Python binding package on PyPI is nvidia-ml-py, not pynvml. See the documentation at https://pypi.org/project/pynvml:

    This is a wrapper around the NVML library. For information about the NVML library, see the NVML developer page http://developer.nvidia.com/nvidia-management-library-nvml

    As of version 11.0.0, the NVML-wrappers used in pynvml are identical to those published through nvidia-ml-py.

  2. Depending on pynvml adds an extra dependency, and it can break users' Python environments when nvidia-ml-py is already installed, because both distributions ship the same pynvml module. Relying on torch.cuda.mem_get_info() adds no extra dependency.

  3. Handling the CUDA_VISIBLE_DEVICES environment variable correctly is complex: the variable can be a comma-separated list of integers or of UUID strings, but the current remapping code (shown below) only supports integers. torch.cuda.mem_get_info() calls the CUDA API directly, so it needs no index conversion between CUDA and NVML device ordinals.

```python
import os

def _get_nvml_gpu_id(self, torch_gpu_id):
    """
    credit: https://discuss.pytorch.org/t/making-pynvml-match-torch-device-ids-cuda-visible-devices/103020

    Remap torch device id to nvml device id, respecting CUDA_VISIBLE_DEVICES.
    If the latter isn't set, return the same id.
    """
    # if CUDA_VISIBLE_DEVICES is used, automagically remap the id since pynvml ignores this env var
    if "CUDA_VISIBLE_DEVICES" in os.environ:
        # NOTE: int() raises ValueError when the entries are UUID strings
        ids = list(map(int, os.environ.get("CUDA_VISIBLE_DEVICES", "").split(",")))
        return ids[torch_gpu_id]  # remap
    else:
        return torch_gpu_id
```
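For comparison, a remap that also tolerated UUID entries would have to query NVML itself. The sketch below is illustrative only (the helper name is hypothetical, and it assumes the pynvml module shipped by nvidia-ml-py):

```python
import pynvml  # the module shipped by nvidia-ml-py

def _nvml_index_for_entry(entry: str) -> int:
    # Hypothetical helper: resolve one CUDA_VISIBLE_DEVICES entry
    # (a plain integer or a "GPU-<uuid>" string, possibly truncated)
    # to the physical NVML index.
    if not entry.startswith("GPU-"):
        return int(entry)
    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            uuid = pynvml.nvmlDeviceGetUUID(handle)
            if isinstance(uuid, bytes):  # older bindings return bytes
                uuid = uuid.decode()
            if uuid.startswith(entry):  # CUDA accepts UUID prefixes
                return i
        raise ValueError(f"no NVML device matches {entry!r}")
    finally:
        pynvml.nvmlShutdown()
```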

For example, CUDA accepts (possibly truncated) device UUIDs in CUDA_VISIBLE_DEVICES; the integer-based remap above raises ValueError on them, while torch.cuda.mem_get_info() keeps working:

```console
$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-3cd9eb06-03f4-3b39-2f7b-48ee826b0a26)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-611f484b-7a5a-f1ae-5aac-64d2ddad1ab6)
GPU 2: NVIDIA GeForce RTX 3090 (UUID: GPU-ba171e16-8df7-e1c4-5468-2ee35e18d1f0)
GPU 3: NVIDIA GeForce RTX 3090 (UUID: GPU-66bd9aec-436e-24eb-91e8-d31d6370d8f0)
GPU 4: NVIDIA GeForce RTX 3090 (UUID: GPU-9cc6b251-34a2-db9d-4ca0-7532f951aad2)
GPU 5: NVIDIA GeForce RTX 3090 (UUID: GPU-a6c609c1-078d-e47e-b418-8008e61a8cf6)
GPU 6: NVIDIA GeForce RTX 3090 (UUID: GPU-be37798a-62fb-ebee-90d2-01b018d81c6d)
GPU 7: NVIDIA GeForce RTX 3090 (UUID: GPU-8b2e78db-cff8-bb89-d9fd-64f1633df658)

$ export CUDA_VISIBLE_DEVICES="GPU-ba171e16,GPU-611f484b,GPU-3cd9eb06"

$ ipython
In [1]: import torch

In [2]: torch.cuda.memory_allocated(0)
Out[2]: 0

In [3]: torch.cuda.get_device_properties(0).total_memory
Out[3]: 25447170048

In [4]: torch.cuda.mem_get_info(0)
Out[4]: (510328832, 25447170048)

In [5]: from nvitop import CudaDevice

In [6]: cuda0 = CudaDevice(0)
   ...: cuda0
Out[6]: CudaDevice(cuda_index=0, nvml_index=2, name="NVIDIA GeForce RTX 3090", total_memory=24.00GiB)

In [7]: cuda0.memory_free()
Out[7]: 510328832

In [8]: cuda0.memory_used()
Out[8]: 24936841216

In [9]: cuda0.memory_total()
Out[9]: 25769803776
```
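The free-memory figures agree: torch.cuda.mem_get_info(0) reports the same 510328832 free bytes that NVML reports for the physical device (nvml_index=2), with no manual remapping. A quick cross-check (assuming nvitop is installed, as in the session above):

```python
import torch
from nvitop import CudaDevice

# torch.cuda.mem_get_info() already honors CUDA_VISIBLE_DEVICES, so its
# free-memory figure matches NVML's for the same (remapped) device.
free, _total = torch.cuda.mem_get_info(0)
print(free, CudaDevice(0).memory_free())  # both printed 510328832 above
```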

@mrwyattii (Contributor) commented

Hi @XuehaiPan - thank you for the contribution. If I recall correctly, we had to use pynvml because we were getting inaccurate memory information from torch in some scenarios. @jeffra may be able to comment more on this.

Either way, I will try out this branch and see if that is still the case. In particular, this code is necessary for FastGen and DeepSpeed-MII.

@loadams (Contributor) commented Jan 2, 2024

> Hi @XuehaiPan - thank you for the contribution. If I recall correctly, we had to use pynvml because we were getting inaccurate memory information from torch in some scenarios. @jeffra may be able to comment more on this.
>
> Either way, I will try out this branch and see if that is still the case. In particular, this code is necessary for FastGen and DeepSpeed-MII.

If we aren't able to switch over, would it at least make sense to move to the nvidia-ml-py package, since it is more regularly updated and at least matches the CUDA version?
