pyNVML won't work on a Jetson, is there a workaround #400

JasonAtNvidia · 2020-09-16T15:58:45Z

There is no NVML library on aarch64 NVIDIA Jetson devices. That will break many libraries that rely on it, such as cuxfilter. The geospatial and cuxfilter libraries are among the most requested for Jetson and I'd love to make them work. Is there a way to use Numba functions to replace pyNVML in this library?

quasiben · 2020-09-16T16:06:34Z

So we use pynvml in two places:

getting the number of GPUs in the machine -- this is easy to do
getting the CPU affinity for GPUs -- this will be challenging to replace. Do Jetson devices have more than one GPU?
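
For the first item, a minimal sketch of counting GPUs with and without NVML might look like the following. It assumes Numba and pyNVML are installed where the respective path is used; the function names and the `use_nvml` switch are hypothetical and are not taken from dask-cuda.

```python
from typing import Callable


def count_gpus_with_numba() -> int:
    """Count CUDA devices through Numba (no NVML needed)."""
    from numba import cuda  # imported lazily so this module loads without Numba

    return len(cuda.gpus)


def count_gpus_with_nvml() -> int:
    """Count GPUs through pyNVML; this fails where the NVML library is missing."""
    import pynvml

    pynvml.nvmlInit()
    try:
        return pynvml.nvmlDeviceGetCount()
    finally:
        pynvml.nvmlShutdown()


def gpu_count(
    use_nvml: bool,
    nvml_counter: Callable[[], int] = count_gpus_with_nvml,
    numba_counter: Callable[[], int] = count_gpus_with_numba,
) -> int:
    """Hypothetical dispatcher: pick the NVML or the Numba code path."""
    return nvml_counter() if use_nvml else numba_counter()
```

On a platform without NVML, calling `gpu_count(use_nvml=False)` would take the Numba path. The CPU affinity query has no counterpart in this sketch.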

JasonAtNvidia · 2020-09-16T17:02:17Z

The probability of a Jetson with a discrete GPU is very low, and we can say that they don't exist outside of NVIDIA DRIVE units. We could easily wrap the affinity functionality in a statement such as if "tegra" in platform.uname().release, which would indicate a Jetson device.
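
A minimal sketch of that check, assuming the kernel release string on Jetson contains "tegra" as described above. The helper names are hypothetical; splitting out the string test only makes it easy to try without a Jetson at hand.

```python
import platform


def is_tegra_release(release: str) -> bool:
    """Return True if a kernel release string contains "tegra"."""
    return "tegra" in release


def running_on_tegra() -> bool:
    """Apply the check to the kernel release of the current machine."""
    return is_tegra_release(platform.uname().release)
```

The affinity code could then be skipped whenever `running_on_tegra()` returns True.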

jakirkham · 2020-09-16T17:53:01Z

It might be possible to detect affinity through hwloc.

quasiben · 2020-09-16T18:15:50Z

Alternatively, if there is only one GPU on a Jetson, does device affinity do anything?

JasonAtNvidia · 2020-09-16T18:21:29Z

Theoretically there is no device affinity on a Jetson: the GPU and CPU share the same chunk of RAM and don't have to communicate over the PCI bus.

pentschev · 2020-09-16T20:31:40Z

Do any of the Jetson boards have multiple GPUs, @JasonAtNvidia? Note that dask-cuda is targeting a one-process-per-GPU model for parallelism, and if none of the boards have multiple GPUs you may not have a lot of use for dask-cuda anyway.

If there are multi-GPU Jetsons, is there a reliable way to query whether the system is running on a Jetson? We can certainly add some conditions and work around pyNVML. We do something similar for the DGXs in dask-cuda/dask_cuda/tests/test_dgx.py (lines 30 to 40 in 8d42f27), although those are only for tests today:

    def _get_dgx_name():
        product_name_file = "/sys/class/dmi/id/product_name"
        dgx_release_file = "/etc/dgx-release"

        # We verify `product_name_file` to check it's a DGX, and check
        # if `dgx_release_file` exists to confirm it's not a container.
        if not os.path.isfile(product_name_file) or not os.path.isfile(dgx_release_file):
            return None

        for line in open(product_name_file):
            return line

JasonAtNvidia · 2020-09-17T18:15:23Z

There are Jetson boards with multi-GPU capability; DRIVE units are the most common. They have a Xavier SoM and a Turing daughter board.

The Linux for Tegra (L4T) distribution has a file at /etc/nv_tegra_release that contains the version. You could also check for the existence of /sys/class/tegra-firmware/ (a folder) to verify that you are running on a Jetson. That folder exists inside a container, whereas nv_tegra_release does not.
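
A minimal sketch of a check based on those two paths. The function names and the `root` parameter are hypothetical; `root` exists only so the logic can be exercised against a scratch directory instead of the real filesystem.

```python
import os


def detect_jetson(root: str = "/") -> dict:
    """Report which of the two Jetson markers exist under `root`."""
    release_file = os.path.join(root, "etc", "nv_tegra_release")
    firmware_dir = os.path.join(root, "sys", "class", "tegra-firmware")
    return {
        "release_file": os.path.isfile(release_file),
        "firmware_dir": os.path.isdir(firmware_dir),
    }


def is_jetson(root: str = "/") -> bool:
    """Rely on the firmware folder, the marker said to be visible in containers."""
    return detect_jetson(root)["firmware_dir"]
```

Keeping both results in `detect_jetson` leaves room to tell a bare-metal install from a container, if the release file really is absent in containers as described.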

pentschev · 2020-09-24T10:10:24Z

> There are Jetson boards with multi-GPU capability; DRIVE units are the most common. They have a Xavier SoM and a Turing daughter board.

Sorry for the late reply here, @JasonAtNvidia. When you say multiple GPU capability, do you mean that you can address each process with CUDA_VISIBLE_DEVICES=0, CUDA_VISIBLE_DEVICES=1, and so on? Or how do you choose which GPU the application should use?

> The Linux for Tegra (L4T) distribution has a file at /etc/nv_tegra_release that contains the version. You could also check for the existence of /sys/class/tegra-firmware/ (a folder) to verify that you are running on a Jetson. That folder exists inside a container, whereas nv_tegra_release does not.

As long as we can choose each GPU correctly, these should let us detect the platform reliably so we can bypass the current NVML usage. As soon as you confirm we can indeed use CUDA_VISIBLE_DEVICES for each Dask worker, I can submit a PR to address this.

JasonAtNvidia · 2020-09-24T19:06:33Z

@pentschev
Yes, Jetson devices respond to the CUDA_VISIBLE_DEVICES environment variable.

I do not have a Jetson device to test multiple GPUs with, but I am able to verify that CUDA_VISIBLE_DEVICES=0 is successful and CUDA_VISIBLE_DEVICES=1 results in an error that no device is found. I will try to find a multiple GPU device to test with.
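
For illustration, a minimal sketch of how one environment per worker process could be prepared with CUDA_VISIBLE_DEVICES. The helper name is hypothetical, and this is not necessarily how dask-cuda assigns devices to its workers.

```python
import os
from typing import Dict, List


def worker_environments(n_gpus: int) -> List[Dict[str, str]]:
    """Build one environment per worker, each exposing a single GPU."""
    envs = []
    for index in range(n_gpus):
        env = dict(os.environ)  # copy, so the parent environment is untouched
        env["CUDA_VISIBLE_DEVICES"] = str(index)
        envs.append(env)
    return envs
```

Each dictionary could be passed as the `env` argument of `subprocess.Popen` when launching a worker.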

pentschev · 2020-09-25T21:06:49Z

@JasonAtNvidia I just pushed #402. It should work with Tegra, but I don't have access to a Tegra device to test it, so it would be great if you could test it when you have a chance.

JasonAtNvidia · 2020-09-25T22:52:19Z

@pentschev
I think your patch is good. It builds and loads on the Jetson device, and I think these are the 3 functions you touched with the patch.

>>> dask_cuda.utils.get_gpu_count()
1
>>> dask_cuda.utils._is_tegra()
True
>>> dask_cuda.utils.get_device_total_memory()
16582901760

pentschev · 2020-09-28T09:38:41Z

@JasonAtNvidia those are the correct functions. It would be interesting to know if you can go any further and run some Dask computation as well, but as I mentioned before, you won't see any benefit from using dask-cuda with a single GPU compared with using the library you're computing with (e.g., CuPy, cuDF) on its own.


pentschev linked a pull request Sep 25, 2020 that will close this issue

Skip pyNVML to support Tegra devices #402

Open

