[BUG] Cutlass python does not detect GPU #1919

Open
IzanCatalan opened this issue Nov 5, 2024 · 5 comments
Labels: ? - Needs Triage, bug (Something isn't working)


@IzanCatalan

Describe the bug
I am trying to build the CUTLASS Python interface from source.
My environment is Ubuntu 18.04, CUDA 11.8, an NVIDIA Tesla V100 (Volta) GPU, Python 3.10, CMake 3.19, and GCC 9.4.0.
I successfully built and compiled CUTLASS following the guidelines here. Now I want to build the CUTLASS Python interface so that I can use PyTorch with CUTLASS. However, when I follow the guidelines in /python, it fails because it does not detect the GPU.

Steps/Code to reproduce bug
I executed `pip install -e .` in the root directory /cutlass, and that part works fine: pip detects and builds CUTLASS, and `pip list | grep nvidia` shows nvidia-cutlass 3.6.0.0. However, running the tests, or even this basic example, fails:

```python
import cutlass
import numpy as np

plan = cutlass.op.Gemm(element=np.float16, layout=cutlass.LayoutType.RowMajor)
A, B, C, D = [np.ones((1024, 1024), dtype=np.float16) for i in range(4)]
plan.run(A, B, C, D)
```

And the output error is:
File "/mnt/beegfs/gap/[email protected]/cutlass/test/python/cutlass/conv2d/test.py", line 4, in <module> plan = cutlass.op.Gemm(element=np.float16, layout=cutlass.LayoutType.RowMajor) File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/op/gemm.py", line 224, in __init__ super().__init__(cc=cc, kernel_cc=kernel_cc) File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/op/op.py", line 72, in __init__ self.cc = cc if cc is not None else device_cc() File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/backend/utils/device.py", line 77, in device_cc device = cutlass.device_id() File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/__init__.py", line 176, in device_id initialize_cuda_context() File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/__init__.py", line 163, in initialize_cuda_context raise RuntimeError(f"cudaFree failed with error {err}") RuntimeError: cudaFree failed with error 3

Or if I run the tests:

```
======================================================================
ERROR: conv2d_sm80 (unittest.loader._FailedTest)

ImportError: Failed to import test module: conv2d_sm80
Traceback (most recent call last):
  File "/usr/lib/python3.10/unittest/loader.py", line 436, in _find_test_path
    module = self._get_module_from_name(name)
  File "/usr/lib/python3.10/unittest/loader.py", line 377, in _get_module_from_name
    __import__(name)
  File "/mnt/beegfs/gap/[email protected]/cutlass/test/python/cutlass/conv2d/conv2d_sm80.py", line 50, in <module>
    @unittest.skipIf(device_cc() < cc, 'Device compute capability is invalid for SM80 tests.')
  File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/backend/utils/device.py", line 77, in device_cc
    device = cutlass.device_id()
  File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/__init__.py", line 176, in device_id
    initialize_cuda_context()
  File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/__init__.py", line 163, in initialize_cuda_context
    raise RuntimeError(f"cudaFree failed with error {err}")
RuntimeError: cudaFree failed with error 3


Ran 1 test in 0.002s

FAILED (errors=1)
Traceback (most recent call last):
  File "/mnt/beegfs/gap/[email protected]/cutlass/test/python/cutlass/conv2d/run_all_tests.py", line 44, in <module>
    raise Exception('Test cases failed')
Exception: Test cases failed
[email protected]@altek1:~/cutlass/test/python/cutlass/conv2d$ python3.10 test.py
Traceback (most recent call last):
  File "/mnt/beegfs/gap/[email protected]/cutlass/test/python/cutlass/conv2d/test.py", line 4, in <module>
    plan = cutlass.op.Gemm(element=np.float16, layout=cutlass.LayoutType.RowMajor)
  File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/op/gemm.py", line 224, in __init__
    super().__init__(cc=cc, kernel_cc=kernel_cc)
  File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/op/op.py", line 72, in __init__
    self.cc = cc if cc is not None else device_cc()
  File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/backend/utils/device.py", line 77, in device_cc
    device = cutlass.device_id()
  File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/__init__.py", line 176, in device_id
    initialize_cuda_context()
  File "/mnt/beegfs/gap/[email protected]/cutlass/python/cutlass/__init__.py", line 163, in initialize_cuda_context
    raise RuntimeError(f"cudaFree failed with error {err}")
RuntimeError: cudaFree failed with error 3
```

I would like to know if I omitted any step, because I didn't modify the CUDA or PATH variables; CMake detected everything automatically. I just did:

```
mkdir build && cd build
cmake .. -DCUTLASS_NVCC_ARCHS=70
make -j$(nproc)
```

and it worked fine.

Any help would be appreciated.

@IzanCatalan added the labels `? - Needs Triage` and `bug` on Nov 5, 2024
@jackkosaian
Contributor

Can you please list the version of cuda-python installed on your system?

@IzanCatalan
Author

IzanCatalan commented Nov 5, 2024

@jackkosaian Here is everything installed via pip that I could find related to CUDA and NVIDIA; my cuda-python version is 12.6.1:

```
[email protected]@altek1:~$ pip3.10 list | grep nvidia
nvidia-cublas-cu11 11.11.3.6
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu11 11.8.87
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu11 11.8.89
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu11 11.8.89
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu11 9.1.0.70
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu11 10.9.0.58
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu11 10.3.0.86
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu11 11.4.1.48
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu11 11.7.5.86
nvidia-cusparse-cu12 12.1.0.106
nvidia-cutlass 3.6.0.0 /mnt/beegfs/gap/[email protected]/cutlass
nvidia-nccl-cu11 2.20.5
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu11 11.8.86
nvidia-nvtx-cu12 12.1.105

[notice] A new release of pip is available: 24.0 -> 24.3.1
[notice] To update, run: python3.10 -m pip install --upgrade pip
[email protected]@altek1:~$ pip3.10 list | grep cuda
cuda-python 12.6.1
nvidia-cuda-cupti-cu11 11.8.87
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu11 11.8.89
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu11 11.8.89
nvidia-cuda-runtime-cu12 12.1.105
```

Output of `nvidia-smi` and `nvcc --version`:

[screenshot of `nvidia-smi` and `nvcc --version` output attached in the original issue]
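(Since this environment mixes cu11 and cu12 wheels with a CUDA 11.8 toolkit and cuda-python 12.6.1, the following is a minimal sketch, assuming cuda-python is importable, of how one could check which driver and runtime versions the Python process actually sees.)

```python
# Minimal sketch, assuming cuda-python is installed: report the driver and
# runtime versions visible to this Python process, to spot a possible
# CUDA 11.8 toolkit vs. cuda-python 12.x mismatch.
from cuda import cudart

err, driver_version = cudart.cudaDriverGetVersion()
err, runtime_version = cudart.cudaRuntimeGetVersion()
print("driver:", driver_version, "runtime:", runtime_version)  # e.g. 11080 means 11.8
```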

@jackkosaian
Contributor

Thanks.

Does the issue also occur if you install the version of the CUTLASS Python interface available on PyPI?

This can be installed via:

`pip install nvidia-cutlass`

(You'll probably need to uninstall any version of the CUTLASS Python interface you may have previously installed via `pip install -e .`)

@IzanCatalan
Author

@jackkosaian Apparently it was something related to Python 3.10, because I got the same error even with the command you suggested. When I repeated all the steps with pip3 and Python 3.8, the error did not appear. However, when running the example below, the GPU does run the program but seems to be only partially used, with just 1-6% utilization. With the full test suite it is even lower, 0-1% and only 300 MB out of 32 GB of memory. I wonder if this is normal.

```python
import cutlass
import numpy as np

plan = cutlass.op.Gemm(element=np.float16, layout=cutlass.LayoutType.RowMajor)
A, B, C, D = [np.ones((1024, 1024), dtype=np.float16) for i in range(4)]
plan.run(A, B, C, D)
```

@jackkosaian
Contributor

Interesting on the Python 3.10 vs. 3.8 finding.

Regarding GPU utilization: the GEMM you're running and those in the unit tests are not large enough to show high utilization of the GPU via nvidia-smi (assuming that's how you're measuring compute and memory utilization).
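(For illustration, here is a hedged sketch using the same CUTLASS Python API as the example above, but with a larger, hypothetical problem size and repeated launches; the size and loop count are illustrative and not from this thread. Run repeatedly like this, the kernel should show noticeably higher utilization in nvidia-smi than the single 1024×1024 GEMM.)

```python
# Sketch only: same API as the thread's example, but a larger, hypothetical
# problem size launched repeatedly so utilization becomes visible in nvidia-smi.
import cutlass
import numpy as np

N = 8192  # illustrative size, not from the original thread
plan = cutlass.op.Gemm(element=np.float16, layout=cutlass.LayoutType.RowMajor)
A, B, C, D = [np.ones((N, N), dtype=np.float16) for _ in range(4)]

for _ in range(50):        # repeated launches keep the GPU busy long enough to observe
    plan.run(A, B, C, D)
```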
