cuKernelGetFunction Segmentation fault (core dumped) #129

Closed
CallmeZhangChenchen opened this issue Sep 25, 2024 · 8 comments
Labels
awaiting-response Further information is requested

Comments

@CallmeZhangChenchen

# cuKernelGetAttribute
print('cuKernelGetAttribute')
info = cuda.cuKernelGetAttribute(cuda.CUfunction_attribute.CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, kernel, cuDevice)
print('cuKernelGetAttribute end, info ', info)


# cuKernelGetFunction
print('cuKernelGetFunction')
cuda.cuKernelGetFunction(kernel)
print('cuKernelGetFunction end, info', info)

cuKernelGetAttribute
cuKernelGetAttribute end, info  (<CUresult.CUDA_SUCCESS: 0>, 1024)
cuKernelGetFunction
Segmentation fault (core dumped)
@github-actions github-actions bot added the triage Needs the team's attention label Sep 25, 2024
@leofang
Member

leofang commented Sep 25, 2024

What's the CUDA Python version? And the CUDA Driver version (as reported by nvidia-smi)?

@CallmeZhangChenchen
Author

@leofang Hello!

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

Thu Sep 26 01:46:39 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A30                     Off | 00000000:17:00.0 Off |                    0 |
| N/A   26C    P0              29W / 165W |     36MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A30                     Off | 00000000:31:00.0 Off |                    0 |
| N/A   29C    P0              30W / 165W |  23473MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A30                     Off | 00000000:B1:00.0 Off |                    0 |
| N/A   29C    P0              32W / 165W |  23471MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A30                     Off | 00000000:CA:00.0 Off |                    0 |
| N/A   27C    P0              26W / 165W |     36MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

@leofang
Member

leofang commented Sep 26, 2024

Hi @CallmeZhangChenchen, you still have not provided the CUDA Python version. You can get it with either python -c "from cuda import __version__; print(__version__)" (but this will change later, see #75) or pip list | grep cuda.
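A version lookup that does not depend on the `cuda.__version__` attribute can be sketched with the standard library alone. The helper name `installed_version` is hypothetical; `cuda-python` is the distribution name reported later in this thread.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None when the package is not installed in this environment.
    print("cuda-python:", installed_version("cuda-python"))
```

This reads the installed package metadata, so it reports the same value as pip list.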

@vzhurba01
Collaborator

I took a cursory glance and wasn't able to reproduce this with the latest CUDA Python. However, as I was setting up my local test, I made a mistake that caused a segfault: I mixed up how I was loading my data (cuModuleLoadData vs. cuLibraryLoadData), and my downstream API call segfaulted.

The following call orders are what ended up working:

  1. cuModuleLoadData -> cuModuleGetFunction -> cuFuncGetAttribute
  2. cuLibraryLoadFromFile -> cuLibraryGetKernel -> cuKernelGetAttribute -> cuKernelGetFunction

Could you have had a similar mix-up? If not, a minimal repro would help us a lot in seeing why I wasn't able to reproduce your issue.
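For reference, the second (library-based) call order can be sketched with the cuda-python 12.x driver bindings as follows. This is a minimal sketch, not a verified repro: the helpers `unwrap` and `kernel_to_function` are hypothetical names, the file path and kernel name are placeholders to be supplied by the caller, and the empty option lists assume no JIT or library options are needed.

```python
try:
    from cuda import cuda  # cuda-python 12.x driver bindings
except ImportError:  # keeps the sketch importable without cuda-python
    cuda = None


def unwrap(result):
    """Split a binding's (CUresult, *values) tuple, raising on a non-zero status."""
    err, *values = result
    if int(err) != 0:  # CUDA_SUCCESS is 0
        raise RuntimeError(f"CUDA driver call failed: {err!r}")
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


def kernel_to_function(path, kernel_name, device_ordinal=0):
    """cuLibraryLoadFromFile -> cuLibraryGetKernel -> cuKernelGetAttribute
    -> cuKernelGetFunction, staying within the CUlibrary/CUkernel handles."""
    if cuda is None:
        raise RuntimeError("cuda-python is not installed")
    unwrap(cuda.cuInit(0))
    dev = unwrap(cuda.cuDeviceGet(device_ordinal))
    # cuKernelGetFunction returns the function for the current context,
    # so make the device's primary context current first.
    ctx = unwrap(cuda.cuDevicePrimaryCtxRetain(dev))
    unwrap(cuda.cuCtxSetCurrent(ctx))
    # No JIT options and no library options: empty lists with zero counts.
    library = unwrap(
        cuda.cuLibraryLoadFromFile(path.encode(), [], [], 0, [], [], 0)
    )
    kernel = unwrap(cuda.cuLibraryGetKernel(library, kernel_name.encode()))
    max_threads = unwrap(
        cuda.cuKernelGetAttribute(
            cuda.CUfunction_attribute.CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK,
            kernel,
            dev,
        )
    )
    function = unwrap(cuda.cuKernelGetFunction(kernel))
    # The caller keeps `library` alive while using `function`, then calls
    # cuLibraryUnload(library) and cuDevicePrimaryCtxRelease(dev).
    return library, function, max_threads
```

The key point is that the CUkernel passed to cuKernelGetAttribute and cuKernelGetFunction comes from cuLibraryGetKernel, never from cuModuleGetFunction. A call would look like kernel_to_function("my_kernels.cubin", "my_kernel"), where both names are placeholders.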

@CallmeZhangChenchen
Author

Hi @vzhurba01 @leofang Thank you for your help.

cuda-python 12.6.0
nvidia-cuda-runtime-cu12 12.6.37
nvidia-dali-cuda120 1.36.0

I was using it wrong. I called cuModuleLoadData -> cuModuleGetFunction -> cuKernelGetAttribute -> cuKernelGetFunction.

However, I don't know how to use cuLibraryLoadFromFile or how to set its parameters.
The documentation shows each function's parameters, but not how to define them or how to call the functions.
Is there any material or example code I can reuse to write my own code more easily, for functions such as these?
cuKernelGetAttribute, cuKernelGetFunction, cuKernelSetAttribute, cuKernelSetCacheConfig, cuLibraryGetGlobal, cuLibraryGetKernel, cuLibraryGetManaged, cuLibraryGetModule, cuLibraryGetUnifiedFunction, cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload, cuLaunchKernelEx,
cuLaunchKernelEx_ptsz, cuMemAdvise_v2, cuMemPrefetchAsync_v2, cuMemPrefetchAsync_v2_ptsz, cuMemGetHandleForAddressRange, cuMipmappedArrayGetMemoryRequirements, cuModuleGetLoadingMode, cuOccupancyMaxActiveClusters, cuOccupancyMaxPotentialClusterSize,
cuStreamBatchMemOp_v2, cuStreamBatchMemOp_v2_ptsz, cuStreamWaitValue32_v2,
cuStreamWaitValue32_v2_ptsz, cuStreamWaitValue64_v2, cuStreamWaitValue64_v2_ptsz,
cuStreamWriteValue32_v2, cuStreamWriteValue32_v2_ptsz, cuStreamWriteValue64_v2, cuStreamWriteValue64_v2_ptsz,
cuStreamGetId, cuStreamGetId_ptsz, cuTensorMapEncodeIm2col, cuTensorMapEncodeTiled, cuTensorMapReplaceAddress.

@leofang
Member

leofang commented Oct 4, 2024

Thanks, @vzhurba01. That's a reasonable guess. Glad it helped @CallmeZhangChenchen!

@CallmeZhangChenchen For this particular use case, please check out the devblog https://developer.nvidia.com/blog/cuda-context-independent-module-loading. Sometimes our devblogs come out sooner than the official documentation 😅

Generally speaking, however, the documentation for the entire CUDA API surface is unfortunately pretty lacking and CUDA Python is not the only victim; the same applies to CUDA C/C++ too. Right now, CUDA Python's API Reference is largely copied and pasted from the CUDA Driver, Runtime, and NVRTC API references (through parsing the same Doxygen docstrings in the CUDA headers that are also used to generate the C API references), with minor changes.

For actually learning how to use the C APIs (and their Python bindings) I would recommend:

  1. The official CUDA Programming Guide
  2. The official CUDA C samples
  3. NVIDIA's Developer Blogs, which are worth searching. The devblogs sometimes contain richer and more useful information than what's covered elsewhere.

The team is working on a more comprehensive CUDA Python user guide (cc @aterrel @nv-kriehl), and a few preview chapters are already available at https://github.com/NVIDIA/accelerated-computing-hub. In the long term, the idea is to provide a pythonic CUDA module, cuda.core (#70), for accessing these functionalities without having to worry about how to use the C APIs/bindings anymore 🙂 For example, the API combo

2. cuLibraryLoadFromFile -> cuLibraryGetKernel -> cuKernelGetAttribute -> cuKernelGetFunction

is already covered in the cuda.core prototype (#87). We hope to make a beta release very soon!

@leofang leofang added awaiting-response Further information is requested and removed triage Needs the team's attention labels Oct 4, 2024
@leofang
Member

leofang commented Oct 4, 2024

Since the question was answered/addressed, let us close this issue.

@leofang leofang closed this as completed Oct 4, 2024
@leofang
Member

leofang commented Oct 4, 2024

FYI, you have CUDA driver 12.4 (as reported by nvidia-smi) but cuda-python 12.6. Bear in mind that the driver API bindings provided by cuda-python require a functional CUDA driver, so if you're using cutting-edge APIs you'd also need to update to the driver version that first provides the API. In this case the requirement was met, since the contextless loading APIs were introduced back in 12.0. I'm mentioning this in case it helps with future needs.
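The driver-versus-bindings check described above can be sketched as follows. The helper names `decode_cuda_version` and `driver_supports` are hypothetical; the sketch assumes the usual CUDA encoding of versions as 1000 * major + 10 * minor (so 12.4 is reported as 12040) and uses cuDriverGetVersion from the cuda-python bindings to read the driver's value.

```python
def decode_cuda_version(encoded):
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_supports(encoded_driver_version, required):
    """Return True if the driver's (major, minor) is at least `required`."""
    return decode_cuda_version(encoded_driver_version) >= tuple(required)


if __name__ == "__main__":
    try:
        from cuda import cuda  # cuda-python driver bindings
        err, encoded = cuda.cuDriverGetVersion()
    except (ImportError, RuntimeError) as exc:
        print("CUDA driver not available:", exc)
    else:
        if int(err) == 0:
            print("driver CUDA version:", decode_cuda_version(encoded))
            # The contextless loading APIs (cuLibrary*/cuKernel*) need 12.0+.
            print("library loading APIs available:", driver_supports(encoded, (12, 0)))
```

With the versions in this thread, a 12.4 driver satisfies the 12.0 requirement even though the installed bindings are 12.6.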

3 participants