cuKernelGetFunction Segmentation fault (core dumped) #129

CallmeZhangChenchen · 2024-09-25T03:46:59Z

# cuKernelGetAttribute
print('cuKernelGetAttribute')
info = cuda.cuKernelGetAttribute(cuda.CUfunction_attribute.CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, kernel, cuDevice)
print('cuKernelGetAttribute end, info ', info)


# cuKernelGetFunction
print('cuKernelGetFunction')
info = cuda.cuKernelGetFunction(kernel)
print('cuKernelGetFunction end, info', info)

cuKernelGetAttribute
cuKernelGetAttribute end, info  (<CUresult.CUDA_SUCCESS: 0>, 1024)
cuKernelGetFunction
Segmentation fault (core dumped)


leofang · 2024-09-25T16:57:40Z

What are your CUDA Python version and CUDA driver version (as reported by nvidia-smi)?

CallmeZhangChenchen · 2024-09-26T01:47:28Z

@leofang Hello!

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

Thu Sep 26 01:46:39 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A30                     Off | 00000000:17:00.0 Off |                    0 |
| N/A   26C    P0              29W / 165W |     36MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A30                     Off | 00000000:31:00.0 Off |                    0 |
| N/A   29C    P0              30W / 165W |  23473MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A30                     Off | 00000000:B1:00.0 Off |                    0 |
| N/A   29C    P0              32W / 165W |  23471MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A30                     Off | 00000000:CA:00.0 Off |                    0 |
| N/A   27C    P0              26W / 165W |     36MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

leofang · 2024-09-26T13:48:26Z

Hi @CallmeZhangChenchen you still have not provided the CUDA Python version. It can be accessed via either python -c "from cuda import __version__; print(__version__)" (but this will change later, see #75) or pip list | grep cuda.
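For reference, a minimal sketch that collects roughly the same information as pip list | grep cuda from inside Python, using only the standard library's importlib.metadata. The function name installed_cuda_packages is hypothetical, and matching on the substring "cuda" is an assumption that simply mirrors the grep.

```python
from importlib.metadata import distributions


def installed_cuda_packages():
    """Return {distribution name: version} for installed packages whose
    name contains "cuda" (similar to `pip list | grep cuda`)."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")  # None if the metadata is incomplete
        if name and "cuda" in name.lower():
            found[name] = dist.version
    return found


if __name__ == "__main__":
    for name, version in sorted(installed_cuda_packages().items()):
        print(name, version)
```

Pasting this output (for example, cuda-python 12.6.0) into a report gives maintainers the binding version without depending on where __version__ lives.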

vzhurba01 · 2024-09-26T23:06:37Z

I took a cursory glance and wasn't able to reproduce with the latest CUDA Python. However as I was setting up my local test, I made a mistake that caused a segfault: I mixed up how I was loading my data (cuModuleLoadData vs. cuLibraryLoadData) and my downstream API segfaulted.

The following call order is what ended up working:

1. cuModuleLoadData -> cuModuleGetFunction -> cuFuncGetAttribute
2. cuLibraryLoadFromFile -> cuLibraryGetKernel -> cuKernelGetAttribute -> cuKernelGetFunction

Could you have had a similar mix up? If not, then a minimal repro would help us a lot in seeing why I wasn't able to reproduce your issue.
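A minimal sketch of the second call order above, written against the cuda-python 12.x layout used in this thread (from cuda import cuda). Assumptions: path is a cubin/PTX/fatbin file on disk and kernel_name is the kernel's unmangled symbol (e.g. declared extern "C") passed as bytes; both inputs, and the helper names check and query_library_kernel, are hypothetical. The trailing [], [], 0, [], [], 0 arguments mean "no JIT options and no library options".

```python
def check(result):
    """Unpack a cuda-python return tuple (CUresult, *values); 0 is CUDA_SUCCESS."""
    err, *values = result
    if int(err) != 0:
        raise RuntimeError(f"CUDA driver call failed: {err!r}")
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


def query_library_kernel(path, kernel_name, device_ordinal=0):
    """cuLibraryLoadFromFile -> cuLibraryGetKernel -> cuKernelGetAttribute
    -> cuKernelGetFunction.

    Returns MAX_THREADS_PER_BLOCK queried through the CUkernel and through
    the CUfunction obtained from it. path/kernel_name are hypothetical inputs.
    """
    from cuda import cuda  # lazy import: check() stays usable without a GPU

    attr = cuda.CUfunction_attribute.CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK
    check(cuda.cuInit(0))
    dev = check(cuda.cuDeviceGet(device_ordinal))

    # Context-independent loading: no current context is needed for this call.
    library = check(cuda.cuLibraryLoadFromFile(path.encode(), [], [], 0, [], [], 0))
    try:
        # A CUkernel from cuLibraryGetKernel (not a CUfunction from
        # cuModuleGetFunction) is what the cuKernel* APIs expect.
        kernel = check(cuda.cuLibraryGetKernel(library, kernel_name))
        via_kernel = check(cuda.cuKernelGetAttribute(attr, kernel, dev))

        # cuKernelGetFunction returns the function for the *current* context,
        # so make the device's primary context current first. The primary
        # context is intentionally left retained for the rest of the process.
        ctx = check(cuda.cuDevicePrimaryCtxRetain(dev))
        check(cuda.cuCtxSetCurrent(ctx))
        func = check(cuda.cuKernelGetFunction(kernel))
        via_function = check(cuda.cuFuncGetAttribute(attr, func))
    finally:
        check(cuda.cuLibraryUnload(library))
    return via_kernel, via_function
```

For example, query_library_kernel("kernels.cubin", b"my_kernel") would return two max-threads-per-block values that are expected to agree. The CUfunction is only used before cuLibraryUnload, because unloading the library invalidates the handles obtained from it.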

CallmeZhangChenchen · 2024-09-27T05:36:31Z

Hi @vzhurba01 @leofang Thank you for your help.

cuda-python 12.6.0
nvidia-cuda-runtime-cu12 12.6.37
nvidia-dali-cuda120 1.36.0

I was using it incorrectly: I called cuModuleLoadData -> cuModuleGetFunction -> cuKernelGetAttribute -> cuKernelGetFunction.
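If the handle comes from cuModuleGetFunction, it is a CUfunction, and the matching query is cuFuncGetAttribute (the first call order listed earlier in the thread), not cuKernelGetAttribute. A minimal sketch under the same cuda-python 12.x assumptions; image (PTX or cubin bytes, NUL-terminated if PTX) and func_name (an unmangled symbol as bytes) are hypothetical inputs, and it assumes the binding accepts a bytes object as the image pointer.

```python
def check(result):
    """Unpack a cuda-python return tuple (CUresult, *values); 0 is CUDA_SUCCESS."""
    err, *values = result
    if int(err) != 0:
        raise RuntimeError(f"CUDA driver call failed: {err!r}")
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


def query_module_function(image, func_name, device_ordinal=0):
    """cuModuleLoadData -> cuModuleGetFunction -> cuFuncGetAttribute."""
    from cuda import cuda  # lazy import: check() stays usable without a GPU

    check(cuda.cuInit(0))
    dev = check(cuda.cuDeviceGet(device_ordinal))

    # Unlike the cuLibrary* APIs, module loading needs a current context.
    # The primary context is intentionally left retained and current.
    ctx = check(cuda.cuDevicePrimaryCtxRetain(dev))
    check(cuda.cuCtxSetCurrent(ctx))

    module = check(cuda.cuModuleLoadData(image))
    try:
        # This is a CUfunction: query it with cuFunc*, never cuKernel*.
        func = check(cuda.cuModuleGetFunction(module, func_name))
        return check(cuda.cuFuncGetAttribute(
            cuda.CUfunction_attribute.CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK,
            func))
    finally:
        check(cuda.cuModuleUnload(module))
```

The design point is simply that each handle type stays within its own API family: CUmodule/CUfunction with cuModule*/cuFunc*, and CUlibrary/CUkernel with cuLibrary*/cuKernel*.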

However, I don't know how to use cuLibraryLoadFromFile or what to pass as its parameters.
The documentation only lists each function's parameters; it doesn't explain how to construct those arguments or how the functions fit together.
Is there any reference material or reusable example code that would make it easier to write code with functions such as these:
cuKernelGetAttribute, cuKernelGetFunction, cuKernelSetAttribute, cuKernelSetCacheConfig, cuLibraryGetGlobal, cuLibraryGetKernel, cuLibraryGetManaged, cuLibraryGetModule, cuLibraryGetUnifiedFunction, cuLibraryLoadData, cuLibraryLoadFromFile, cuLibraryUnload, cuLaunchKernelEx,
cuLaunchKernelEx_ptsz, cuMemAdvise_v2, cuMemPrefetchAsync_v2, cuMemPrefetchAsync_v2_ptsz, cuMemGetHandleForAddressRange, cuMipmappedArrayGetMemoryRequirements, cuModuleGetLoadingMode, cuOccupancyMaxActiveClusters, cuOccupancyMaxPotentialClusterSize,
cuStreamBatchMemOp_v2, cuStreamBatchMemOp_v2_ptsz, cuStreamWaitValue32_v2,
cuStreamWaitValue32_v2_ptsz, cuStreamWaitValue64_v2, cuStreamWaitValue64_v2_ptsz,
cuStreamWriteValue32_v2, cuStreamWriteValue32_v2_ptsz, cuStreamWriteValue64_v2, cuStreamWriteValue64_v2_ptsz,
cuStreamGetId, cuStreamGetId_ptsz, cuTensorMapEncodeIm2col, cuTensorMapEncodeTiled, cuTensorMapReplaceAddress.
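For the cuLibraryLoadData entry in the list above, a common way to produce the in-memory code is NVRTC. A sketch, assuming the cuda-python 12.x layout (from cuda import cuda, nvrtc); the kernel source and the names SOURCE, scale, arch_option and load_library_from_source are hypothetical, and passing the PTX as a bytes object to cuLibraryLoadData is assumed to be accepted by the binding. On the A30s in the nvidia-smi output above (compute capability 8.0), arch_option would produce compute_80.

```python
def check(result):
    """Unpack (status, *values) from cuda-python driver or NVRTC calls; 0 is success."""
    err, *values = result
    if int(err) != 0:
        raise RuntimeError(f"call failed: {err!r}")
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


def arch_option(major, minor):
    """NVRTC option for a virtual architecture, e.g. (8, 0) -> compute_80."""
    return f"--gpu-architecture=compute_{major}{minor}".encode()


# Hypothetical example kernel; extern "C" keeps the symbol name unmangled.
SOURCE = b"""
extern "C" __global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
"""


def load_library_from_source(source=SOURCE, kernel_name=b"scale", device_ordinal=0):
    """NVRTC -> PTX in memory -> cuLibraryLoadData -> cuLibraryGetKernel."""
    from cuda import cuda, nvrtc  # lazy import: helpers work without a GPU

    check(cuda.cuInit(0))
    dev = check(cuda.cuDeviceGet(device_ordinal))
    attr = cuda.CUdevice_attribute
    major = check(cuda.cuDeviceGetAttribute(
        attr.CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev))
    minor = check(cuda.cuDeviceGetAttribute(
        attr.CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev))

    # Compile to PTX with NVRTC (on failure, nvrtcGetProgramLog has the messages).
    prog = check(nvrtc.nvrtcCreateProgram(source, b"scale.cu", 0, [], []))
    try:
        opts = [arch_option(major, minor)]
        check(nvrtc.nvrtcCompileProgram(prog, len(opts), opts))
        size = check(nvrtc.nvrtcGetPTXSize(prog))  # size includes the trailing NUL
        ptx = b" " * size  # buffer that nvrtcGetPTX fills in place
        check(nvrtc.nvrtcGetPTX(prog, ptx))
    finally:
        check(nvrtc.nvrtcDestroyProgram(prog))

    # Load the in-memory PTX without a current context, then look up the kernel.
    library = check(cuda.cuLibraryLoadData(ptx, [], [], 0, [], [], 0))
    kernel = check(cuda.cuLibraryGetKernel(library, kernel_name))
    return library, kernel  # caller should eventually call cuda.cuLibraryUnload(library)
```

The returned CUkernel can then go through cuKernelGetAttribute and cuKernelGetFunction exactly as in the file-based flow.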

leofang · 2024-10-04T02:23:53Z

Thanks, @vzhurba01. That's a reasonable guess. Glad it helped @CallmeZhangChenchen!

@CallmeZhangChenchen For this particular use case, please check out the devblog https://developer.nvidia.com/blog/cuda-context-independent-module-loading; sometimes our devblogs come out sooner than the official documentation 😅

Generally speaking, however, the documentation for the entire CUDA API surface is unfortunately pretty lacking, and CUDA Python is not the only victim; the same is true of CUDA C/C++. Right now, CUDA Python's API Reference is largely copied and pasted from the CUDA Driver, Runtime, and NVRTC API references (through parsing the same Doxygen docstrings in the CUDA headers that are also used to generate the C API references), with minor changes.

For actually learning how to use the C APIs (and their Python bindings) I would recommend:

- The official CUDA Programming Guide
- The official CUDA C samples
- NVIDIA's Developer Blogs (search them). The devblogs sometimes contain richer and more useful information than what's covered elsewhere.

The team is working on a more comprehensive CUDA Python user guide (cc @aterrel @nv-kriehl), and a few preview chapters are already available at https://github.com/NVIDIA/accelerated-computing-hub. In the long term, the idea is to provide a Pythonic CUDA module, cuda.core (#70), for accessing this functionality without you having to worry about how to use the C APIs/bindings 🙂 For example, the API combo

2. cuLibraryLoadFromFile -> cuLibraryGetKernel -> cuKernelGetAttribute -> cuKernelGetFunction

is already covered in the cuda.core prototype (#87). We hope to make a beta release very soon!

leofang · 2024-10-04T02:26:26Z

Since the question was answered/addressed, let us close this issue.

leofang · 2024-10-04T02:39:36Z

FYI, you have CUDA driver 12.4 (as reported by nvidia-smi) but cuda-python 12.6. Bear in mind that the driver API bindings provided by cuda-python require a functional CUDA driver, so if you use cutting-edge APIs you also need to update to the driver version that first provides them. In this case the requirement was met, since the context-independent loading APIs were introduced in CUDA 12.0. I'm mentioning it in case it wasn't clear, for future reference.
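The driver check described above can also be done programmatically. cuDriverGetVersion reports the CUDA version supported by the driver as one integer encoded as 1000 * major + 10 * minor, so the 12.4 driver above reports 12040. A minimal sketch; the helper names and the MIN_LIBRARY_API_DRIVER constant are assumptions based on the 12.0 figure in the comment above, and the cuda-python 12.x import layout is assumed.

```python
# Assumed threshold: the context-independent loading APIs arrived in CUDA 12.0.
MIN_LIBRARY_API_DRIVER = 12000


def decode_cuda_version(version):
    """Decode 1000 * major + 10 * minor into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def driver_supports(version, minimum=MIN_LIBRARY_API_DRIVER):
    """True if the encoded driver version is at least `minimum`."""
    return version >= minimum


def installed_driver_version():
    """Query the encoded version from the installed driver (no cuInit needed)."""
    from cuda import cuda  # lazy import: the pure helpers work without a GPU

    err, version = cuda.cuDriverGetVersion()
    if int(err) != 0:
        raise RuntimeError(f"cuDriverGetVersion failed: {err!r}")
    return version
```

Usage: if driver_supports(installed_driver_version()) is False, the cuLibrary*/cuKernel* bindings exist in Python but the driver cannot service them.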

github-actions bot added the triage label (Needs the team's attention) on Sep 25, 2024

leofang added the awaiting-response label (Further information is requested) and removed the triage label (Needs the team's attention) on Oct 4, 2024

leofang closed this as completed Oct 4, 2024

leofang mentioned this issue on Oct 20, 2024 in:

Documentation and example of usage of cuMemExportToShareableHandle #174 (Open)
