Numeri opened this issue on May 31, 2024 · 4 comments
Labels: build (build issues; typically submitted using template), ep:CUDA (issues related to the CUDA execution provider), stale (issues that have not been addressed in a while; categorized by a bot)
Describe the issue

I am trying to add a custom Triton kernel to ONNX Runtime as an operator. This works, but whenever I call the operator, I get the following CUDA error (illegal memory access):
2024-05-31 15:58:30.844121139 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=998ab211f19f ; file=/code/onnxruntime/core/providers/cuda/gpu_data_transfer.cc ; line=73 ; expr=cudaMemcpyAsync(dst_data, src_data, bytes, cudaMemcpyDeviceToHost, static_cast<cudaStream_t>(stream.GetHandle()));
2024-05-31 15:58:30.844186034 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 700: an illegal memory access was encountered ; GPU=0 ; hostname=998ab211f19f ; file=/code/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=446 ; expr=cudaStreamSynchronize(static_cast<cudaStream_t>(stream_));
This occurs with both IOBinding and normal inference (the exact error is slightly different, but it's still an illegal memory access).
I have reduced my code to a minimal working example (the Triton kernel is essentially just lambda x: -x; see the sketch below) and put it into this draft PR, which also contains a few small fixes and more extensive documentation. I believe the issue is specifically in how I pass the CUDA stream to onnxruntime::cuda::LaunchTritonKernel (https://github.com/microsoft/onnxruntime/pull/20883/files#diff-3ed25bb54cb594743621055f0b541d6b93c6792a49da3a9a1bb5a65d73abf22eR60-R72), but I may be wrong about that.

I understand that this is my code, not Microsoft's, but I think my PR would be a good contribution to ORT if I had a little help fixing this issue.
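For context, here is a minimal sketch of the kind of negation kernel involved, assuming standard Triton; the actual kernel and its ORT registration live in the linked PR, and the name negate_kernel is illustrative:

import triton
import triton.language as tl

@triton.jit
def negate_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the input.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Mask the tail block so loads and stores never go out of bounds.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, -x, mask=mask)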
Urgency
Medium, not super urgent but I'd love some help :)
Target platform
CUDA
Build script

git clone https://github.com/Numeri/onnxruntime.git
cd onnxruntime
git checkout numeri/minimal_triton_kernel
./build.sh --update --build --config RelWithDebInfo --skip_submodule_sync --build_shared_lib --parallel 20 --build_wheel --use_triton_kernel --use_cuda --cuda_home $CUDA_HOME --cudnn_home $CUDNN_HOME
pip install onnxruntime-1.19.0/build/Linux/RelWithDebInfo/dist/onnxruntime_gpu-1.19.0-cp38-cp38-linux_x86_64.whl
python make_graph.py
python run_graph.py

In $PATH_TO_PROVIDED_SCRIPTS, place these two scripts (one for making the ONNX graph with the test operator, one for running it).

make_graph.py:
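(The script body below is a hypothetical sketch of a one-node graph calling the custom operator. The op type MyTritonKernel matches the error log in the comments below; the domain, shapes, and opset versions are assumptions.)

# Hypothetical sketch of make_graph.py; not the exact script from the issue.
import onnx
from onnx import TensorProto, helper

node = helper.make_node(
    "MyTritonKernel",        # op type as it appears in the error log below
    inputs=["X"],
    outputs=["Y"],
    domain="com.microsoft",  # assumed; must match the kernel's registered domain
)
graph = helper.make_graph(
    [node],
    "triton_test",
    [helper.make_tensor_value_info("X", TensorProto.FLOAT, [1024])],
    [helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1024])],
)
model = helper.make_model(
    graph,
    opset_imports=[helper.make_opsetid("", 17), helper.make_opsetid("com.microsoft", 1)],
)
onnx.save(model, "triton_test.onnx")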
run_graph.py:
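(Again a hypothetical sketch, mirroring the session.run_with_iobinding call in the traceback below; the model path and tensor names are assumptions.)

# Hypothetical sketch of run_graph.py; runs the graph on the CUDA EP with IOBinding.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("triton_test.onnx", providers=["CUDAExecutionProvider"])

# Bind input and output buffers that live on the GPU.
x = ort.OrtValue.ortvalue_from_numpy(np.arange(1024, dtype=np.float32), "cuda", 0)
y = ort.OrtValue.ortvalue_from_shape_and_type([1024], np.float32, "cuda", 0)

binding = session.io_binding()
binding.bind_ortvalue_input("X", x)
binding.bind_ortvalue_output("Y", y)
session.run_with_iobinding(binding)  # the illegal memory access surfaces here

print(y.numpy()[:8])  # expect the negated input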
Error / output

See the CUDA failure 700 log lines in the issue description above.
Visual Studio Version
No response
GCC / Compiler Version
Using the dockerfile's compiler version
Comments

The stale bot commented:

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

zhangzhen507 commented:

I get an error like this:

2024-09-06 16:25:38.797108890 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running MyTritonKernel node. Name:'' Status Message: Launching kernel failed. too many resources requested for launch
Traceback (most recent call last):
  File "run_graph.py", line 30, in <module>
    session.run_with_iobinding(binding)
  File "/home/zhen1.zhang/virtual_env/vir_onnx_py3.8/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 331, in run_with_iobinding
    self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running MyTritonKernel node. Name:'' Status Message: Launching kernel failed. too many resources requested for launch

Numeri commented:

@zhangzhen507 Interesting, I haven't seen that. Just so you're aware, I'm not completely confident I left that branch in a working state. I did manage to fix the error mentioned in this thread, but I need to do a little cleanup before I feel confident the Triton kernels are working properly.
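An aside on that last error, offered as a sketch rather than a confirmed fix: CUDA's "too many resources requested for launch" generally means the launch configuration exceeds a per-block device limit (threads, registers, or shared memory). When launching a Triton kernel directly from Python, lowering BLOCK_SIZE and num_warps is the usual first thing to try; the values below are illustrative and reuse the hypothetical negate_kernel sketched above:

import torch
import triton

# Allocate GPU buffers for the hypothetical negate_kernel defined earlier;
# torch is used here only for device memory allocation.
x = torch.arange(1024, dtype=torch.float32, device="cuda")
out = torch.empty_like(x)
n_elements = x.numel()

grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
# num_warps bounds threads per block (32 * num_warps); smaller BLOCK_SIZE and
# num_warps shrink the kernel's per-block resource footprint.
negate_kernel[grid](x, out, n_elements, BLOCK_SIZE=256, num_warps=2)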