[Feature Request] Add CUDA kernel for the ScatterElements operator in opset 18 #18381
Comments
When printing the ONNX model using
I get
The exported ONNX model seems valid.
The
and
which seems to indicate that it should be available for any opset greater than or equal to 11. It is therefore strange that, in practice, it only seems to be available for opsets 11, 12, 13, 14, and 15.
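A simplified model of how version-ranged kernel lookup can produce this pattern: a node is matched on its resolved operator version (the newest ScatterElements version that is not newer than the model's opset), not on the opset number itself. The ScatterElements version history (11, 13, 16, 18) comes from the ONNX operator changelog. The registered CUDA ranges (11-12 and 13-15) are an assumption chosen to match the behavior reported above. This is a toy sketch, not ONNX Runtime's actual registration code.

```python
from __future__ import annotations

from bisect import bisect_right

# Versions in which the ONNX ScatterElements operator changed.
SCATTER_ELEMENTS_VERSIONS = [11, 13, 16, 18]

# Hypothetical CUDA kernel registrations as inclusive (start, end) version ranges.
HYPOTHETICAL_CUDA_RANGES = [(11, 12), (13, 15)]


def resolve_since_version(model_opset: int, history: list[int]) -> int:
    """Return the newest operator version that is <= the model's opset."""
    pos = bisect_right(history, model_opset)
    if pos == 0:
        raise ValueError(f"operator does not exist in opset {model_opset}")
    return history[pos - 1]


def has_kernel(model_opset: int) -> bool:
    """True if some registered range covers the node's resolved version."""
    since = resolve_since_version(model_opset, SCATTER_ELEMENTS_VERSIONS)
    return any(start <= since <= end for start, end in HYPOTHETICAL_CUDA_RANGES)


for opset in range(11, 19):
    since = resolve_since_version(opset, SCATTER_ELEMENTS_VERSIONS)
    print(f"opset {opset}: ScatterElements-{since}, CUDA kernel: {has_kernel(opset)}")
```

Under these assumptions, opsets 11 to 15 resolve to ScatterElements-11 or ScatterElements-13 and find a kernel. Opsets 16 to 18 resolve to ScatterElements-16 or ScatterElements-18, which no registered range covers, so the node would fall back to the CPU.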
Seconding this request. Currently, if we use scatter add with opset 16, ONNX Runtime runs it on the CPUExecutionProvider. This is very slow for large inputs, and the runtime seems to grow linearly with the batch size. @martinResearch, have you already found a solution for this?
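For reference, scatter add corresponds to ONNX ScatterElements with reduction="add", an attribute introduced in ScatterElements-16. The minimal pure-Python sketch below covers 2-D tensors and is written for clarity rather than speed; it shows what a CUDA kernel would need to compute. The function name scatter_elements_2d is hypothetical. For axis=0, element output[indices[i][j]][j] is combined with updates[i][j], which matches the formula documented for PyTorch's Tensor.scatter_add_ with dim=0. For axis=1, the target is output[i][indices[i][j]].

```python
import copy


def scatter_elements_2d(data, indices, updates, axis=0, reduction="none"):
    """Sketch of ONNX ScatterElements semantics for 2-D nested lists.

    reduction="none" overwrites. The ONNX spec leaves the result undefined for
    duplicate indices; this sketch simply keeps the last write.
    reduction="add" accumulates every update that targets the same element.
    """
    if axis not in (0, 1):
        raise ValueError("this sketch only supports axis 0 or 1")
    if reduction not in ("none", "add"):
        raise ValueError("this sketch only supports 'none' and 'add'")
    out = copy.deepcopy(data)  # leave the input untouched
    for i, row in enumerate(indices):
        for j, idx in enumerate(row):
            # Python's negative indexing matches ONNX's allowed range [-s, s-1].
            r, c = (idx, j) if axis == 0 else (i, idx)
            if reduction == "add":
                out[r][c] += updates[i][j]
            else:
                out[r][c] = updates[i][j]
    return out


data = [[0, 0, 0], [0, 0, 0]]
indices = [[0, 1, 0], [0, 1, 0]]  # every target element is hit twice
updates = [[1, 2, 3], [4, 5, 6]]
print(scatter_elements_2d(data, indices, updates, axis=0, reduction="add"))   # [[5, 0, 9], [0, 7, 0]]
print(scatter_elements_2d(data, indices, updates, axis=0, reduction="none"))  # [[4, 0, 6], [0, 5, 0]]
```

The duplicate indices in the example show why the reduction mode matters. With "add" both updates are summed, but with "none" only one of them survives.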
Feel free to contribute. We welcome external contributions.
Thanks @pranavsharma. I am not an expert on the internal workings of ONNX Runtime, but if I could get some guidance on how to fix this, I would be happy to create a PR. 😄
#19198: This PR seems to have added support for ScatterElements in opsets 13, 15, and 18, but I am not sure why opsets 16 and 17 were skipped. Unfortunately, PyTorch does not support opset 18 with
Describe the feature request
It seems that the operator ScatterElements is not implemented in CUDA when using opset 16, 17, or 18 with the "add" reduction. We get the message
CUDA kernel not found in registries for Op type: ScatterElements node name: /ScatterElements
in the log when loading the ONNX model, and "CUDAExecutionProvider" in the profiling file. Note that the operator ScatterElements is available when using opset 15 but provides wrong results (see onnx/onnx#3484). Here is some minimal Python code to reproduce the problem using torch==2.0.0+cu118 or torch==2.1.0+cu118 and onnxruntime-gpu==1.16.2, on an NVIDIA GeForce GTX 1050.
Describe scenario use case
This is used in an image processing pipeline.
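To check which execution provider actually ran each node, one option is to scan the profile that ONNX Runtime writes when SessionOptions.enable_profiling is set. The sketch below assumes the Chrome-trace layout of that file. In this layout, node events have "cat": "Node" and an "args" object containing "op_name" and "provider"; verify these keys against your own profile file. The helper name providers_by_op and the sample events are hypothetical.

```python
import json
from collections import defaultdict


def providers_by_op(profile_events):
    """Map each op type to the set of execution providers that ran it."""
    result = defaultdict(set)
    for event in profile_events:
        # Only node-level events carry the op type and provider.
        if event.get("cat") != "Node":
            continue
        args = event.get("args", {})
        op, provider = args.get("op_name"), args.get("provider")
        if op and provider:
            result[op].add(provider)
    return dict(result)


# Hypothetical excerpt of a profile file (normally loaded with json.load).
sample = json.loads("""
[
  {"cat": "Session", "name": "model_run", "args": {}},
  {"cat": "Node", "name": "/Mul_kernel_time",
   "args": {"op_name": "Mul", "provider": "CUDAExecutionProvider"}},
  {"cat": "Node", "name": "/ScatterElements_kernel_time",
   "args": {"op_name": "ScatterElements", "provider": "CPUExecutionProvider"}}
]
""")

usage = providers_by_op(sample)
# Op types that ran (at least partly) on the CPU provider.
fallbacks = sorted(op for op, provs in usage.items() if "CPUExecutionProvider" in provs)
print(fallbacks)  # ['ScatterElements']
```

For a real run, the file to load is the path returned by InferenceSession.end_profiling().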