
[Feature Request] Add CUDA kernel for the ScatterElements operator in opset 18 #18381

Open
martinResearch opened this issue Nov 9, 2023 · 6 comments
Labels
ep:CUDA issues related to the CUDA execution provider feature request request for unsupported feature or enhancement

Comments

martinResearch commented Nov 9, 2023

Describe the feature request

It seems that the operator ScatterElements is not implemented in CUDA when using opset 16, 17 or 18 with the "add" reduction.
We get the message "CUDA kernel not found in registries for Op type: ScatterElements node name: /ScatterElements" in the log when loading the onnx model, and the profiling file shows the node is not run by the "CUDAExecutionProvider".
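To confirm which execution provider actually ran each node, the profiling JSON written by onnxruntime (with `sess_options.enable_profiling = True`) can be filtered on the `args.provider` field, as the repro below does. A minimal sketch of that filtering; the trace events here are a hand-written stand-in for a real profile, which contains many more fields per event:

```python
import json

# Hand-written stand-in for the trace events found in an onnxruntime
# profiling file; the "args"/"provider" structure matches what the
# repro script reads, the values are illustrative only.
profile = json.loads("""
[
  {"name": "/ScatterElements_kernel_time",
   "args": {"op_name": "ScatterElements", "provider": "CPUExecutionProvider"}},
  {"name": "/MatMul_kernel_time",
   "args": {"op_name": "MatMul", "provider": "CUDAExecutionProvider"}}
]
""")

# Map each op to the provider that executed it, skipping events
# that carry no provider information.
assignments = {
    ev["args"]["op_name"]: ev["args"]["provider"]
    for ev in profile
    if "provider" in ev.get("args", {})
}
print(assignments)
```

A ScatterElements entry showing CPUExecutionProvider here is the CPU fallback this issue is about.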

Note that the operator ScatterElements is available on CUDA when using opset 15, but it produces wrong results (see onnx/onnx#3484).
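For context on why the opset-15 path can be numerically wrong: the "reduction" attribute was only added to ScatterElements in opset 16, so a scatter exported at opset 15 performs plain overwriting, where duplicate indices keep only the last write instead of accumulating. A small numpy illustration (not from the issue itself) using the same inputs as the repro below:

```python
import numpy as np

indices = np.array([0, 0, 1, 2])
weights = np.array([1.0, 3.0, 5.0, 7.0])

# Plain scatter (ScatterElements without reduction, opset <= 15):
# duplicate indices overwrite, so index 0 keeps only the last value.
plain = np.zeros(3)
plain[indices] = weights            # index 0 ends up as 3.0, not 1.0 + 3.0

# Scatter with reduction="add" (opset >= 16): duplicates accumulate.
added = np.zeros(3)
np.add.at(added, indices, weights)  # index 0 ends up as 4.0

print(plain, added)
```

The two results differ exactly at the duplicated index, which matches the expected output [4.0, 5.0, 7.0] asserted in the repro.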

Here is some minimal Python code to reproduce the problem, using torch==2.0.0+cu118 or torch==2.1.0+cu118 and onnxruntime-gpu==1.16.2 on an NVIDIA GeForce GTX 1050:

import io
import json

import numpy as np
import onnxruntime
import torch
import torch.nn as nn


class ScatterAdd(nn.Module):
    """Point cloud renderer"""

    def __init__(self, length: int):
        super(ScatterAdd, self).__init__()
        self.length = length

    def forward(self, indices: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
        result = torch.zeros((self.length), device="cuda", dtype=torch.float)
        result = result.scatter_add(0, indices, weights)
        return result


def main():
    opset_version = 16
    onnx_provider = "CUDAExecutionProvider"

    indices = torch.Tensor([0, 0, 1, 2]).long().cuda()
    weights = torch.Tensor([1.0, 3.0, 5.0, 7.0]).cuda()
    scatter_add = ScatterAdd(length=3)
    result = scatter_add(indices=indices, weights=weights)
    assert np.allclose(result.cpu().numpy(), [4.0, 5.0, 7.0])

    bytes_io = io.BytesIO()
    torch.onnx.export(
        scatter_add, (indices, weights), bytes_io, opset_version=opset_version, input_names=["indices", "weights"]
    )
    onnxruntime.set_default_logger_severity(1)
    sess_options = onnxruntime.SessionOptions()
    sess_options.enable_profiling = True
    ort_session = onnxruntime.InferenceSession(
        bytes_io.getvalue(),
        providers=[
            onnx_provider,
        ],
        sess_options=sess_options,
    )
    # when using CUDAExecutionProvider with opset_version 16, 17 or 18, the log shows:
    # CUDA kernel not found in registries for Op type: ScatterElements node name: /ScatterElements

    numpy_inputs = {
        "indices": np.array([0, 0, 1, 2], dtype=np.int64),
        "weights": np.array([1.0, 3.0, 5.0, 7.0], dtype=np.float32),
    }
    result = ort_session.run(None, numpy_inputs)

    prof_file = ort_session.end_profiling()

    with open(prof_file) as f:
        sess_time = json.load(f)

    # fails when using opset_version=15 with CUDAExecutionProvider (wrong results)
    assert np.allclose(result, [4.0, 5.0, 7.0])

    # fails when using opset_version=16, 17 or 18 with CUDAExecutionProvider (node falls back to CPU)
    assert sess_time[3]["args"]["provider"] == "CUDAExecutionProvider"
  

if __name__ == "__main__":
    main()

Describe scenario use case

This is used in an image processing pipeline.

@martinResearch martinResearch added the feature request request for unsupported feature or enhancement label Nov 9, 2023
@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Nov 9, 2023
@martinResearch (Author)

When printing the onnx model using

    torch.onnx.export(
        scatter_add, (indices, weights), "model.onnx", opset_version=opset_version, input_names=["indices", "weights"]
    )
    with open("model.onnx", "rb") as f:
        model = onnx.load(f)

I get

ir_version: 8
opset_import {
  version: 16
}
producer_name: "pytorch"
producer_version: "2.1.0"
graph {
  node {
    output: "onnx::ScatterElements_2"
    name: "Constant_0"
    op_type: "Constant"
    attribute {
      name: "value"
      type: TENSOR
      t {
        dims: 3
        data_type: 1
        raw_data: "\000\000\000\000\000\000\000\000\000\000\000\000"
      }
    }
  }
  node {
    input: "onnx::ScatterElements_2"
    input: "indices"
    input: "weights"
    output: "3"
    name: "/ScatterElements"
    op_type: "ScatterElements"
    attribute {
      name: "axis"
      type: INT
      i: 0
    }
    attribute {
      name: "reduction"
      type: STRING
      s: "add"
    }
  }
  name: "main_graph"
  input {
    name: "indices"
    type {
      tensor_type {
        elem_type: 7
        shape {
          dim {
            dim_value: 4
          }
        }
      }
    }
  }
  input {
    name: "weights"
    type {
      tensor_type {
        elem_type: 1
        shape {
          dim {
            dim_value: 4
          }
        }
      }
    }
  }
  output {
    name: "3"
    type {
      tensor_type {
        elem_type: 1
        shape {
          dim {
            dim_value: 3
          }
        }
      }
    }
  }
}

The exported onnx model seems valid.

@martinResearch (Author)

The ScatterElements operator is listed here and here with the lines

BuildKernelCreateInfo<ONNX_OPERATOR_VERSIONED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 11, 12, ScatterElements)>,

and

BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 13, ScatterElements)>,

which seems to indicate it should be available for any opset greater than or equal to 11, so it seems strange that in practice it is only available for opsets 11, 12, 13, 14 and 15.


anjandeepsahni commented Mar 22, 2024

Seconding this request. Currently, if we use scatter-add with opset 16, onnxruntime runs it on the CPUExecutionProvider, which is very slow for large inputs; the runtime seems to increase linearly with batched inputs. @martinResearch, wondering if you already found a solution for this?
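One possible workaround, offered as an untested sketch rather than anything confirmed in this thread: a 1-D scatter-add can be rewritten as a one-hot matrix product, which exports to MatMul-style ops that do have CUDA kernels. In PyTorch this would be roughly `F.one_hot(indices, num_classes=length).to(weights.dtype).T @ weights`; the numpy sketch below only checks that the math is equivalent:

```python
import numpy as np

indices = np.array([0, 0, 1, 2])
weights = np.array([1.0, 3.0, 5.0, 7.0])
length = 3

# One-hot encode the indices: row i selects bucket indices[i].
one_hot = np.eye(length)[indices]   # shape (4, 3)

# scatter_add(0, indices, weights) is equivalent to one_hot.T @ weights.
via_matmul = one_hot.T @ weights

# Reference scatter-add for comparison.
reference = np.zeros(length)
np.add.at(reference, indices, weights)

print(via_matmul, reference)
```

The trade-off is memory: the one-hot matrix is O(len(indices) * length), so this only helps when `length` is modest.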

@pranavsharma (Contributor)

Feel free to contribute. We welcome external contributions.

@anjandeepsahni

Thanks @pranavsharma. I am not an expert in the internal workings of ONNX Runtime. If I could get some guidance on how to fix this, I would be happy to create a PR. 😄

@anjandeepsahni

#19198 seems to have added support for ScatterElements in opsets 13, 15 and 18, but I am not sure why opsets 16 and 17 were skipped.

Unfortunately, PyTorch does not support opset 18 with torch.onnx.export, and torch.onnx.dynamo_export is still in beta.
