
[Feature Request] Request grid_sample 5D support 🌟 #21382

Open
juntaosun opened this issue Jul 17, 2024 · 7 comments
Labels
ep:CUDA (issues related to the CUDA execution provider), feature request (request for unsupported feature or enhancement)

Comments


juntaosun commented Jul 17, 2024

Describe the feature request

Many models now rely on 5D grid_sample, but the exported ONNX model does not seem to support it on GPU yet.
It currently runs on the CPU,
which makes inference very slow compared to the original torch.nn.functional.grid_sample.
Searching the issues shows this has been raised many times in the past. As of 2024-07-17, the latest onnxruntime still does not support it.
In addition, I have seen an implementation in a branch:

7c0ae44

I hope this can be supported as soon as possible; I think it would be great for most developers.
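For context, here is what the requested op computes: the 5-D variant of grid_sample takes an (N, C, D, H, W) volume and an (N, Do, Ho, Wo, 3) grid of normalized coordinates in [-1, 1] and gathers a resampled volume. A minimal NumPy sketch of the semantics (nearest-neighbour mode, align_corners=True only; the function name and simplifications are mine, not onnxruntime code):

```python
import numpy as np

def grid_sample_5d_nearest(inp, grid):
    """Naive 5-D grid_sample (nearest neighbour, align_corners=True).

    inp:  (N, C, D, H, W) volume
    grid: (N, Do, Ho, Wo, 3) with coords in [-1, 1], ordered (x, y, z),
          i.e. x indexes W, y indexes H, z indexes D -- matching the
          torch.nn.functional.grid_sample convention.
    """
    N, C, D, H, W = inp.shape
    _, Do, Ho, Wo, _ = grid.shape
    out = np.zeros((N, C, Do, Ho, Wo), dtype=inp.dtype)
    # Unnormalize [-1, 1] -> voxel indices (align_corners=True).
    x = np.rint((grid[..., 0] + 1) / 2 * (W - 1)).astype(int)
    y = np.rint((grid[..., 1] + 1) / 2 * (H - 1)).astype(int)
    z = np.rint((grid[..., 2] + 1) / 2 * (D - 1)).astype(int)
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    z = np.clip(z, 0, D - 1)
    for n in range(N):
        # Advanced indexing gathers all (Do, Ho, Wo) samples per channel.
        out[n] = inp[n][:, z[n], y[n], x[n]]
    return out
```

An identity grid (linspace from -1 to 1 along each axis) reproduces the input volume, which is a quick sanity check; the real op additionally supports trilinear/bicubic modes and padding modes, which this sketch omits.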

Describe scenario use case

I believe many people need this on CUDA. Thank you for your efforts and excellent work. ❤️

@juntaosun juntaosun added the feature request request for unsupported feature or enhancement label Jul 17, 2024
@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Jul 17, 2024
@tianleiwu
Contributor

@liqunfu

@cleardusk

I completely agree with @juntaosun.
For example, LivePortrait currently cannot be exported to ONNX because 5D grid_sample is not supported on GPU. :(

@tianleiwu @liqunfu

@juntaosun
Author

I completely agree with @cleardusk.
Are there any plans to improve the performance and speed of grid_sample in onnxruntime-gpu?
@tianleiwu @liqunfu

Contributor

tianleiwu commented Sep 17, 2024

@liqunfu, is there a plan to add this support in the 1.20 release?

If not, I suggest that other people who are interested continue from your draft and submit a pull request. What do you think?


fedral commented Sep 20, 2024

Agreed.
On onnxruntime 1.17.0 + CUDA 11.8 + opset 20, grid_sample at 1080p takes 70 ms on CPU, while GPU mode is much slower than CPU mode, around 140 ms (double). The torch implementation takes only about 0.01 ms, a really big difference.

Looking forward to the onnxruntime team supporting and optimizing the 4D/5D grid_sample op on GPU. Thanks!

@juntaosun
Author

I hope this gets some attention. More and more models use this op, but grid_sample in onnxruntime is dozens of times slower than in torch.

Contributor

liqunfu commented Sep 23, 2024

I added/updated the GridSample CPU implementation when the op was added/updated in ONNX, as part of the ONNX integration with ORT. The implementation was inherited from an existing contrib op. I do not see a quick way to improve its performance by dozens of times. Usually GridSample is preceded by an AffineGrid; in that case the two ops can be fused, and the implementation can be greatly improved. I wonder if this is the use case?
I expect someone else to take over this work, because I am on another task now.
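To illustrate the fusion mentioned above: instead of materializing the AffineGrid output as a full (N, H, W, 2) tensor and then gathering with GridSample, the affine matrix can be applied to output coordinates on the fly. A hedged NumPy sketch of the idea (2-D case, nearest-neighbour, align_corners=True; function name and simplifications are mine, not ORT internals):

```python
import numpy as np

def affine_grid_sample_fused(inp, theta):
    """Fused AffineGrid + GridSample (2-D, nearest, align_corners=True).

    inp:   (N, C, H, W) images
    theta: (N, 2, 3) affine matrices mapping normalized output coords
           to normalized input coords, as in torch's affine_grid.
    The (N, H, W, 2) sampling grid is never materialized.
    """
    N, C, H, W = inp.shape
    ys, xs = np.linspace(-1, 1, H), np.linspace(-1, 1, W)
    X, Y = np.meshgrid(xs, ys)                         # (H, W) output coords
    coords = np.stack([X, Y, np.ones_like(X)], axis=0).reshape(3, -1)
    out = np.empty_like(inp)
    for n in range(N):
        sx, sy = theta[n] @ coords                     # source coords in [-1, 1]
        ix = np.clip(np.rint((sx + 1) / 2 * (W - 1)).astype(int), 0, W - 1)
        iy = np.clip(np.rint((sy + 1) / 2 * (H - 1)).astype(int), 0, H - 1)
        out[n] = inp[n][:, iy, ix].reshape(C, H, W)    # gather directly
    return out
```

With an identity theta this reproduces the input, and a theta with a negated x row flips the image horizontally; a fused kernel along these lines avoids one full grid tensor of reads and writes, which is where the speedup would come from.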
