
[WebGPU] Error: [WebGPU] Kernel "[Mul] /head/istft/Mul_1" failed. Error: Failed to generate kernel's output[0] with dims [1,3520,3520]. If you are running with pre-allocated output, please make sure the output type/dims are correct. Error: 81415528. #22994

Open
xenova opened this issue Dec 3, 2024 · 6 comments · Fixed by #22997
Labels
ep:WebGPU ort-web webgpu provider platform:web issues related to ONNX Runtime web; typically submitted using template

Comments


xenova commented Dec 3, 2024

Describe the issue

Unable to run https://huggingface.co/onnx-community/WavTokenizer-large-speech-75token_decode on WebGPU

[E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Mul node. Name:'/head/istft/Mul_1' Status Message: Failed to run JSEP kernel
failed to inference ONNX model: Error: [WebGPU] Kernel "[Mul] /head/istft/Mul_1" failed. Error: Failed to generate kernel's output[0] with dims [1,3520,3520]. If you are running with pre-allocated output, please make sure the output type/dims are correct. Error: 81415528.

[screenshot of the console error]

To reproduce

https://jsfiddle.net/Lq725aou/3/

Urgency

Blocks WebGPU for this demo: huggingface/transformers.js-examples#17

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.20.1

Execution Provider

'webgpu' (WebGPU)

@xenova xenova added the platform:web issues related to ONNX Runtime web; typically submitted using template label Dec 3, 2024
@github-actions github-actions bot added .NET Pull requests that update .net code ep:WebGPU ort-web webgpu provider labels Dec 3, 2024
@guschmue guschmue removed the .NET Pull requests that update .net code label Dec 3, 2024

fs-eire commented Dec 3, 2024

#22997 has been submitted to fix the shader bug in Transpose. However, it is quite suspicious that the input of a Transpose node is a 1-D tensor; it may be the result of an error that happened earlier.


xenova commented Dec 7, 2024

Unfortunately the error still persists: https://jsfiddle.net/gf7b3ck6/4/

@fs-eire fs-eire reopened this Dec 8, 2024

fs-eire commented Dec 8, 2024

Reopening the issue; it needs further investigation.

ankitm3k pushed a commit to intel/onnxruntime that referenced this issue Dec 11, 2024
### Description

Fix a bug in transpose shader, when input/output rank is 1.

### Motivation and Context

Fixes microsoft#22994

xhcao commented Dec 20, 2024

The JS EP may handle NHWC incorrectly in this case. If the demo sets preferredLayout: NCHW, there is no error.
I need more time to investigate the root cause.
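As a sketch, the workaround above can be applied when creating the session with onnxruntime-web; the model filename below is a placeholder, and the exact option shape assumes the WebGPU EP's documented `preferredLayout` setting:

```javascript
// Sketch only: force the WebGPU EP to keep the NCHW layout, avoiding the
// NHWC layout-transform path described in this comment.
import * as ort from 'onnxruntime-web/webgpu';

const session = await ort.InferenceSession.create('decoder_model.onnx', {
  executionProviders: [
    {
      name: 'webgpu',
      preferredLayout: 'NCHW', // default is 'NHWC' for the WebGPU EP
    },
  ],
});
```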


xhcao commented Dec 26, 2024

The issue also exists on the CUDA EP when compiling with the option --cmake_extra_defines onnxruntime_USE_CUDA_NHWC_OPS=ON, although it throws a different error message:
Non-zero status code returned while running Transpose node. Name:'Transpose_token_154' Status Message: perm size: 3 does not match input rank: 1
But the root cause is the same for the JS EP and the CUDA EP: I think the /head/istft/Squeeze_1 node in the model is not correct.

From the spec (https://onnx.ai/onnx/operators/onnx__Squeeze.html), since the input shape of the /head/istft/Squeeze_1 node is 3-D ([1,1,ConvTranspose_423_o0__d2]), its output shape should be 1-D ([ConvTranspose_423_o0__d2]). But the declared output shape is still 3-D ([ConvTranspose_423_o0__d0,ConvTranspose_423_o0__d1,ConvTranspose_423_o0__d2]), which leads to errors during the layout transform (NCHW -> NHWC) and Transpose optimization.
[screenshot]
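The Squeeze shape rule this comment relies on can be sketched in plain JavaScript (a hypothetical helper, not ONNX Runtime code): with no `axes` input, every size-1 dimension is removed, so a [1, 1, N] input must produce a 1-D output.

```javascript
// Sketch of ONNX Squeeze shape inference (opset 13+ semantics).
// With no `axes`, all dimensions of size 1 are removed; with `axes`,
// only the listed (possibly negative) dimensions are removed.
function squeezeShape(inputShape, axes) {
  if (axes === undefined) {
    return inputShape.filter((d) => d !== 1);
  }
  const normalized = axes.map((a) => (a < 0 ? a + inputShape.length : a));
  return inputShape.filter((_, i) => !normalized.includes(i));
}

// [1, 1, N] with no axes becomes the 1-D shape [N], which is why a
// downstream Transpose with perm [0, 2, 1] no longer matches the rank.
console.log(squeezeShape([1, 1, 3520]));      // [ 3520 ]
console.log(squeezeShape([1, 1, 3520], [0])); // [ 1, 3520 ]
```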

Let us take [1, 8] as an example value for the input shape (int64[batch_size,sequence_length]). When using NCHW ops and calling session.initialize() to parse the model, the output shapes of the nodes are shown below; the shapes parsed from the model are all 3-D (-1 means an unfixed value).
[screenshot]

When calling session.run() to run the model, the output shapes of the nodes are shown below; the shapes from the /head/istft/Squeeze_1 node onward are 1-D.
[screenshot]
But all dims are 1 except the innermost dim, so the model still runs correctly.

When using NHWC ops and session.initialize() to parse the model, after the layout transform we traverse all nodes to apply the Transpose optimization. The topology before traversing the Div node is shown below; note that a Transpose node has been pushed after the ConvTranspose node.
[screenshot]

After traversing the Div node, the Transpose node has passed through the Div, and a new Transpose node with perm [0, 2, 1] has been added after the Where node.
[screenshot]

When using NHWC ops and session.run() to run the model, the output shapes of the nodes are shown below; the shapes from the /head/istft/Squeeze_1 node onward are again 1-D.
[screenshot]
The CUDA EP reports a useful error message when running the new Transpose node. The JS EP does not validate the perm size against the input rank of Transpose, so no error is thrown until the Mul node runs with the wrong input shape [1, 3520, 3520]; the expected shape is [1, 3520, 1].
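The missing check can be sketched as a small hypothetical helper (not ONNX Runtime code), mirroring the message the CUDA EP emits:

```javascript
// Sketch of Transpose input validation: `perm` must be a permutation
// of [0, rank), so its length must equal the input tensor's rank.
function validateTransposePerm(inputShape, perm) {
  if (perm.length !== inputShape.length) {
    throw new Error(
      `perm size: ${perm.length} does not match input rank: ${inputShape.length}`);
  }
  const seen = new Set(perm);
  if (seen.size !== perm.length || perm.some((p) => p < 0 || p >= perm.length)) {
    throw new Error(`perm is not a valid permutation: [${perm}]`);
  }
}

// The failing case from this issue: a 1-D tensor reaching a Transpose
// created with perm [0, 2, 1] during layout optimization.
try {
  validateTransposePerm([3520], [0, 2, 1]);
} catch (e) {
  console.log(e.message); // perm size: 3 does not match input rank: 1
}
```

Failing fast here would surface the error at the Transpose node, as the CUDA EP does, instead of at the later Mul node.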

In summary, I think we must first fix the output dims of the model nodes after the /head/istft/Squeeze_1 node, and the ONNX Runtime core should ensure that the Transpose node does not pass through the Div node and add a new Transpose node after the Where node.


xhcao commented Dec 26, 2024

@jchen10 @hujiajie
