[E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running LayerNormalization node. #21012
Labels:
ep:DML (issues related to the DirectML execution provider)
feature request (request for unsupported feature or enhancement)
model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.)
platform:windows (issues related to the Windows platform)
Describe the feature request
Hi Experts,
I recently started working on AI/ML. I am currently trying to run a Hugging Face Optimum model on the GPU using the DirectML execution provider (DML EP).
Platform: Windows 11
Model: https://huggingface.co/optimum/m2m100_418M
Changes:
import onnxruntime
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Enable verbose onnxruntime logging
session_opt = onnxruntime.SessionOptions()
session_opt.log_severity_level = 0

# provider = "CPUExecutionProvider"
provider = "DmlExecutionProvider"

NUM_ITERATIONS = 1
model_name = "optimum/m2m100_418M"
hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"  # Hindi: "Life is like a box of chocolates."
chinese_text = "生活就像一盒巧克力。"  # Chinese: "Life is like a box of chocolates."

model = ORTModelForSeq2SeqLM.from_pretrained(model_name, provider=provider, session_options=session_opt)
When I use "DmlExecutionProvider", I see the error below:
2024-06-12 14:35:21.2694023 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running LayerNormalization node. Name:'/model/decoder/layer_norm/Mul/LayerNormFusion/' Status Message: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2468)\onnxruntime_pybind11_state.pyd!00007FFA9B5A09BF: (caller: 00007FFA9B5A2174) Exception(3) tid(1ff4) 887A0005 The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.
With "CPUExecutionProvider", however, I don't see any issue and the model runs successfully.
Could you help me resolve this error so the model runs with the DML EP?
Thanks
Describe scenario use case
Trying to run a Hugging Face Optimum model with the DML EP.