
Microsoft.ML.OnnxRuntime.DirectML and Microsoft.AI.DirectML C++ API get incorrect mask results (detectron2 Mask R-CNN model) on both integrated and discrete GPUs, but Python onnxruntime-directml is fine #21560

Closed
YOODS-Xu opened this issue Jul 30, 2024 · 3 comments
Labels
ep:DML (issues related to the DirectML execution provider), .NET (pull requests that update .NET code)

Comments

@YOODS-Xu

YOODS-Xu commented Jul 30, 2024

Describe the issue

Microsoft.ML.OnnxRuntime.DirectML and Microsoft.AI.DirectML, used through the C++ API, produce incorrect mask results (detectron2 Mask R-CNN model) on both the integrated and the discrete GPU.
Only the mask results are incorrect; bboxes, scores, and classes are correct.
On the CPU the mask results are correct.
I checked almost all versions of Microsoft.ML.OnnxRuntime.DirectML from 1.12.1 to 1.18.1 and of Microsoft.AI.DirectML from 1.9.0 to 1.15.0, with ONNX model opset versions 17 and 16.
I installed the packages with the NuGet Manager.

I also checked the model with the Python API.
With onnxruntime-directml version 1.14.1 and below the mask results were correct, but with version 1.15.0 they were incorrect on the integrated GPU.

My System

OS: Windows 10
C++: std:c++17
Visual Studio 2019
Model Opset Version: 17 or 16
Microsoft.ML.OnnxRuntime.DirectML version: from 1.12.1 to 1.18.1
Microsoft.AI.DirectML: from 1.9.0 to 1.15.0
Install Method: NuGet
CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @2.80GHz
Integrated GPU: Intel(R) Iris(R) Xe Graphics
Discrete GPU: NVIDIA GeForce GTX 1050

Is there any problem with how I am using the API?
Has anyone else run into this issue?
Any suggestions are much appreciated.

To reproduce

code


// Session options: memory pattern must be disabled for the DML EP;
// run sequentially with all graph optimizations enabled.
options.DisableMemPattern();
options.SetExecutionMode(ORT_SEQUENTIAL);
options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);

// Register the DirectML execution provider on device 0 and create the session.
OrtSessionOptionsAppendExecutionProvider_DML(options, 0);
session_ = Ort::Session(env_, modelfile.data(), options);
...
// Run inference; the outputs are masks, bboxes, scores, and classes.
std::vector<Ort::Value> output_ortvalues = session_.Run(Ort::RunOptions{},
    input_names.data(), &input_tensor, input_names.size(),
    output_names.data(), output_names.size());


When I comment out "OrtSessionOptionsAppendExecutionProvider_DML(options, 0);", so that inference runs on the CPU, the mask results are correct, but inference takes about twice as long.
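For comparison, here is a minimal sketch (not the reporter's actual code) of creating a DML session and a CPU session side by side and running both on the same input; it reuses env_, modelfile, input_tensor, and the name vectors from the snippet above:

// Sketch only: two sessions over the same model, one on DirectML, one on CPU.
Ort::SessionOptions dml_options;
dml_options.DisableMemPattern();                  // required by the DML EP
dml_options.SetExecutionMode(ORT_SEQUENTIAL);
OrtSessionOptionsAppendExecutionProvider_DML(dml_options, 0);

Ort::SessionOptions cpu_options;                  // no EP appended -> CPU fallback

Ort::Session dml_session(env_, modelfile.data(), dml_options);
Ort::Session cpu_session(env_, modelfile.data(), cpu_options);

auto dml_out = dml_session.Run(Ort::RunOptions{}, input_names.data(), &input_tensor,
                               input_names.size(), output_names.data(), output_names.size());
auto cpu_out = cpu_session.Run(Ort::RunOptions{}, input_names.data(), &input_tensor,
                               input_names.size(), output_names.data(), output_names.size());

// Compare the mask tensors in dml_out and cpu_out element-wise
// (e.g. max absolute difference) to quantify the mismatch.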

Urgency

Because of the project deadline, this is a bit urgent.

Platform

Windows

OS Version

10 Pro 22H2 19045.4651

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

from 1.12.1 to 1.18.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

from 1.9.0 to 1.15.0

@fdwr
Contributor

fdwr commented Jul 30, 2024

It may take some time before someone can investigate. Pointing to the exact .onnx file URL (since there are often multiple variations of a given model name) in the ONNX model zoo, Hugging Face, or elsewhere will help. In the meantime, for additional diagnostic information, you might try the lowest GraphOptimizationLevel in case it's an optimization bug; and if you are able to build ORT locally, you could try unregistering (commenting out) the relevant ops here (like maybe RoiAlign) to see if one is the culprit.
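For the first suggestion, a minimal sketch of the diagnostic, assuming the same options object as in the reproduction snippet above:

// Diagnostic only: disable all graph optimizations so the model runs with
// unfused kernels; if the masks become correct, an optimizer transform is suspect.
options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_DISABLE_ALL);

// Intermediate levels (ORT_ENABLE_BASIC, ORT_ENABLE_EXTENDED) can then be
// tried to narrow down which optimization pass introduces the difference.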

@YOODS-Xu
Author

YOODS-Xu commented Jul 31, 2024 via email

@YOODS-Xu
Author

YOODS-Xu commented Jul 31, 2024

Microsoft.ML.OnnxRuntime.DirectML versions below 1.15.0, such as 1.14.1, gave correct mask results, the same as the Python API.
It seems something went wrong with the NuGet restore and the Visual Studio 2019 build, so the onnxruntime.dll of the selected version was not copied to the Release folder.

Sorry for the trouble, and thank you for your help.
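For anyone who hits a similar mismatch, a small sketch of how to confirm at runtime which onnxruntime.dll was actually loaded and whether the DirectML provider is present; it uses only the standard C++ API and assumes nothing project-specific:

#include <onnxruntime_cxx_api.h>
#include <iostream>

int main() {
  // Version string of the onnxruntime.dll that was actually loaded;
  // a stale DLL left in the Release folder would show up here immediately.
  std::cout << "ONNX Runtime version: "
            << OrtGetApiBase()->GetVersionString() << "\n";

  // Execution providers compiled into this build; the DirectML package
  // should list "DmlExecutionProvider".
  for (const auto& provider : Ort::GetAvailableProviders()) {
    std::cout << "available EP: " << provider << "\n";
  }
  return 0;
}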
