
Microsoft.ML.OnnxRuntime.DirectML and Microsoft.AI.DirectML C++ API get incorrect mask results (detectron2 Mask R-CNN model) on both integrated and discrete GPUs, but Python onnxruntime-directml is fine #21560

Closed
YOODS-Xu opened this issue Jul 30, 2024 · 3 comments
Labels
ep:DML (issues related to the DirectML execution provider), .NET (pull requests that update .NET code)

Comments

@YOODS-Xu

YOODS-Xu commented Jul 30, 2024

Describe the issue

Microsoft.ML.OnnxRuntime.DirectML and Microsoft.AI.DirectML, used through the C++ API, produce incorrect mask results (detectron2 Mask R-CNN model) on both the integrated and the discrete GPU.
Only the mask results are incorrect; bboxes, scores, and classes are correct.
On the CPU the mask results are correct.
I checked almost all versions of Microsoft.ML.OnnxRuntime.DirectML from 1.12.1 to 1.18.1 and of Microsoft.AI.DirectML from 1.9.0 to 1.15.0, with ONNX model opset versions 17 and 16.
I installed the packages with the NuGet Manager.

I also checked the model with the Python API.
With onnxruntime-directml version 1.14.1 and below the mask results were correct, but with version 1.15.0 they were incorrect on the integrated GPU.

My System

OS: Windows 10
C++: std:c++17
Visual Studio 2019
Model Opset Version: 17 or 16
Microsoft.ML.OnnxRuntime.DirectML version: from 1.12.1 to 1.18.1
Microsoft.AI.DirectML: from 1.9.0 to 1.15.0
Install Method: NuGet
CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @2.80GHz
Integrated GPU: Intel(R) Iris(R) Xe Graphics
Discrete GPU: NVIDIA GeForce GTX 1050

Is there any problem with how I am using the API?
Has anyone else run into this issue?
Any suggestions are much appreciated.

To reproduce

code


// Session options: memory pattern must be disabled for the DML EP;
// run sequentially with all graph optimizations enabled.
options.DisableMemPattern();
options.SetExecutionMode(ORT_SEQUENTIAL);
options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);

// Register the DirectML execution provider on device 0 and create the session.
OrtSessionOptionsAppendExecutionProvider_DML(options, 0);
session_ = Ort::Session(env_, modelfile.data(), options);
...
// Run inference; the outputs are masks, bboxes, scores, and classes.
std::vector<Ort::Value> output_ortvalues = session_.Run(Ort::RunOptions{},
    input_names.data(), &input_tensor, input_names.size(),
    output_names.data(), output_names.size());


When I comment out "OrtSessionOptionsAppendExecutionProvider_DML(options, 0);", so that inference runs on the CPU, the mask results are correct, but inference takes about twice as long.
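For comparison, here is a minimal sketch (not the reporter's actual code) of creating a DML session and a CPU session side by side and running both on the same input; it reuses env_, modelfile, input_tensor, and the name vectors from the snippet above:

// Sketch only: two sessions over the same model, one on DirectML, one on CPU.
Ort::SessionOptions dml_options;
dml_options.DisableMemPattern();                  // required by the DML EP
dml_options.SetExecutionMode(ORT_SEQUENTIAL);
OrtSessionOptionsAppendExecutionProvider_DML(dml_options, 0);

Ort::SessionOptions cpu_options;                  // no EP appended -> CPU fallback

Ort::Session dml_session(env_, modelfile.data(), dml_options);
Ort::Session cpu_session(env_, modelfile.data(), cpu_options);

auto dml_out = dml_session.Run(Ort::RunOptions{}, input_names.data(), &input_tensor,
                               input_names.size(), output_names.data(), output_names.size());
auto cpu_out = cpu_session.Run(Ort::RunOptions{}, input_names.data(), &input_tensor,
                               input_names.size(), output_names.data(), output_names.size());

// Compare the mask tensors in dml_out and cpu_out element-wise
// (e.g. max absolute difference) to quantify the mismatch.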

Urgency

Because of the project deadline, this is a bit urgent.

Platform

Windows

OS Version

10 Pro 22H2 19045.4651

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

from 1.12.1 to 1.18.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

from 1.9.0 to 1.15.0

@fdwr
Contributor

fdwr commented Jul 30, 2024

It may take some time before someone can investigate. Pointing to the exact .onnx file URL (since there are often multiple variations of a given model name) in the ONNX model zoo, Hugging Face, or elsewhere will help. In the meantime, for additional diagnostic information, you might try the lowest GraphOptimizationLevel in case it's an optimization bug; and if you are able to build ORT locally, you could try unregistering (commenting out) the relevant ops here (like maybe RoiAlign) to see if one is the culprit.
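For the first suggestion, a minimal sketch of the diagnostic, assuming the same options object as in the reproduction snippet above:

// Diagnostic only: disable all graph optimizations so the model runs with
// unfused kernels; if the masks become correct, an optimizer transform is suspect.
options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_DISABLE_ALL);

// Intermediate levels (ORT_ENABLE_BASIC, ORT_ENABLE_EXTENDED) can then be
// tried to narrow down which optimization pass introduces the difference.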

@YOODS-Xu
Author

YOODS-Xu commented Jul 31, 2024 via email

@YOODS-Xu
Author

YOODS-Xu commented Jul 31, 2024

Microsoft.ML.OnnxRuntime.DirectML versions below 1.15.0, such as 1.14.1, gave correct mask results, the same as the Python API.
It seems something went wrong with the NuGet restore and the Visual Studio 2019 build, so the onnxruntime.dll of the selected version was not copied to the Release folder.

Sorry for the trouble, and thank you for your help.
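For anyone who hits a similar mismatch, a small sketch of how to confirm at runtime which onnxruntime.dll was actually loaded and whether the DirectML provider is present; it uses only the standard C++ API and assumes nothing project-specific:

#include <onnxruntime_cxx_api.h>
#include <iostream>

int main() {
  // Version string of the onnxruntime.dll that was actually loaded;
  // a stale DLL left in the Release folder would show up here immediately.
  std::cout << "ONNX Runtime version: "
            << OrtGetApiBase()->GetVersionString() << "\n";

  // Execution providers compiled into this build; the DirectML package
  // should list "DmlExecutionProvider".
  for (const auto& provider : Ort::GetAvailableProviders()) {
    std::cout << "available EP: " << provider << "\n";
  }
  return 0;
}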
