[BUG] [OpenVino EP] Only first result in session is correct. #19975

debugmenot · 2024-03-19T05:25:33Z

Describe the issue

When running inference session ONLY with OpenVino EP and ORT > 1.13.1 any results except first are incorrect. There are no issues with ORT == 1.13.1 or CPU/CUDA/XNNPACK on any ORT version.

Getting this issue only on one model (Attention OCR) - model structure you can find at the bottom, other models works fine. seems there are some layers/functions in it that was broken after 1.13.1 build...

Description:

Ubuntu 22.04, Onnxruntime 1.17.1, OpenVino 2023.3, C++
Model: sort of Attention Decoder OCR, converted to onnx from pytorch.

Issue:
im inferencing the same image (also tried on sequence of different images durning the inference session). Only the FIRST result is correct. Second result and so on looks like partially "cropped" first result doesnt matter if next input data is new...
For example inferencing sequence of images with text "1234567890", "ABCDEFGHJK", "7777777777". Getting: "1234567890", "1200120012", "1200120012"...

Downgrade to ORT 1.13.1 solved the issue, but seems that something is broken after 1.13.1 build.
All other EP (CPU, CUDA, XNNPACK) works well with the same code.

Found one reference to similar issue in OpenVino github: openvinotoolkit/openvino#12966

Enabled verbose mode and found that node placements are differ between 1.17.1 (incorrect) and 1.13.1(correct) inference sessions, maybe it's matters, but doesn't explain why first result is always correct...:

correct inference session with node placements(1.13.1):

* Node placements
*Node(s) placed on [OpenVINOExecutionProvider]. Number of nodes: 11

OpenVINO-EP-subgraph_1 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_1_0)
OpenVINO-EP-subgraph_2 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_1)
OpenVINO-EP-subgraph_3 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_3_2)
OpenVINO-EP-subgraph_4 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_4_3)
OpenVINO-EP-subgraph_5 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_5_4)
OpenVINO-EP-subgraph_6 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_6_5)
OpenVINO-EP-subgraph_7 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_7_6)
OpenVINO-EP-subgraph_8 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_8_7)
OpenVINO-EP-subgraph_9 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_9_8)
OpenVINO-EP-subgraph_10 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_10_9)
OpenVINO-EP-subgraph_11 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_11_10)
*Node(s) placed on [CPUExecutionProvider]. Number of nodes: 167
GRU (/decoder/rnn/GRU)
LogSoftmax (/decoder/LogSoftmax)
ArgMax (/decoder/ArgMax)
Unsqueeze (/decoder/Unsqueeze)
Transpose (/decoder/Transpose_2)
Gather (/decoder/emb_1/Gather)
Expand (/decoder/attention_1/Expand)
Transpose (/decoder/attention_1/Transpose)
Concat (/decoder/attention_1/Concat)
MatMul (/decoder/attention/attn_1/MatMul)
Add (/decoder/attention/attn_1/Add)
Tanh (/decoder/attention_1/Tanh)
Softmax (/decoder/attention_1/Softmax)
MatMul (/decoder/MatMul_1)
Transpose (/decoder/Transpose_3)
Concat (/decoder/Concat_1)
GRU (/decoder/rnn_1/GRU)
LogSoftmax (/decoder/LogSoftmax_1)
ArgMax (/decoder/ArgMax_1)
Unsqueeze (/decoder/Unsqueeze_1)
Transpose (/decoder/Transpose_4)
Gather (/decoder/emb_2/Gather)
Expand (/decoder/attention_2/Expand)
Transpose (/decoder/attention_2/Transpose)
Concat (/decoder/attention_2/Concat)
MatMul (/decoder/attention/attn_2/MatMul)
Add (/decoder/attention/attn_2/Add)
Tanh (/decoder/attention_2/Tanh)
Softmax (/decoder/attention_2/Softmax)
MatMul (/decoder/MatMul_2)
Transpose (/decoder/Transpose_5)
Concat (/decoder/Concat_2)
GRU (/decoder/rnn_2/GRU)
LogSoftmax (/decoder/LogSoftmax_2)
ArgMax (/decoder/ArgMax_2)
Unsqueeze (/decoder/Unsqueeze_2)
Transpose (/decoder/Transpose_6)
Gather (/decoder/emb_3/Gather)
Expand (/decoder/attention_3/Expand)
Transpose (/decoder/attention_3/Transpose)
Concat (/decoder/attention_3/Concat)
MatMul (/decoder/attention/attn_3/MatMul)
Add (/decoder/attention/attn_3/Add)
Tanh (/decoder/attention_3/Tanh)
Softmax (/decoder/attention_3/Softmax)
MatMul (/decoder/MatMul_3)
Transpose (/decoder/Transpose_7)
Concat (/decoder/Concat_3)
GRU (/decoder/rnn_3/GRU)
LogSoftmax (/decoder/LogSoftmax_3)
ArgMax (/decoder/ArgMax_3)
Unsqueeze (/decoder/Unsqueeze_3)
Transpose (/decoder/Transpose_8)
Gather (/decoder/emb_4/Gather)
Expand (/decoder/attention_4/Expand)
Transpose (/decoder/attention_4/Transpose)
Concat (/decoder/attention_4/Concat)
MatMul (/decoder/attention/attn_4/MatMul)
Add (/decoder/attention/attn_4/Add)
Tanh (/decoder/attention_4/Tanh)
Softmax (/decoder/attention_4/Softmax)
MatMul (/decoder/MatMul_4)
Transpose (/decoder/Transpose_9)
Concat (/decoder/Concat_4)
GRU (/decoder/rnn_4/GRU)
LogSoftmax (/decoder/LogSoftmax_4)
ArgMax (/decoder/ArgMax_4)
Unsqueeze (/decoder/Unsqueeze_4)
Transpose (/decoder/Transpose_10)
Gather (/decoder/emb_5/Gather)
Expand (/decoder/attention_5/Expand)
Transpose (/decoder/attention_5/Transpose)
Concat (/decoder/attention_5/Concat)
MatMul (/decoder/attention/attn_5/MatMul)
Add (/decoder/attention/attn_5/Add)
Tanh (/decoder/attention_5/Tanh)
Softmax (/decoder/attention_5/Softmax)
MatMul (/decoder/MatMul_5)
Transpose (/decoder/Transpose_11)
Concat (/decoder/Concat_5)
GRU (/decoder/rnn_5/GRU)
LogSoftmax (/decoder/LogSoftmax_5)
ArgMax (/decoder/ArgMax_5)
Unsqueeze (/decoder/Unsqueeze_5)
Transpose (/decoder/Transpose_12)
Gather (/decoder/emb_6/Gather)
Expand (/decoder/attention_6/Expand)
Transpose (/decoder/attention_6/Transpose)
Concat (/decoder/attention_6/Concat)
MatMul (/decoder/attention/attn_6/MatMul)
Add (/decoder/attention/attn_6/Add)
Tanh (/decoder/attention_6/Tanh)
Softmax (/decoder/attention_6/Softmax)
MatMul (/decoder/MatMul_6)
Transpose (/decoder/Transpose_13)
Concat (/decoder/Concat_6)
GRU (/decoder/rnn_6/GRU)
LogSoftmax (/decoder/LogSoftmax_6)
ArgMax (/decoder/ArgMax_6)
Unsqueeze (/decoder/Unsqueeze_6)
Transpose (/decoder/Transpose_14)
Gather (/decoder/emb_7/Gather)
Expand (/decoder/attention_7/Expand)
Transpose (/decoder/attention_7/Transpose)
Concat (/decoder/attention_7/Concat)
MatMul (/decoder/attention/attn_7/MatMul)
Add (/decoder/attention/attn_7/Add)
Tanh (/decoder/attention_7/Tanh)
Softmax (/decoder/attention_7/Softmax)
MatMul (/decoder/MatMul_7)
Transpose (/decoder/Transpose_15)
Concat (/decoder/Concat_7)
GRU (/decoder/rnn_7/GRU)
LogSoftmax (/decoder/LogSoftmax_7)
ArgMax (/decoder/ArgMax_7)
Unsqueeze (/decoder/Unsqueeze_7)
Transpose (/decoder/Transpose_16)
Gather (/decoder/emb_8/Gather)
Expand (/decoder/attention_8/Expand)
Transpose (/decoder/attention_8/Transpose)
Concat (/decoder/attention_8/Concat)
MatMul (/decoder/attention/attn_8/MatMul)
Add (/decoder/attention/attn_8/Add)
Tanh (/decoder/attention_8/Tanh)
Softmax (/decoder/attention_8/Softmax)
MatMul (/decoder/MatMul_8)
Transpose (/decoder/Transpose_17)
Concat (/decoder/Concat_8)
GRU (/decoder/rnn_8/GRU)
LogSoftmax (/decoder/LogSoftmax_8)
ArgMax (/decoder/ArgMax_8)
Unsqueeze (/decoder/Unsqueeze_8)
Transpose (/decoder/Transpose_18)
Gather (/decoder/emb_9/Gather)
Expand (/decoder/attention_9/Expand)
Transpose (/decoder/attention_9/Transpose)
Concat (/decoder/attention_9/Concat)
MatMul (/decoder/attention/attn_9/MatMul)
Add (/decoder/attention/attn_9/Add)
Tanh (/decoder/attention_9/Tanh)
Softmax (/decoder/attention_9/Softmax)
MatMul (/decoder/MatMul_9)
Transpose (/decoder/Transpose_19)
Concat (/decoder/Concat_9)
GRU (/decoder/rnn_9/GRU)
LogSoftmax (/decoder/LogSoftmax_9)
Unsqueeze (/decoder/Unsqueeze_9)
Unsqueeze (/decoder/Unsqueeze_10)
Unsqueeze (/decoder/Unsqueeze_11)
Unsqueeze (/decoder/Unsqueeze_12)
Unsqueeze (/decoder/Unsqueeze_13)
Unsqueeze (/decoder/Unsqueeze_14)
Unsqueeze (/decoder/Unsqueeze_15)
Unsqueeze (/decoder/Unsqueeze_16)
Unsqueeze (/decoder/Unsqueeze_17)
Unsqueeze (/decoder/Unsqueeze_18)
Concat (/decoder/Concat_10)
Transpose (/decoder/Transpose_20)
FusedMatMul (MatMul_With_Transpose)
FusedMatMul (MatMul_With_Transpose_token_0)
FusedMatMul (MatMul_With_Transpose_token_1)
FusedMatMul (MatMul_With_Transpose_token_2)
FusedMatMul (MatMul_With_Transpose_token_3)
FusedMatMul (MatMul_With_Transpose_token_4)
FusedMatMul (MatMul_With_Transpose_token_5)
FusedMatMul (MatMul_With_Transpose_token_6)
FusedMatMul (MatMul_With_Transpose_token_7)

Incorrect inference result node placement (1.17.1)

* Node placements
*Node(s) placed on [OpenVINOExecutionProvider]. Number of nodes: 11

OpenVINO-EP-subgraph_1 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_1_0)
OpenVINO-EP-subgraph_2 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_2_1)
OpenVINO-EP-subgraph_3 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_3_2)
OpenVINO-EP-subgraph_4 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_4_3)
OpenVINO-EP-subgraph_5 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_5_4)
OpenVINO-EP-subgraph_6 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_6_5)
OpenVINO-EP-subgraph_7 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_7_6)
OpenVINO-EP-subgraph_8 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_8_7)
OpenVINO-EP-subgraph_9 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_9_8)
OpenVINO-EP-subgraph_10 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_10_9)
OpenVINO-EP-subgraph_11 (OpenVINOExecutionProvider_OpenVINO-EP-subgraph_11_10)
*Node(s) placed on [CPUExecutionProvider]. Number of nodes: 167
GRU (/decoder/rnn/GRU)
LogSoftmax (/decoder/LogSoftmax)
ArgMax (/decoder/ArgMax)
Unsqueeze (/decoder/Unsqueeze)
Transpose (/decoder/Transpose_2)
Gather (/decoder/emb_1/Gather)
Expand (/decoder/attention_1/Expand)
Transpose (/decoder/attention_1/Transpose)
Concat (/decoder/attention_1/Concat)
MatMul (/decoder/attention/attn_1/MatMul)
Add (/decoder/attention/attn_1/Add)
Tanh (/decoder/attention_1/Tanh)
Softmax (/decoder/attention_1/Softmax)
MatMul (/decoder/MatMul_1)
Transpose (/decoder/Transpose_3)
Concat (/decoder/Concat_1)
GRU (/decoder/rnn_1/GRU)
LogSoftmax (/decoder/LogSoftmax_1)
ArgMax (/decoder/ArgMax_1)
Unsqueeze (/decoder/Unsqueeze_1)
Transpose (/decoder/Transpose_4)
Gather (/decoder/emb_2/Gather)
Expand (/decoder/attention_2/Expand)
Transpose (/decoder/attention_2/Transpose)
Concat (/decoder/attention_2/Concat)
MatMul (/decoder/attention/attn_2/MatMul)
Add (/decoder/attention/attn_2/Add)
Tanh (/decoder/attention_2/Tanh)
Softmax (/decoder/attention_2/Softmax)
MatMul (/decoder/MatMul_2)
Transpose (/decoder/Transpose_5)
Concat (/decoder/Concat_2)
GRU (/decoder/rnn_2/GRU)
LogSoftmax (/decoder/LogSoftmax_2)
ArgMax (/decoder/ArgMax_2)
Unsqueeze (/decoder/Unsqueeze_2)
Transpose (/decoder/Transpose_6)
Gather (/decoder/emb_3/Gather)
Expand (/decoder/attention_3/Expand)
Transpose (/decoder/attention_3/Transpose)
Concat (/decoder/attention_3/Concat)
MatMul (/decoder/attention/attn_3/MatMul)
Add (/decoder/attention/attn_3/Add)
Tanh (/decoder/attention_3/Tanh)
Softmax (/decoder/attention_3/Softmax)
MatMul (/decoder/MatMul_3)
Transpose (/decoder/Transpose_7)
Concat (/decoder/Concat_3)
GRU (/decoder/rnn_3/GRU)
LogSoftmax (/decoder/LogSoftmax_3)
ArgMax (/decoder/ArgMax_3)
Unsqueeze (/decoder/Unsqueeze_3)
Transpose (/decoder/Transpose_8)
Gather (/decoder/emb_4/Gather)
Expand (/decoder/attention_4/Expand)
Transpose (/decoder/attention_4/Transpose)
Concat (/decoder/attention_4/Concat)
MatMul (/decoder/attention/attn_4/MatMul)
Add (/decoder/attention/attn_4/Add)
Tanh (/decoder/attention_4/Tanh)
Softmax (/decoder/attention_4/Softmax)
MatMul (/decoder/MatMul_4)
Transpose (/decoder/Transpose_9)
Concat (/decoder/Concat_4)
GRU (/decoder/rnn_4/GRU)
LogSoftmax (/decoder/LogSoftmax_4)
ArgMax (/decoder/ArgMax_4)
Unsqueeze (/decoder/Unsqueeze_4)
Transpose (/decoder/Transpose_10)
Gather (/decoder/emb_5/Gather)
Expand (/decoder/attention_5/Expand)
Transpose (/decoder/attention_5/Transpose)
Concat (/decoder/attention_5/Concat)
MatMul (/decoder/attention/attn_5/MatMul)
Add (/decoder/attention/attn_5/Add)
Tanh (/decoder/attention_5/Tanh)
Softmax (/decoder/attention_5/Softmax)
MatMul (/decoder/MatMul_5)
Transpose (/decoder/Transpose_11)
Concat (/decoder/Concat_5)
GRU (/decoder/rnn_5/GRU)
LogSoftmax (/decoder/LogSoftmax_5)
ArgMax (/decoder/ArgMax_5)
Unsqueeze (/decoder/Unsqueeze_5)
Transpose (/decoder/Transpose_12)
Gather (/decoder/emb_6/Gather)
Expand (/decoder/attention_6/Expand)
Transpose (/decoder/attention_6/Transpose)
Concat (/decoder/attention_6/Concat)
MatMul (/decoder/attention/attn_6/MatMul)
Add (/decoder/attention/attn_6/Add)
Tanh (/decoder/attention_6/Tanh)
Softmax (/decoder/attention_6/Softmax)
MatMul (/decoder/MatMul_6)
Transpose (/decoder/Transpose_13)
Concat (/decoder/Concat_6)
GRU (/decoder/rnn_6/GRU)
LogSoftmax (/decoder/LogSoftmax_6)
ArgMax (/decoder/ArgMax_6)
Unsqueeze (/decoder/Unsqueeze_6)
Transpose (/decoder/Transpose_14)
Gather (/decoder/emb_7/Gather)
Expand (/decoder/attention_7/Expand)
Transpose (/decoder/attention_7/Transpose)
Concat (/decoder/attention_7/Concat)
MatMul (/decoder/attention/attn_7/MatMul)
Add (/decoder/attention/attn_7/Add)
Tanh (/decoder/attention_7/Tanh)
Softmax (/decoder/attention_7/Softmax)
MatMul (/decoder/MatMul_7)
Transpose (/decoder/Transpose_15)
Concat (/decoder/Concat_7)
GRU (/decoder/rnn_7/GRU)
LogSoftmax (/decoder/LogSoftmax_7)
ArgMax (/decoder/ArgMax_7)
Unsqueeze (/decoder/Unsqueeze_7)
Transpose (/decoder/Transpose_16)
Gather (/decoder/emb_8/Gather)
Expand (/decoder/attention_8/Expand)
Transpose (/decoder/attention_8/Transpose)
Concat (/decoder/attention_8/Concat)
MatMul (/decoder/attention/attn_8/MatMul)
Add (/decoder/attention/attn_8/Add)
Tanh (/decoder/attention_8/Tanh)
Softmax (/decoder/attention_8/Softmax)
MatMul (/decoder/MatMul_8)
Transpose (/decoder/Transpose_17)
Concat (/decoder/Concat_8)
GRU (/decoder/rnn_8/GRU)
LogSoftmax (/decoder/LogSoftmax_8)
ArgMax (/decoder/ArgMax_8)
Unsqueeze (/decoder/Unsqueeze_8)
Transpose (/decoder/Transpose_18)
Gather (/decoder/emb_9/Gather)
Expand (/decoder/attention_9/Expand)
Transpose (/decoder/attention_9/Transpose)
Concat (/decoder/attention_9/Concat)
MatMul (/decoder/attention/attn_9/MatMul)
Add (/decoder/attention/attn_9/Add)
Tanh (/decoder/attention_9/Tanh)
Softmax (/decoder/attention_9/Softmax)
MatMul (/decoder/MatMul_9)
Transpose (/decoder/Transpose_19)
Concat (/decoder/Concat_9)
GRU (/decoder/rnn_9/GRU)
LogSoftmax (/decoder/LogSoftmax_9)
Unsqueeze (/decoder/Unsqueeze_9)
Unsqueeze (/decoder/Unsqueeze_10)
Unsqueeze (/decoder/Unsqueeze_11)
Unsqueeze (/decoder/Unsqueeze_12)
Unsqueeze (/decoder/Unsqueeze_13)
Unsqueeze (/decoder/Unsqueeze_14)
Unsqueeze (/decoder/Unsqueeze_15)
Unsqueeze (/decoder/Unsqueeze_16)
Unsqueeze (/decoder/Unsqueeze_17)
Unsqueeze (/decoder/Unsqueeze_18)
Concat (/decoder/Concat_10)
Transpose (/decoder/Transpose_20)
FusedMatMul (MatMul_With_Transpose)
FusedMatMul (MatMul_With_Transpose_token_18)
FusedMatMul (MatMul_With_Transpose_token_19)
FusedMatMul (MatMul_With_Transpose_token_20)
FusedMatMul (MatMul_With_Transpose_token_21)
FusedMatMul (MatMul_With_Transpose_token_22)
FusedMatMul (MatMul_With_Transpose_token_23)
FusedMatMul (MatMul_With_Transpose_token_24)
FusedMatMul (MatMul_With_Transpose_token_25)

as you can see the difference is only on last 8 lines (matmuls token ids differs). Hope it'll help...

F

To reproduce

Look description.

Urgency

Urgent

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.17.1 release

ONNX Runtime API

C++

Architecture

X64

Execution Provider

OpenVINO

Execution Provider Library Version

2023.3

The text was updated successfully, but these errors were encountered:

debugmenot · 2024-03-19T05:37:25Z

Just to note: the issue looks independent of OpenVino version - got experiments with different. Also built all from scratch many times on different systems - same results.

debugmenot · 2024-03-19T09:21:43Z

Update: 1.14.1 also works, but the performance is about 10-15% lower. 1.15 and higher affected by issue.

jywu-msft · 2024-03-20T16:18:29Z

+@sfatimar, @preetha-intel

debugmenot · 2024-04-01T18:20:03Z

any update?

sfatimar · 2024-04-02T08:11:28Z

Can we have access to the model. It seems there are 11 subgraphs being formed and 167 nodes are being placed on CPUEP. But it is hard to debug without the model.

debugmenot · 2024-04-09T23:09:52Z

@sfatimar

dumbmodel.onnx.zip
Dumb model is in attachment.
To visualize issue there is also small log of test run:

Here i'm iterating over the same image. All result except first are broken.

`f1race@build_server_nvidia:/opt/ort_dev$ ./test --image images/test/dumb100x100text.jpg
[info] Wellcome to first 0.0.1
[info] Available provider: CUDAExecutionProvider
[info] Available provider: OpenVINOExecutionProvider
[info] Available provider: XnnpackExecutionProvider
[info] Available provider: CPUExecutionProvider
[-] Selected provider: OpenVINOExecutionProvider
Input 0 : name=input.1
Output 0 : name=1389
[-] Output tensor element count: 390
[info] CHAR: A, CLASS: 13, CONF: -0.11442014
[info] CHAR: A, CLASS: 13, CONF: -0.5359584
[info] CHAR: 4, CLASS: 7, CONF: -2.073846
[info] CHAR: 6, CLASS: 9, CONF: -2.010087
[info] CHAR: 6, CLASS: 9, CONF: -1.8180711
[info] CHAR: D, CLASS: 16, CONF: -2.448421
[info] CHAR: S, CLASS: 31, CONF: -2.7345552
[info] CHAR: , CLASS: 2, CONF: -0.009441723
[info] CHAR: , CLASS: 2, CONF: -0.05160664
[info] CHAR: , CLASS: 2, CONF: -0.097647004

[-] Output tensor element count: 390
[info] CHAR: A, CLASS: 13, CONF: -0.11442014
[info] CHAR: B, CLASS: 14, CONF: -2.1106374
[info] CHAR: , CLASS: 2, CONF: -2.3829944
[info] CHAR: , CLASS: 0, CONF: -0.31160322
[info] CHAR: , CLASS: 0, CONF: -2.2568073
[info] CHAR: , CLASS: 0, CONF: -2.5611315
[info] CHAR: , CLASS: 0, CONF: -2.2948604
[info] CHAR: , CLASS: 0, CONF: -2.2516015
[info] CHAR: , CLASS: 0, CONF: -2.5611215
[info] CHAR: , CLASS: 0, CONF: -2.294854

[-] Output tensor element count: 390
[info] CHAR: A, CLASS: 13, CONF: -0.11442014
[info] CHAR: B, CLASS: 14, CONF: -2.1106374
[info] CHAR: , CLASS: 2, CONF: -2.3829944
[info] CHAR: , CLASS: 0, CONF: -0.31160322
[info] CHAR: , CLASS: 0, CONF: -2.2568073
[info] CHAR: , CLASS: 0, CONF: -2.5611315
[info] CHAR: , CLASS: 0, CONF: -2.2948604
[info] CHAR: , CLASS: 0, CONF: -2.2516015
[info] CHAR: , CLASS: 0, CONF: -2.5611215
[info] CHAR: , CLASS: 0, CONF: -2.294854`

debugmenot · 2024-04-10T11:15:25Z

Once again, this happened ONLY with OpenVINO EP with Onnxruntime >= 1.15 and any version of OpenVino.

No issues with Onnxruntime 1.13.1 and 1.14.1 (lower not tested).

CPUEP, XnnpackEP, CudaEP works well with this model and same inference code in any version of ORT including the latest one.

henxing · 2024-04-15T17:42:45Z

I'm seeing a similar issue that occurs in Python with onnxruntime-openvino version 1.16.0. I am currently stuck using python 3.8, so I cannot test 1.17, but see the following for a test script with three very simple models that show how one of them (BrokenModel) generates different results than PyTorch when using onnxruntime. If this behavior is different enough from this issue, I'm happy to open another issue to track it.

import numpy as np
import onnxruntime as rt
import torch
from torch import nn


class BrokenModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.conv_2 = nn.Conv2d(64, 1, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        x = self.conv_1(x)
        output = self.conv_2(x)
        return output.mean(dim=(1, 2, 3))


class BatchMeanModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.conv_2 = nn.Conv2d(64, 1, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        x = self.conv_1(x)
        output = self.conv_2(x)
        return output.mean(dim=(1, 2, 3)), output.mean()


class FewChannelModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_1 = nn.Conv2d(3, 3, kernel_size=3, stride=1, padding=1)
        self.conv_2 = nn.Conv2d(3, 1, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        x = self.conv_1(x)
        output = self.conv_2(x)
        return output.mean(dim=(1, 2, 3))


def run_model_pytorch_onnxruntime(arch, path):
    model = arch()
    model.eval()
    print("=" * 80)
    print(model)

    data = torch.ones(2, 3, 224, 224)
    data[0] *= 0

    print("Torch:")
    for _ in range(2):
        result = model(data)
        print(result)
    print()

    torch.onnx.export(
        model,
        data,
        path,
        input_names=["input"],
        output_names=["output"],
        export_params=True,
        dynamic_axes={name: {0: "batch_size"} for name in ("input", "output")},
        verbose=False,
    )

    sess_options = rt.SessionOptions()
    sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_DISABLE_ALL

    print("Onnxruntime:")
    rt_sess = rt.InferenceSession(
        path, sess_options, providers=["OpenVINOExecutionProvider"], provider_options=[{"device_id": "GPU"}]
    )
    for _ in range(2):
        outputs = rt_sess.run(None, {"input": data.numpy()})
        print(outputs)
    print()


if __name__ == "__main__":
    run_model_pytorch_onnxruntime(BrokenModel, "broken_model.onnx")
    print()
    run_model_pytorch_onnxruntime(BatchMeanModel, "batch_mean_model.onnx")
    print()
    run_model_pytorch_onnxruntime(FewChannelModel, "few_channel_model.onnx")

You'll need to install torch, onnxruntime-openvino, and numpy to run this script.

debugmenot · 2024-04-24T12:41:00Z

@sfatimar, Hi! Any updates? I've uploaded the model for bug investigation.

ankitm3k · 2024-04-25T06:26:42Z

Hi @debugmenot , I have tested the script suggested by @henxing using OpenVINO Toolkit v2024.1 (w_openvino_toolkit_windows_2024.1.0.dev20240405_x86_64) and OVEP v1.18.0 (this version update is now merged and available on the latest main of microsoft/onnruntime repo) on a Windows machine. I ran inference for 5 iterations and the PyTorch vs ORT OpenVINO EP results for every inference iterations were same and OVEP results were quite accurate upto 3 decimal precision against torch results. Please find the below run log for the same -

================================================================================
BrokenModel(
(conv_1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv_2): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1))
)
Torch:
tensor([-0.1026, -0.0569], grad_fn=)
tensor([-0.1026, -0.0569], grad_fn=)

Onnxruntime:
[array([-0.1026001 , -0.05670166], dtype=float32)]
[array([-0.1026001 , -0.05670166], dtype=float32)]
[array([-0.1026001 , -0.05670166], dtype=float32)]
[array([-0.1026001 , -0.05670166], dtype=float32)]
[array([-0.1026001 , -0.05670166], dtype=float32)]

================================================================================
BatchMeanModel(
(conv_1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv_2): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1))
)
Torch:
(tensor([0.1573, 0.1438], grad_fn=), tensor(0.1506, grad_fn=))
(tensor([0.1573, 0.1438], grad_fn=), tensor(0.1506, grad_fn=))

Onnxruntime:
[array([0.1573365 , 0.14381096], dtype=float32), array(0.15057378, dtype=float32)]
[array([0.1573365 , 0.14381096], dtype=float32), array(0.15057378, dtype=float32)]
[array([0.1573365 , 0.14381096], dtype=float32), array(0.15057378, dtype=float32)]
[array([0.1573365 , 0.14381096], dtype=float32), array(0.15057378, dtype=float32)]
[array([0.1573365 , 0.14381096], dtype=float32), array(0.15057378, dtype=float32)]

================================================================================
FewChannelModel(
(conv_1): Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv_2): Conv2d(3, 1, kernel_size=(1, 1), stride=(1, 1))
)
Torch:
tensor([-0.1036, -0.1638], grad_fn=)
tensor([-0.1036, -0.1638], grad_fn=)

Onnxruntime:
[array([-0.10357666, -0.16418457], dtype=float32)]
[array([-0.10357666, -0.16418457], dtype=float32)]
[array([-0.10357666, -0.16418457], dtype=float32)]
[array([-0.10357666, -0.16418457], dtype=float32)]
[array([-0.10357666, -0.16418457], dtype=float32)]

ankitm3k · 2024-04-25T06:51:25Z

@sfatimar

dumbmodel.onnx.zip Dumb model is in attachment. To visualize issue there is also small log of test run:

Here i'm iterating over the same image. All result except first are broken.

`f1race@build_server_nvidia:/opt/ort_dev$ ./test --image images/test/dumb100x100text.jpg [info] Wellcome to first 0.0.1 [info] Available provider: CUDAExecutionProvider [info] Available provider: OpenVINOExecutionProvider [info] Available provider: XnnpackExecutionProvider [info] Available provider: CPUExecutionProvider [-] Selected provider: OpenVINOExecutionProvider Input 0 : name=input.1 Output 0 : name=1389 [-] Output tensor element count: 390 [info] CHAR: A, CLASS: 13, CONF: -0.11442014 [info] CHAR: A, CLASS: 13, CONF: -0.5359584 [info] CHAR: 4, CLASS: 7, CONF: -2.073846 [info] CHAR: 6, CLASS: 9, CONF: -2.010087 [info] CHAR: 6, CLASS: 9, CONF: -1.8180711 [info] CHAR: D, CLASS: 16, CONF: -2.448421 [info] CHAR: S, CLASS: 31, CONF: -2.7345552 [info] CHAR: , CLASS: 2, CONF: -0.009441723 [info] CHAR: , CLASS: 2, CONF: -0.05160664 [info] CHAR: , CLASS: 2, CONF: -0.097647004

[-] Output tensor element count: 390 [info] CHAR: A, CLASS: 13, CONF: -0.11442014 [info] CHAR: B, CLASS: 14, CONF: -2.1106374 [info] CHAR: , CLASS: 2, CONF: -2.3829944 [info] CHAR: , CLASS: 0, CONF: -0.31160322 [info] CHAR: , CLASS: 0, CONF: -2.2568073 [info] CHAR: , CLASS: 0, CONF: -2.5611315 [info] CHAR: , CLASS: 0, CONF: -2.2948604 [info] CHAR: , CLASS: 0, CONF: -2.2516015 [info] CHAR: , CLASS: 0, CONF: -2.5611215 [info] CHAR: , CLASS: 0, CONF: -2.294854

[-] Output tensor element count: 390 [info] CHAR: A, CLASS: 13, CONF: -0.11442014 [info] CHAR: B, CLASS: 14, CONF: -2.1106374 [info] CHAR: , CLASS: 2, CONF: -2.3829944 [info] CHAR: , CLASS: 0, CONF: -0.31160322 [info] CHAR: , CLASS: 0, CONF: -2.2568073 [info] CHAR: , CLASS: 0, CONF: -2.5611315 [info] CHAR: , CLASS: 0, CONF: -2.2948604 [info] CHAR: , CLASS: 0, CONF: -2.2516015 [info] CHAR: , CLASS: 0, CONF: -2.5611215 [info] CHAR: , CLASS: 0, CONF: -2.294854`

We are investigating the issues faced while running your model using OpenVINO EP execution provider.

debugmenot · 2024-04-30T08:45:02Z

@ankitm3k hi! Did you confirm the bug? If so, any ETA for patch?

…t#19975)

ankitm3k · 2024-05-07T13:05:15Z

Hi @debugmenot,
I have investigated the issues with your given onnx model file i.e. dumbmodel.onnx. When performing inference with your model, there were many subgraph partitions with your model due to which most of the nodes were falling back to CPU EP. This causes lower performance as the model graph is completely not running with OpenVINO EP. The above fix enables the whole model to be supported on OpenVINOExecutionProvider and improves performance for your model.

I recommend you to use latest OpenVINO Toolkit v2024.1 along with the above patch to fix the same. I also have investigated the tensor outputs as a result of multiple inference iterations over the same input data and they were found to be consistent / accurate with the first inference results for my build.

…t#19975)

fix: updated data ops to support the complete graph on OVEP (microsoft#19975)

…t#19975)

debugmenot · 2024-08-13T23:52:45Z

@ankitm3k
Hi.
Update: issue still not fixed... Just checked. Performance is better now... but:
Onnxruntime 1.14.1 + OV:
[02:42:09.361] [I] [74706] [4] [car] HOMEP: T454BE199
[02:42:11.675] [I] [74706] [6] [car] HOMEP: X212EX197
[02:42:14.785] [I] [74706] [13] [car] HOMEP: O353XM199
[02:42:16.420] [I] [74706] [16] [car] HOMEP: H002XC199
[02:42:17.709] [I] [74706] [18] [car] HOMEP: P346AB197
[02:42:18.525] [I] [74706] [20] [car] HOMEP: A001OT197
[02:42:19.709] [I] [74706] [21] [car] HOMEP: E072MK199
[02:42:21.144] [I] [74706] [23] [car] HOMEP: B797HK197
[02:42:22.028] [I] [74706] [25] [car] HOMEP: O369CX177
[02:42:24.947] [I] [74706] [30] [car] HOMEP: B410KA17
[02:42:25.968] [I] [74706] [33] [car] HOMEP: K558AT197
[02:42:36.141] [I] [74706] [52] [car] HOMEP: C159XT199
[02:42:41.442] [I] [74706] [60] [car] HOMEP: O905OT190
[02:42:43.093] [I] [74706] [63] [car] HOMEP: Y902OA190
[02:42:46.568] [I] [74706] [68] [car] HOMEP: E159YY150
[02:42:47.770] [I] [74706] [71] [car] HOMEP: M181YA197

Onnxruntime 1.18.1 + OpenVinoEP 2024.3 + your GRU OP Patch:
[01:58:34.495] [I] [7342] [6] [car] HOMEP: T454BE199
[01:58:36.900] [I] [7342] [10] [car] HOMEP: X2XXX22X22
[01:58:39.927] [I] [7342] [20] [car] HOMEP: O333O33O33
[01:58:41.637] [I] [7342] [23] [car] HOMEP: H000H00H00
[01:58:42.832] [I] [7342] [27] [car] HOMEP: P333P33P33
[01:58:43.725] [I] [7342] [29] [car] HOMEP: A000A00A00
[01:58:44.849] [I] [7342] [30] [car] HOMEP: E000E00E00
[01:58:46.330] [I] [7342] [34] [car] HOMEP: B777B77B77
[01:58:51.137] [I] [7342] [51] [car] HOMEP: K555K55K55
[01:59:01.337] [I] [7342] [63] [car] HOMEP: C1CCC11C11
[01:59:06.587] [I] [7342] [70] [car] HOMEP: O999O99O99
[01:59:08.301] [I] [7342] [74] [car] HOMEP: Y999Y99Y99
[01:59:11.775] [I] [7342] [80] [car] HOMEP: E1EEE11E11
[01:59:13.006] [I] [7342] [85] [car] HOMEP: M111M11M11

i can prepare test project (source+model+image) for you. can you share your email please?

debugmenot · 2024-08-14T12:41:21Z

But with patch behaviour is slightly different - results after first is a little bit differs with results without patch but looks approx the same (incorrect)...
Is there a dirty fix possible if i change supported ops in data_ops.cc to 1.14.1 version, or something like this? How to do this properly? Cant use legacy ort versions in new build of our software because of API incompatibility.

debugmenot · 2024-08-14T14:23:47Z

@ankitm3k I've finally found an issue, at least WHERE it is EXACTLY. If
//{"Unsqueeze", V_2020_4, {"CPU", "GPU"}}, //is commented in data_ops.cc

all works as expected :) Issue needs an investigation.

its strange because unsqueeze is defined exactly same way as in 1.14.1 and 1.13.1 versions...

github-actions bot added ep:CUDA issues related to the CUDA execution provider ep:OpenVINO issues related to OpenVINO execution provider labels Mar 19, 2024

hariharans29 removed the ep:CUDA issues related to the CUDA execution provider label Mar 19, 2024

henxing mentioned this issue Apr 17, 2024

[Performance] OpenVINO EP Produces incorrect inference results #20357

Closed

ankitm3k added a commit to ankitm3k/onnxruntime that referenced this issue May 7, 2024

fix: updated data ops to support the complete graph on OVEP (microsof…

5d4264f

…t#19975)

ankitm3k mentioned this issue May 7, 2024

fix: updated data ops to support the complete graph on OVEP intel/onnxruntime#371

Closed

ankitm3k added a commit to intel/onnxruntime that referenced this issue May 22, 2024

fix: updated data ops to support the complete graph on OVEP (microsof…

73b41eb

…t#19975)

ankitm3k mentioned this issue May 22, 2024

fix: updated data ops to support the complete graph on OVEP intel/onnxruntime#374

Closed

ankitm3k added a commit to intel/onnxruntime that referenced this issue May 22, 2024

fix: updated data ops to support the complete graph on OVEP (microsof…

58cb2d7

…t#19975)

ankitm3k added a commit to intel/onnxruntime that referenced this issue May 29, 2024

fix: updated data ops to support the complete graph on OVEP (microsof…

dfff2f8

…t#19975)

ankitm3k added a commit to intel/onnxruntime that referenced this issue May 29, 2024

fix: updated data ops to support the complete graph on OVEP (microsof…

3b1f242

…t#19975)

ankitm3k added a commit to intel/onnxruntime that referenced this issue Jun 5, 2024

fix: updated data ops to support the complete graph on OVEP (microsof…

64d2fe1

…t#19975)

ankitm3k mentioned this issue Jun 5, 2024

fix: updated data ops to support the complete graph on OVEP (#19975) intel/onnxruntime#382

Merged

sfatimar added a commit to intel/onnxruntime that referenced this issue Jun 24, 2024

Merge pull request #382 from intel/ankit/data_ops_changes

cca5e99

fix: updated data ops to support the complete graph on OVEP (microsoft#19975)

saurabhkale17 pushed a commit to intel/onnxruntime that referenced this issue Jul 1, 2024

fix: updated data ops to support the complete graph on OVEP (microsof…

c9aed31

…t#19975)

ankitm3k added a commit to intel/onnxruntime that referenced this issue Jul 11, 2024

fix: updated data ops to support the complete graph on OVEP (microsof…

11b63c6

…t#19975)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] [OpenVino EP] Only first result in session is correct. #19975

[BUG] [OpenVino EP] Only first result in session is correct. #19975

debugmenot commented Mar 19, 2024 •

edited

Loading

debugmenot commented Mar 19, 2024

debugmenot commented Mar 19, 2024 •

edited

Loading

jywu-msft commented Mar 20, 2024

debugmenot commented Apr 1, 2024

sfatimar commented Apr 2, 2024

debugmenot commented Apr 9, 2024

debugmenot commented Apr 10, 2024

henxing commented Apr 15, 2024

debugmenot commented Apr 24, 2024

ankitm3k commented Apr 25, 2024 •

edited

Loading

ankitm3k commented Apr 25, 2024

debugmenot commented Apr 30, 2024

ankitm3k commented May 7, 2024 •

edited

Loading

debugmenot commented Aug 13, 2024 •

edited

Loading

debugmenot commented Aug 14, 2024

debugmenot commented Aug 14, 2024 •

edited

Loading

[BUG] [OpenVino EP] Only first result in session is correct. #19975

[BUG] [OpenVino EP] Only first result in session is correct. #19975

Comments

debugmenot commented Mar 19, 2024 • edited Loading

Describe the issue

To reproduce

Urgency

Platform

OS Version

ONNX Runtime Installation

ONNX Runtime Version or Commit ID

ONNX Runtime API

Architecture

Execution Provider

Execution Provider Library Version

debugmenot commented Mar 19, 2024

debugmenot commented Mar 19, 2024 • edited Loading

jywu-msft commented Mar 20, 2024

debugmenot commented Apr 1, 2024

sfatimar commented Apr 2, 2024

debugmenot commented Apr 9, 2024

debugmenot commented Apr 10, 2024

henxing commented Apr 15, 2024

debugmenot commented Apr 24, 2024

ankitm3k commented Apr 25, 2024 • edited Loading

ankitm3k commented Apr 25, 2024

debugmenot commented Apr 30, 2024

ankitm3k commented May 7, 2024 • edited Loading

debugmenot commented Aug 13, 2024 • edited Loading

debugmenot commented Aug 14, 2024

debugmenot commented Aug 14, 2024 • edited Loading

debugmenot commented Mar 19, 2024 •

edited

Loading

debugmenot commented Mar 19, 2024 •

edited

Loading

ankitm3k commented Apr 25, 2024 •

edited

Loading

ankitm3k commented May 7, 2024 •

edited

Loading

debugmenot commented Aug 13, 2024 •

edited

Loading

debugmenot commented Aug 14, 2024 •

edited

Loading