
Bug of Softmax op with different axis #17833

Closed

XiaotaoChen opened this issue Oct 8, 2023 · 4 comments

@XiaotaoChen

Describe the issue

We have a segmentation model here: https://github.com/Tunaaaaa/softmax_bug/blob/main/face_seg.onnx
We generate a random input here: https://github.com/Tunaaaaa/softmax_bug/blob/main/x.txt
The last op is Softmax with axis=1.
When we run this ONNX model as follows, we get a wrong result.

To reproduce

Use the face_seg.onnx model and the input txt with the script below. For more information, see onnx/onnx#5655.

import os

import numpy as np
import onnxruntime as rt

# Read the random input values (one float per line).
with open('x.txt', 'r') as f:
    data = [float(line.strip()) for line in f]

# Reshape to the model's NCHW input shape.
input_data = np.array(data, dtype=np.float32).reshape(1, 3, 224, 128)

sess = rt.InferenceSession("face_seg.onnx")

input_name = sess.get_inputs()[0].name
output_names = [output.name for output in sess.get_outputs()]

results = sess.run(output_names, {input_name: input_data})

# Dump every output tensor to out/<name>.txt, one value per line.
os.makedirs('out', exist_ok=True)
for i, output_name in enumerate(output_names):
    with open('out/' + output_name + '.txt', 'w') as f:
        for item in results[i].flatten():
            f.write("%s\n" % item)

Urgency

No response

Platform

Mac

OS Version

13.4.1 (c) (22F770820d)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@hariharans29
Member

hariharans29 commented Oct 17, 2023

The operation you are getting the "right" result with and the Softmax in the model are not exactly the same operation.

You have an opset-11 model (side question - can this be updated?), and the spec for opset-11 Softmax can be found here - https://onnx.ai/onnx/operators/onnx__Softmax.html#l-onnx-op-softmax-11.

Illustration (based on the spec):

Softmax-11 with axis=1 with 4-D data means that the input will be "coerced" into [d0, d1 * d2 * d3] and softmax will be computed on this "coerced" tensor.

Your "right result Python" implementation (based on numpy) is actually flattening the 4-D data into [d0 * d1 * d2 * d3] and softmax is performed on this "flat" list of data.

As you can see, they are not the same semantically. In fact, no value of axis for Softmax-11 will align with your "Python" implementation as no value of axis can make the data into Softmax-11 into [1, x] shape (that is why your previous steps failed too). You have actually found what your model is missing through your experimentation. I am guessing the exporter/converter is missing a Flatten operator (https://onnx.ai/onnx/operators/onnx__Flatten.html) with axis=0 leading into the Softmax with axis=1. This should align the data operation with your "right result Python" implementation.
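
For concreteness, here is a minimal numpy sketch of the opset-11 coercion rule described above (this sketch is not part of the original comment; the shape simply matches the model input):

import numpy as np

def softmax_opset11(x, axis=1):
    # Opset-11 Softmax: coerce the N-D input into a 2-D tensor of shape
    # [d0 * ... * d(axis-1), d(axis) * ... * dn], apply softmax along the
    # second dimension, then reshape back to the original shape.
    rows = int(np.prod(x.shape[:axis]))
    coerced = x.reshape(rows, -1)
    e = np.exp(coerced - coerced.max(axis=1, keepdims=True))
    return (e / e.sum(axis=1, keepdims=True)).reshape(x.shape)

x = np.random.rand(1, 3, 224, 128).astype(np.float32)
y = softmax_opset11(x, axis=1)  # one softmax over all 3*224*128 values per batch item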

@XiaotaoChen
Author

Thanks, I see what you mean. I misunderstood the axis definition in opset-11. The source ONNX model is correct; the full subgraph is [transpose(nchw->nhwc), softmax(axis=3), transpose(nhwc->nchw)], which implements a softmax over the channel axis. I tried to optimize the graph by replacing that [transpose(nchw->nhwc), softmax(axis=3), transpose(nhwc->nchw)] pattern with a single softmax(axis=1), and then ran into this problem.
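
A small numpy sketch (assumed shapes, not part of the original reply) showing that the transpose/softmax(axis=3)/transpose subgraph is a per-channel softmax, and that it does not match opset-11 Softmax with axis=1:

import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = np.random.rand(1, 3, 224, 128).astype(np.float32)

# Original subgraph: transpose NCHW -> NHWC, softmax over the last axis,
# transpose back to NCHW. This is a softmax over the 3 channels at each pixel.
nhwc = x.transpose(0, 2, 3, 1)
subgraph = softmax(nhwc, axis=3).transpose(0, 3, 1, 2)

# Per-axis softmax over dim 1 (opset-13 semantics) matches the subgraph ...
print(np.allclose(subgraph, softmax(x, axis=1)))  # True

# ... but the coerced opset-11 Softmax with axis=1 (see sketch above) does not.
coerced = softmax(x.reshape(1, -1), axis=1).reshape(x.shape)
print(np.allclose(subgraph, coerced))  # False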

@hariharans29
Member

In that case, using Softmax with axis=1 should work from opset-13 onwards, because opset-13 changed what "axis" means semantically in the Softmax family of ops - onnx/onnx#3466
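
A minimal standalone sketch (not from the thread; it builds a toy model from scratch rather than using face_seg.onnx) confirming that opset-13 Softmax with axis=1 behaves as the per-channel softmax the subgraph above implements:

import numpy as np
import onnx
import onnxruntime as rt
from onnx import TensorProto, helper

# Tiny opset-13 graph containing a single Softmax(axis=1) node.
node = helper.make_node("Softmax", ["x"], ["y"], axis=1)
graph = helper.make_graph(
    [node], "softmax_opset13",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 3, 224, 128])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 3, 224, 128])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
model.ir_version = 8  # keep the IR version readable by older onnxruntime builds
onnx.checker.check_model(model)

x = np.random.rand(1, 3, 224, 128).astype(np.float32)
sess = rt.InferenceSession(model.SerializeToString())
y = sess.run(None, {"x": x})[0]

# Reference: per-channel softmax over axis 1.
e = np.exp(x - x.max(axis=1, keepdims=True))
print(np.allclose(y, e / e.sum(axis=1, keepdims=True), atol=1e-6))  # True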

@PINTO0309

PINTO0309 commented Oct 17, 2024

Still, with opset=16, Softmax does not work properly unless the model is modified as shown below.

pip show onnx onnxruntime-gpu

Name: onnx
Version: 1.17.0
Summary: Open Neural Network Exchange
Home-page: 
Author: 
Author-email: ONNX Contributors <[email protected]>
License: Apache License v2.0
Location: /home/b920405/.local/lib/python3.10/site-packages
Requires: numpy, protobuf
Required-by: drpai_common, graph_assigner, graph_optimizer, graph_splitter, insightface, olive-ai, onnx-graphsurgeon, onnx-tool, onnxoptimizer, onnxscript, onnxsim, super-gradients, tf2onnx
---
Name: onnxruntime-gpu
Version: 1.18.1
Summary: ONNX Runtime is a runtime accelerator for Machine Learning models
Home-page: https://onnxruntime.ai
Author: Microsoft Corporation
Author-email: [email protected]
License: MIT License
Location: /home/b920405/.local/lib/python3.10/site-packages
Requires: coloredlogs, flatbuffers, numpy, packaging, protobuf, sympy
Required-by

[Image: screenshot of the modified model]
