Graph optimization removes significant casts even if all optimizations are disabled #17565
There is a way to disable certain optimizers, e.g.:
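For illustration, a minimal sketch of that mechanism in the Python API. The `disabled_optimizers` keyword argument of `InferenceSession` is test-oriented and not part of the documented public surface, and the model path and optimizer name below are placeholders, so treat this as an assumption-laden sketch rather than a recommendation:

```python
import onnxruntime as ort

opts = ort.SessionOptions()
# Graph optimizations stay enabled overall; only the named transformers are skipped.
sess = ort.InferenceSession(
    "model.onnx",                              # placeholder path
    sess_options=opts,
    providers=["CPUExecutionProvider"],
    disabled_optimizers=["ConstantFolding"],   # example transformer name only
)
```

As the next comment points out, the cast-related transformers are not reachable through this mechanism, which is the crux of this issue.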
The `InsertCastTransformer` cannot be disabled right now:
There are some similar issues reported that are related to this: #8787. The solution is to update the logic for precision-loss detection here:
and/or to disable `RemoveDuplicateCastTransformer` when all optimizations are disabled.
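To make the suggested precision-loss check concrete, here is a rough, hypothetical sketch in plain Python (not ONNX Runtime source; the function name and type lists are made up). The idea is that a `src -> mid -> src` Cast pair may only be collapsed when the intermediate type cannot lose information:

```python
def cast_roundtrip_is_identity(src: str, mid: str) -> bool:
    """Hypothetical check: may Cast(src -> mid) followed by Cast(mid -> src) be removed?"""
    floats = {"float16": 16, "float32": 32, "float64": 64}
    sints = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}
    uints = {"uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64}
    for group in (floats, sints, uints):
        if src in group and mid in group:
            # Within one group (same signedness), only widening round trips are lossless.
            return group[mid] >= group[src]
    # Different groups (e.g. signed -> unsigned, or float -> int): keep the casts.
    return False

assert cast_roundtrip_is_identity("float32", "float64")      # widening: removable
assert not cast_roundtrip_is_identity("float64", "float32")  # narrowing: keep (this issue)
assert not cast_roundtrip_is_identity("int32", "uint32")     # signedness change: keep
```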
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
The issue is not resolved, although it might be a duplicate.
The `RemoveDuplicateCastTransformer` fairly naively removed Cast nodes from the graph without considering precision loss when using the same `TypeGroup`. For instance, F64 -> F32 -> F64 would be optimised out of the graph. I also noticed that signedness was not accounted for, which is not covered by any existing issue but is a problem. For example, doing int -> unsigned int -> int produces very different values for negative inputs and so should not be optimised out.

One could argue that we shouldn't be performing such cast elimination at all (at least not in this transformer). The original scope might well be restricted to only eliminating unnecessary casts from the `InsertCastTransformer` and no others.

### Motivation and Context

This should fix #17565, #9915 and #8787.
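For readers skimming the thread, a quick NumPy check (plain NumPy, not ONNX Runtime code) of why the F64 -> F32 -> F64 round trip above is not the identity:

```python
import numpy as np

x = np.float64(0.1)                                   # not exactly representable in float32
roundtrip = x.astype(np.float32).astype(np.float64)
print(x == roundtrip)                                 # False
print(roundtrip - x)                                  # ~1.49e-09: the truncation the casts introduce
```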
Describe the issue
Consider the following model, where `a` is a `float64` input, the first cast is to `float32`, and the second one is back to `float64`. The model is stored in `original_model.onnx` below. Running it yields the following model:
The casts are significant: removing them changes the output of the model. Furthermore, I would expect no optimizations at all to take place with `opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL`.

To reproduce
The respective models are attached.
models.zip
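As a self-contained sketch of the reproduction, the snippet below rebuilds an equivalent cast-round-trip model and runs it with all optimizations disabled. The intermediate tensor names, the opset/IR pins, and the sample input are assumptions; the attached `models.zip` remains the authoritative reproduction.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper
import onnxruntime as ort

# a (float64) -> Cast(to=float32) -> Cast(to=float64) -> out
graph = helper.make_graph(
    nodes=[
        helper.make_node("Cast", ["a"], ["a32"], to=TensorProto.FLOAT),
        helper.make_node("Cast", ["a32"], ["out"], to=TensorProto.DOUBLE),
    ],
    name="cast_roundtrip",
    inputs=[helper.make_tensor_value_info("a", TensorProto.DOUBLE, [1])],
    outputs=[helper.make_tensor_value_info("out", TensorProto.DOUBLE, [1])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
model.ir_version = 8  # pinned for compatibility with older runtimes
onnx.save(model, "original_model.onnx")

opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
opts.optimized_model_filepath = "optimized_model.onnx"  # inspect whether the Casts survived

sess = ort.InferenceSession("original_model.onnx", opts, providers=["CPUExecutionProvider"])
a = np.array([0.1], dtype=np.float64)
print(sess.run(None, {"a": a})[0] - a)  # expected: the ~1.5e-9 truncation; with the bug: exactly 0
```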
Urgency
The (disabled) optimization produces wrong results in surprising ways. Users may rely on the truncation performed by the casts. We found this bug because the lack of these casts produced wrong results in a subsequent `TreeEnsembleRegressor`.

Platform
Mac
OS Version
12.3.1
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.1
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response