
CoreML EP inference result is improperly scaled #21170

Open · frenetj opened this issue Jun 25, 2024 · 4 comments
Labels

- ep:CoreML: issues related to CoreML execution provider
- platform:mobile: issues related to ONNX Runtime mobile; typically submitted using template
- stale: issues that have not been addressed in a while; categorized by a bot

Comments


frenetj commented Jun 25, 2024

Describe the issue

When running inference of a specific dynamic-shape image filter model using the CoreML EP, output pixels are slightly shifted towards the bottom left of the image. Pixels at the bottom left are not shifted at all, while pixels at the top right are shifted by almost a whole pixel to the left and downwards.

I cannot reproduce the issue with small images (size of ~1024 pixels or less). The issue is quite apparent using a 2048x2048 colour noise image as input.

Here is the top right portion of the input and output images:

[Image: TopRightPixels_InVsOut]

Here is the shift over the whole image (absolute difference of the input vs output pixels). Notice the shift is present across the whole image, but more pronounced in the top right area:

[Image: InVsOutAbsDiff]

I will provide the specific model to Microsoft directly as it has some proprietary content.

I cannot reproduce this issue when using the native CPU execution provider on macOS. The issue is also NOT reproducible when using the CUDA or TensorRT execution providers on Linux, nor with the CoreML EP on macOS when setting the COREML_FLAG_USE_CPU_ONLY flag.

Note, however, that I am using the COREML_FLAG_ONLY_ALLOW_STATIC_INPUT_SHAPES flag. I am thus surprised to see a rendering difference from the CPU implementation, since the model uses dynamic shapes and should therefore NOT run using CoreML.

To reproduce

1. On macOS, set up the CoreML EP with the COREML_FLAG_ONLY_ALLOW_STATIC_INPUT_SHAPES flag (a minimal setup sketch follows this list).
2. Run inference using the given model on a 2048x2048 image.
3. Notice that the output pixels are shifted to the left and towards the bottom of the image.
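A minimal sketch of that setup using the ORT C API (assuming a build with the CoreML EP enabled; error handling and the rest of session creation are elided):

```c
#include <onnxruntime_c_api.h>
#include <coreml_provider_factory.h>

// Create session options and attach the CoreML EP so that only nodes
// with static input shapes are handed to CoreML.
static OrtSessionOptions* make_session_options(void) {
  const OrtApi* ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);
  OrtSessionOptions* so = NULL;
  ort->CreateSessionOptions(&so);

  uint32_t coreml_flags = COREML_FLAG_ONLY_ALLOW_STATIC_INPUT_SHAPES;
  OrtSessionOptionsAppendExecutionProvider_CoreML(so, coreml_flags);
  return so;
}
```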

Urgency

The issue is not urgent as we are currently using the native CPU implementation.

Platform

Mac

OS Version

Sonoma 14.5

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

C

Architecture

ARM64

Execution Provider

CoreML

Execution Provider Library Version

No response

github-actions bot added the ep:CUDA, ep:TensorRT, and platform:mobile labels Jun 25, 2024
skottmckay (Contributor) commented Jun 25, 2024

COREML_FLAG_USE_CPU_ONLY results in CoreML executing the same nodes using its reference CPU implementation. We set this as the MLModelConfiguration.computeUnits. The rest of the ORT CoreML EP code runs exactly the same. That would strongly suggest an issue with the internal CoreML handling of a large input when running on GPU/NPU.
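For comparison, forcing that CPU path is the same C API call with a different flag value (a sketch, reusing the hypothetical `so` session options from the reproduction sketch above):

```c
// CoreML executes the same nodes, but with its CPU reference
// implementation (mapped to MLModelConfiguration.computeUnits).
OrtSessionOptionsAppendExecutionProvider_CoreML(so, COREML_FLAG_USE_CPU_ONLY);
```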

COREML_FLAG_ONLY_ALLOW_STATIC_INPUT_SHAPES is applied on a per-node basis. Parts of the model may have fixed shapes leading to CoreML executing them. If you set the session logging severity to VERBOSE it will print out details of which nodes are/aren't assigned to CoreML. That would at least narrow down which CoreML operator could be going wrong.
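A sketch of turning that on with the C API (error handling elided; `ort` and `so` are the handles from the earlier sketch):

```c
// Create the environment and session options with VERBOSE severity so
// the log shows which nodes are / aren't assigned to the CoreML EP.
OrtEnv* env = NULL;
ort->CreateEnv(ORT_LOGGING_LEVEL_VERBOSE, "coreml-debug", &env);
ort->SetSessionLogSeverityLevel(so, ORT_LOGGING_LEVEL_VERBOSE);
```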

skottmckay removed the ep:CUDA and ep:TensorRT labels Jun 26, 2024
skottmckay (Contributor) commented Jun 27, 2024

This appears to be a CoreML NeuralNetwork-specific problem. There are only a few Div and Sub nodes assigned to CoreML, as the rest have dynamic input shapes. Most of those produce the expected output.

There are two Div nodes (Div_185 and Div_143) that end up computing 2 / (2048 - 1) (one for the height and one for the width). For some reason the NeuralNetwork Div is somewhat inaccurate for this floating-point operation.

Python as a reference (double precision): `2.0 / 2047.0 = 0.0009770395701025891`

| EP                   | Value name | Value         |
| -------------------- | ---------- | ------------- |
| CPU EP               | Mul_340    | 0.00097703957 |
| CoreML NeuralNetwork | Mul_340    | 0.00097751617 |
| CoreML ML Program    | Mul_340    | 0.00097703957 |

That difference must become significant across all the other downstream operations in the model, leading to the output discrepancies. I would guess it comes down to floating-point inaccuracy when dividing 2 by a large number, which would explain why smaller heights or widths don't trigger the issue.
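As a hedged side note: 0.00097751617 is exactly 2/2047 rounded to half precision (2^-10 + 2^-20), which would be consistent with CoreML evaluating this Div in fp16 on the GPU/ANE, though the thread does not confirm this. A quick check, assuming Clang on arm64 where `_Float16` arithmetic is native:

```c
#include <stdio.h>

int main(void) {
  double   d = 2.0 / 2047.0;                        // 0.0009770395701025891 (reference)
  float    f = 2.0f / 2047.0f;                      // 0.00097703957 (CPU EP / ML Program)
  _Float16 h = (_Float16)2.0f / (_Float16)2047.0f;  // 0.00097751617 (matches NeuralNetwork)

  printf("fp64=%.16g  fp32=%.8g  fp16=%.8g\n", d, f, (double)h);
  return 0;
}
```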

sophies927 added the ep:CoreML label Jun 27, 2024
skottmckay (Contributor) commented

FWIW, it's possible to get a good result from NeuralNetwork, but the model would need to be updated and you might need some experimentation to figure out what works best.

If I scale down the input size value (the 2047 in this case) first, do the Div, and then scale the result back, it's happy. I'm guessing it's due to the difference in floating point representation caused by the range between '2' and '2047'.

e.g. scaling the 2047 by 1000 (arbitrarily chosen) would be a = 2047 / 1000, b = 2 / a, c = b / 1000 (b is 1000x the target quotient, so the final step scales it back down)
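A hypothetical sketch of that trick, using `_Float16` (Clang on arm64) to mimic a half-precision Div, which is one guess at what the NeuralNetwork backend does internally; the factor 1000 and the final fp32 rescale are illustrative choices, not necessarily what CoreML does:

```c
#include <stdio.h>

int main(void) {
  // Direct fp16 division: 2 and 2047 are both exactly representable,
  // but the quotient rounds to 0.00097751617 (relative error ~5e-4).
  _Float16 direct = (_Float16)2.0f / (_Float16)2047.0f;

  // Rescaled version of the same computation.
  _Float16 a = (_Float16)2047.0f / (_Float16)1000.0f;  // ~2.047
  _Float16 b = (_Float16)2.0f / a;                     // ~0.97705
  float c = (float)b / 1000.0f;  // scale the quotient back down (fp32 here)

  // In this instance the rescaled result (~0.00097705) is much closer
  // to the fp32 reference than the direct fp16 quotient.
  printf("direct=%.8g rescaled=%.8g expected=%.8g\n",
         (float)direct, c, 2.0f / 2047.0f);
  return 0;
}
```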

github-actions bot commented

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label Jul 31, 2024