
[Mobile] ONNXRuntime giving different outputs on NNAPI and CPU EP #19518

Closed
saketharsh opened this issue Feb 14, 2024 · 9 comments
Labels
platform:mobile (issues related to ONNX Runtime mobile; typically submitted using template)
stale (issues that have not been addressed in a while; categorized by a bot)

Comments

@saketharsh

saketharsh commented Feb 14, 2024

Describe the issue

The ONNX MobileNetV2-FP32 model gives different outputs on the CPU and NNAPI execution providers.
We observed this behaviour on multiple mobile phones, including:

  1. POCO X3 Pro (Android 13)
  2. Realme 8 Pro (Android 13)
  3. Samsung S20+ 5G (Snapdragon)
  4. OnePlus 7

The output logits differ for the same set of input images:

Mobile : OnePlus 7
Model - GM 1901
Architecture - arm64-v8a
OS Version - Oxygen OS 12.1
Processor - Qualcomm Snapdragon 855 octa-core
Android Version - Android 12

  1. An image containing only black pixels, generated with:

     fun createBlackBitmap(width: Int, height: Int): Bitmap {
         val bitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888)
         val canvas = Canvas(bitmap)
         canvas.drawColor(Color.BLACK)
         return bitmap
     }

  2. An image of a goldfish

The difference in logit values can be seen in the attached Logcat screenshot:
Screenshot 2024-02-13 at 5 44 18 PM

To reproduce

  1. Create an Android application with onnxruntime-android (1.16.3) as a dependency.
  2. Add the images above to the application's assets and add the following to your code.
  3. See ORTApp.zip for the helper methods needed.
        val sessionOptionsNNAPI = SessionOptions()
        sessionOptionsNNAPI.addNnapi()
        val sessionOptionsNormal = SessionOptions()
        val environment = OrtEnvironment.getEnvironment()

        val sessionWithNNAPI = environment.createSession(
            context.assets.open("mobilenetv2_fp32.onnx").readBytes(),
            sessionOptionsNNAPI
        )
        val sessionWithoutNNAPI = environment.createSession(
            context.assets.open("mobilenetv2_fp32.onnx").readBytes(),
            sessionOptionsNormal
        )

        val blackImgBitmap = createBlackBitmap(224, 224)
        val blackImgData = preProcess(blackImgBitmap)

        val nnapiOutputString1 = runInference(sessionWithNNAPI, blackImgData)
        val cpuOutputString1 = runInference(sessionWithoutNNAPI, blackImgData)

        Log.d("OUT IMG1 NNAPI", nnapiOutputString1)
        Log.d("OUT IMG1 CPU", cpuOutputString1)

        val goldfishImgBitmap = ImageUtils.loadAndResizeImage(this, "goldfish.jpeg", 224, 224)
        val goldfishImgData = preProcess(goldfishImgBitmap)

        val nnapiOutputString2 = runInference(sessionWithNNAPI, goldfishImgData)
        val cpuOutputString2 = runInference(sessionWithoutNNAPI, goldfishImgData)

        Log.d("OUT IMG2 NNAPI", nnapiOutputString2)
        Log.d("OUT IMG2 CPU", cpuOutputString2)

ORTApp.zip
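The actual `preProcess` helper lives in ORTApp.zip; as a rough illustration of what MobileNetV2-style preprocessing typically does (function name `normalizeToChw` and the ImageNet mean/std constants are assumptions here, not taken from the attachment), here is a minimal pure-Kotlin sketch that converts ARGB pixels into a normalized CHW float buffer:

```kotlin
// Hypothetical stand-in for the preProcess helper in ORTApp.zip (not the
// actual code). Converts ARGB pixel ints to a CHW float array normalized
// with the ImageNet mean/std commonly used for MobileNetV2.
fun normalizeToChw(pixels: IntArray, width: Int, height: Int): FloatArray {
    val mean = floatArrayOf(0.485f, 0.456f, 0.406f)  // assumed ImageNet mean
    val std = floatArrayOf(0.229f, 0.224f, 0.225f)   // assumed ImageNet std
    val plane = width * height
    val out = FloatArray(3 * plane)
    for (i in 0 until plane) {
        val p = pixels[i]
        val r = ((p shr 16) and 0xFF) / 255f
        val g = ((p shr 8) and 0xFF) / 255f
        val b = (p and 0xFF) / 255f
        out[i] = (r - mean[0]) / std[0]              // R plane
        out[plane + i] = (g - mean[1]) / std[1]      // G plane
        out[2 * plane + i] = (b - mean[2]) / std[2]  // B plane
    }
    return out
}
```

With this kind of normalization, an all-black bitmap does not produce an all-zero input tensor, so the model still computes a non-trivial (if meaningless) prediction for it.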

Urgency

HIGH

Platform

Android

OS Version

13

ONNX Runtime Installation

Released Package

Compiler Version (if 'Built from Source')

No response

Package Name (if 'Released Package')

onnxruntime-android

ONNX Runtime Version or Commit ID

1.16.3

ONNX Runtime API

Java/Kotlin

Architecture

ARM64

Execution Provider

Default CPU, NNAPI

Execution Provider Library Version

No response

@saketharsh saketharsh added the platform:mobile issues related to ONNX Runtime mobile; typically submitted using template label Feb 14, 2024
@skottmckay
Contributor

Is this a duplicate of #19507?

Hard to tell from a subset of the output in a screenshot. Which of the 1000 results has the highest probability if you apply softmax to the output to convert to percentage probabilities?

Does the label with the highest probability differ between CPU and NNAPI?

Does the percentage probability for the best match change significantly?

We haven't changed anything about how NNAPI runs, and I don't recall any other issue saying the output was completely incorrect. We test the NNAPI model generated for individual operators using the same unit test code that tests the CPU EP implementations. If values were completely off, those tests should fail.
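For reference, converting the raw logits to percentage probabilities is a standard softmax over the 1000-element output; a minimal Kotlin sketch (illustrative only, not taken from ORTApp.zip or ORT itself):

```kotlin
// Numerically stable softmax over raw logits: subtract the max before
// exponentiating so large logits don't overflow.
fun softmax(logits: FloatArray): FloatArray {
    val max = logits.maxOrNull()!!
    val exps = FloatArray(logits.size) { i ->
        kotlin.math.exp((logits[i] - max).toDouble()).toFloat()
    }
    val sum = exps.sum()
    return FloatArray(logits.size) { i -> exps[i] / sum }
}

// Index of the highest-probability class (the predicted label).
fun argmax(probs: FloatArray): Int = probs.indices.maxByOrNull { probs[it] }!!
```

Comparing `argmax` and the top probability between the CPU and NNAPI outputs distinguishes harmless floating-point jitter from a genuinely wrong result.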

@Yash-Vardhan-Sharma

Does the label with the highest probability differ between CPU and NNAPI?

  • Yes

Does the percentage probability for the best match change significantly?

  • Yes

For the black image, the top match on CPU was "matchbox" at ~3% probability; on NNAPI the top match was a different label at ~0.17%.

@saketharsh
Author

@skottmckay , the outputs of NNAPI EP and CPU EP differ on a few devices and NOT all of them.

From our tests, we were able to replicate this issue while executing models with NNAPI using onnxruntime. The majority of the devices where we could reproduce it were based on these chipsets:

  1. Qualcomm Snapdragon 865
  2. Qualcomm Snapdragon 850
  3. Qualcomm Snapdragon 870
  4. Qualcomm Snapdragon 855

You can try replicating these issues on the mobile devices mentioned using BrowserStack or LambdaTest.

@skottmckay
Contributor

The black image doesn't seem like a good test. Neither 3% nor 0.17% is a good score for a match, which isn't surprising, as the model was not trained to detect black images.

What about the goldfish?

Different hardware will execute the model using different instructions/ordering, which produces differences in the floating-point results. A matmul (used in Conv) involves a lot of multiplications and additions, which may get batched differently or done in a different order.

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
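The ordering effect is easy to demonstrate in isolation. A small Kotlin example (illustrative only, not ORT code) showing that floating-point addition is not associative, so the same values summed in a different order can give different results:

```kotlin
// Floating-point addition is not associative: summing the same values in a
// different order can produce different results, because small terms can be
// absorbed by large ones before they get a chance to cancel.
fun sumInOrder(values: List<Float>): Float {
    var acc = 0.0f
    for (v in values) acc += v
    return acc
}

val forward = listOf(1e8f, 1.0f, -1e8f)    // 1.0f is absorbed: 1e8f + 1.0f == 1e8f
val reordered = listOf(1e8f, -1e8f, 1.0f)  // large terms cancel first
// sumInOrder(forward) == 0.0f, sumInOrder(reordered) == 1.0f
```

This is a one-ULP-scale effect per operation; across the millions of accumulations in a Conv it shifts logits slightly, but it should not flip a 99.7% top-1 prediction to a completely different label.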

@Yash-Vardhan-Sharma

For the goldfish too: the CPU EP outputs "goldfish" with 99.7% probability, whereas the NNAPI EP's top matches are "unicycle"/"ping-pong ball" with 0.6%/0.4% probabilities.

@Yash-Vardhan-Sharma

Yash-Vardhan-Sharma commented Feb 15, 2024

Moreover, we ran the same experiment with the Icons-50 model on the same mobile device as above (OnePlus 7, GM 1901).

Below are the corresponding outputs on NNAPI and CPU, for both ORT and tflite, for the image below.

(Attached: apple_0_airplane test image, plus screenshots of the ORT and tflite outputs)

Clearly there is no disparity in outputs across EPs in tflite.
For ORT, the CPU EP also produces the same output as tflite, but the NNAPI output differs vastly.
Note: the 0th index represents "airplane".

@Yash-Vardhan-Sharma

@skottmckay I have updated the comment above with outputs from ONNX and tflite.

@skottmckay
Contributor

Not sure what's going on. I created an app locally and ran using the emulator and got the expected results.

(Screenshot: expected results from the emulator run)

The ORT code is agnostic to the device beyond asking what the NNAPI API level is to determine which NNAPI operators it can use. MobileNet is a very well tested model, and I believe it has been supported by NNAPI for a long time, so I doubt the API level is changing the NNAPI model that is created.

What Android version is on the devices that give invalid results? Is it the same as or different from the 'good' devices?

You could try setting the log level to verbose to see if there's any other meaningful output:

sessionOptionsNNAPI.setSessionLogLevel(OrtLoggingLevel.ORT_LOGGING_LEVEL_VERBOSE)

There should be a message with 'Node placements' saying all nodes are assigned to the NNAPI EP.

Could also try the latest ORT version (1.17), but I don't believe there are any significant changes to how the NNAPI EP works in that.

Given you know the expected output for the goldfish it may also be worth stripping the code down to just one inference session running with the goldfish as input and NNAPI enabled. Could call runInference twice to see if the output is consistent or not.

skottmckay added a commit that referenced this issue Feb 27, 2024
### Description
A number of Qualcomm Snapdragon chipsets do not produce correct output
if we skip the Reshape, which ironically was a performance optimization
for Snapdragon chips.

Perf testing showed that Squeeze also seems to execute on CPU, so there's no benefit to using that as an alternative where possible, e.g. Global*Pool -> Reshape to 2D -> Gemm could potentially be replaced with Global*Pool -> Squeeze dims 2 and 3 -> Gemm if that offered better performance.


### Motivation and Context
#19518
@github-actions (bot)

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Mar 16, 2024