
[Mobile] ONNXRuntime giving different outputs on NNAPI and CPU EP #19518

Closed
saketharsh opened this issue Feb 14, 2024 · 9 comments
Labels
platform:mobile (issues related to ONNX Runtime mobile; typically submitted using template)
stale (issues that have not been addressed in a while; categorized by a bot)

Comments

@saketharsh

saketharsh commented Feb 14, 2024

Describe the issue

The ONNX MobileNetV2-FP32 model gives different outputs on the CPU and NNAPI execution providers.
We observed this behaviour on multiple mobile phones, including:

  1. POCO X3 Pro (Android 13)
  2. Realme 8 Pro (Android 13)
  3. Samsung S20+ 5G (Snapdragon)
  4. OnePlus 7

The output logits differ for the same set of input images:

Mobile : OnePlus 7
Model - GM 1901
Architecture - arm64-v8a
OS Version - Oxygen OS 12.1
Processor - Qualcomm Snapdragon 855 octa-core
Android Version - Android 12

  1. An image containing only black pixels, generated with:

     fun createBlackBitmap(width: Int, height: Int): Bitmap {
         val bitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888)
         val canvas = Canvas(bitmap)
         canvas.drawColor(Color.BLACK)
         return bitmap
     }

  2. An image of a goldfish

The difference in logit values can be seen in the attached Logcat screenshot:
Screenshot 2024-02-13 at 5 44 18 PM

To reproduce

  1. Create an Android application with onnxruntime-android (1.16.3) as a dependency.
  2. Add the images above to the application's assets and add the following to your code.
  3. See ORTApp.zip for the helper methods needed.
        val sessionOptionsNNAPI = SessionOptions()
        sessionOptionsNNAPI.addNnapi()
        val sessionOptionsNormal = SessionOptions()
        val environment = OrtEnvironment.getEnvironment()

        val sessionWithNNAPI = environment.createSession(
            context.assets.open("mobilenetv2_fp32.onnx").readBytes(),
            sessionOptionsNNAPI
        )
        val sessionWithoutNNAPI = environment.createSession(
            context.assets.open("mobilenetv2_fp32.onnx").readBytes(),
            sessionOptionsNormal
        )

        val blackImgBitmap = createBlackBitmap(224, 224)
        val blackImgData = preProcess(blackImgBitmap)

        val nnapiOutputString1 = runInference(sessionWithNNAPI, blackImgData)
        val cpuOutputString1 = runInference(sessionWithoutNNAPI, blackImgData)

        Log.d("OUT IMG1 NNAPI", nnapiOutputString1)
        Log.d("OUT IMG1 CPU", cpuOutputString1)

        val goldfishImgBitmap = ImageUtils.loadAndResizeImage(this, "goldfish.jpeg", 224, 224)
        val goldfishImgData = preProcess(goldfishImgBitmap)

        val nnapiOutputString2 = runInference(sessionWithNNAPI, goldfishImgData)
        val cpuOutputString2 = runInference(sessionWithoutNNAPI, goldfishImgData)

        Log.d("OUT IMG2 NNAPI", nnapiOutputString2)
        Log.d("OUT IMG2 CPU", cpuOutputString2)

ORTApp.zip
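The actual `preProcess` helper lives in ORTApp.zip; as a rough illustration of what MobileNetV2-style preprocessing typically does (function name `normalizeToChw` and the ImageNet mean/std constants are assumptions here, not taken from the attachment), here is a minimal pure-Kotlin sketch that converts ARGB pixels into a normalized CHW float buffer:

```kotlin
// Hypothetical stand-in for the preProcess helper in ORTApp.zip (not the
// actual code). Converts ARGB pixel ints to a CHW float array normalized
// with the ImageNet mean/std commonly used for MobileNetV2.
fun normalizeToChw(pixels: IntArray, width: Int, height: Int): FloatArray {
    val mean = floatArrayOf(0.485f, 0.456f, 0.406f)  // assumed ImageNet mean
    val std = floatArrayOf(0.229f, 0.224f, 0.225f)   // assumed ImageNet std
    val plane = width * height
    val out = FloatArray(3 * plane)
    for (i in 0 until plane) {
        val p = pixels[i]
        val r = ((p shr 16) and 0xFF) / 255f
        val g = ((p shr 8) and 0xFF) / 255f
        val b = (p and 0xFF) / 255f
        out[i] = (r - mean[0]) / std[0]              // R plane
        out[plane + i] = (g - mean[1]) / std[1]      // G plane
        out[2 * plane + i] = (b - mean[2]) / std[2]  // B plane
    }
    return out
}
```

With this kind of normalization, an all-black bitmap does not produce an all-zero input tensor, so the model still computes a non-trivial (if meaningless) prediction for it.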

Urgency

HIGH

Platform

Android

OS Version

13

ONNX Runtime Installation

Released Package

Compiler Version (if 'Built from Source')

No response

Package Name (if 'Released Package')

onnxruntime-android

ONNX Runtime Version or Commit ID

1.16.3

ONNX Runtime API

Java/Kotlin

Architecture

ARM64

Execution Provider

Default CPU, NNAPI

Execution Provider Library Version

No response

@saketharsh saketharsh added the platform:mobile issues related to ONNX Runtime mobile; typically submitted using template label Feb 14, 2024
@skottmckay
Contributor

Is this a duplicate of #19507?

Hard to tell from a subset of the output in a screenshot. Which of the 1000 results has the highest probability if you apply softmax to the output to convert to percentage probabilities?

Does the label with the highest probability differ between CPU and NNAPI?

Does the percentage probability for the best match change significantly?

We haven't changed anything about how NNAPI runs, and I don't recall any other issue saying the output was completely incorrect. We test the NNAPI model generated for individual operators using the same unit test code that tests the CPU EP implementations. If values were completely off, those tests should fail.
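For reference, converting the raw logits to percentage probabilities is a standard softmax over the 1000-element output; a minimal Kotlin sketch (illustrative only, not taken from ORTApp.zip or ORT itself):

```kotlin
// Numerically stable softmax over raw logits: subtract the max before
// exponentiating so large logits don't overflow.
fun softmax(logits: FloatArray): FloatArray {
    val max = logits.maxOrNull()!!
    val exps = FloatArray(logits.size) { i ->
        kotlin.math.exp((logits[i] - max).toDouble()).toFloat()
    }
    val sum = exps.sum()
    return FloatArray(logits.size) { i -> exps[i] / sum }
}

// Index of the highest-probability class (the predicted label).
fun argmax(probs: FloatArray): Int = probs.indices.maxByOrNull { probs[it] }!!
```

Comparing `argmax` and the top probability between the CPU and NNAPI outputs distinguishes harmless floating-point jitter from a genuinely wrong result.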

@Yash-Vardhan-Sharma

Does the label with the highest probability differ between CPU and NNAPI?

  • Yes

Does the percentage probability for the best match change significantly?

  • Yes

For the black image, the top match on CPU was "matchbox" at ~3% probability; on NNAPI the top match was a different label at ~0.17%.

@saketharsh
Author

@skottmckay , the outputs of NNAPI EP and CPU EP differ on a few devices and NOT all of them.

From our tests, we were able to replicate this issue while executing models with NNAPI using onnxruntime. The majority of the devices where we could reproduce it were based on these chipsets:

  1. Qualcomm Snapdragon 865
  2. Qualcomm Snapdragon 850
  3. Qualcomm Snapdragon 870
  4. Qualcomm Snapdragon 855

You can try replicating these issues on the mobile devices mentioned using BrowserStack or LambdaTest.

@skottmckay
Contributor

The black image doesn't seem like a good test. Neither 3% nor 0.17% is a good score for a match, which isn't surprising, as the model was not trained to detect black images.

What about the goldfish?

Different hardware will execute the model using different instructions/ordering, which produces differences in the floating-point results. A matmul (used in Conv) involves a lot of multiplications and additions, which may get batched differently or done in a different order.

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
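The ordering effect is easy to demonstrate in isolation. A small Kotlin example (illustrative only, not ORT code) showing that floating-point addition is not associative, so the same values summed in a different order can give different results:

```kotlin
// Floating-point addition is not associative: summing the same values in a
// different order can produce different results, because small terms can be
// absorbed by large ones before they get a chance to cancel.
fun sumInOrder(values: List<Float>): Float {
    var acc = 0.0f
    for (v in values) acc += v
    return acc
}

val forward = listOf(1e8f, 1.0f, -1e8f)    // 1.0f is absorbed: 1e8f + 1.0f == 1e8f
val reordered = listOf(1e8f, -1e8f, 1.0f)  // large terms cancel first
// sumInOrder(forward) == 0.0f, sumInOrder(reordered) == 1.0f
```

This is a one-ULP-scale effect per operation; across the millions of accumulations in a Conv it shifts logits slightly, but it should not flip a 99.7% top-1 prediction to a completely different label.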

@Yash-Vardhan-Sharma

For the goldfish too: the CPU EP outputs "goldfish" with 99.7% probability, whereas the NNAPI EP's top matches are "unicycle"/"ping-pong ball" with 0.6%/0.4% probabilities.

@Yash-Vardhan-Sharma

Yash-Vardhan-Sharma commented Feb 15, 2024

Moreover, we ran the same experiment with the Icons-50 model on the same mobile device as above (OnePlus 7, GM 1901).

Below are the corresponding outputs on NNAPI and CPU, for both ORT and tflite, for the image below.

(Attached: apple_0_airplane test image, plus screenshots of the ORT and tflite outputs)

Clearly there is no disparity in outputs across EPs in tflite.
For ORT, the CPU EP also produces the same output as tflite, but the NNAPI output differs vastly.
Note: the 0th index represents "airplane".

@Yash-Vardhan-Sharma

@skottmckay I have updated the comment above with outputs from ONNX and tflite.

@skottmckay
Contributor

Not sure what's going on. I created an app locally and ran using the emulator and got the expected results.

(Screenshot: expected results from the emulator run)

The ORT code is agnostic to the device beyond asking what the NNAPI API level is to determine which NNAPI operators it can use. MobileNet is a very well tested model, and I believe it has been supported by NNAPI for a long time, so I doubt the API level is changing the NNAPI model that is created.

What Android version is on the devices that give invalid results? Is it the same as or different from the 'good' devices?

You could try setting the log level to verbose to see if there's any other meaningful output:

sessionOptionsNNAPI.setSessionLogLevel(OrtLoggingLevel.ORT_LOGGING_LEVEL_VERBOSE)

There should be a message with 'Node placements' saying all nodes are assigned to the NNAPI EP.

Could also try the latest ORT version (1.17), but I don't believe there are any significant changes to how the NNAPI EP works in that.

Given you know the expected output for the goldfish it may also be worth stripping the code down to just one inference session running with the goldfish as input and NNAPI enabled. Could call runInference twice to see if the output is consistent or not.

skottmckay added a commit that referenced this issue Feb 27, 2024
### Description
A number of Qualcomm Snapdragon chipsets do not produce correct output
if we skip the Reshape, which ironically was a performance optimization
for Snapdragon chips.

Perf testing showed that Squeeze also seems to execute on CPU, so there's no benefit to using that as an alternative where possible, e.g. Global*Pool -> Reshape to 2D -> Gemm could potentially be replaced with Global*Pool -> Squeeze dims 2 and 3 -> Gemm if that offered better performance.


### Motivation and Context
#19518
@github-actions (bot)

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Mar 16, 2024