[Mobile] Android Whisper ONNX model gives out-of-memory error during runtime #19514
Comments
There's nothing in the implementation that allocates memory differently based on the model. The ORT code that runs the model is all C++ and knows nothing about the Java heap; the ORT Java bindings are a thin layer that calls this C++ API. The memory usage for a model is not 1:1 with the model size either: data needs to be converted from the stored format to an in-memory format, per-node optimizations may use more memory to change the layout of data so execution is more efficient, and running the model will most likely require more memory on top of that. It will also take a lot more processing to run a model twice as large.
Thank you for your reply.
This is the dumpsys output for my tiny model:
I still can't understand why tiny uses just 6 MB of heap but 260 MB of native memory, whereas base uses 140 MB of heap but only 25 MB of native.
I would suggest stepping through it in a debugger to find where it runs out of memory. Is it loading the model bytes in Java here? Or when attempting to create the inference session? From the look of it, the model bytes are loaded in Java first, then an InferenceSession is created, which allocates native memory for the model (it parses the input bytes to load the model) and whatever else is required to execute it; I assume the model bytes in Java are freed after that, once they go out of scope. If you were only able to load the model bytes in Java and the inference session creation failed, it would make sense that you have 140 MB of heap and not a lot of native memory, since the Java-allocated model bytes haven't been freed yet.
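For reference, this is roughly the two-step pattern being described, as a minimal Kotlin sketch (the stream handling here is an assumption for illustration, not the example app's exact code):

```kotlin
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import java.io.InputStream

// Sketch of the two-step load: the whole model is first materialized as a
// Java byte array, then ORT parses those bytes into its own native memory.
fun createSessionFromBytes(modelStream: InputStream): OrtSession {
    val env = OrtEnvironment.getEnvironment()
    // Step 1: the entire model is held on the Java heap here.
    val modelBytes = modelStream.readBytes()
    // Step 2: session creation parses the bytes into native allocations;
    // the Java copy only becomes collectible after it goes out of scope.
    return env.createSession(modelBytes, OrtSession.SessionOptions())
}
```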
It seems to me that it's failing at the 'load model bytes in Java' step itself, and even when the garbage collector tries to free up space, that much heap just isn't available. Is it because it loads the model bytes rather than memory-mapping the model?
I'm not sure what the behaviour of the JNI interface methods is on Android, but they may copy the bytes from Java into JNI before handing them to the ORT session constructor (that's allowable under the spec, and it's not clear what OpenJDK does), so there may temporarily be three copies of the model: one in Java, one in JNI, and one in the ORT session constructor. TFLite may pass through a reference and thus only keep a single copy.
Is there a way to have just one copy instead of three at any given point in time, so that it doesn't consume that much heap memory?
If there's enough of a filesystem you can ask ORT to load it from the file path. I'm not familiar enough with Android to know if that works. Otherwise we'd need code changes in ORT's Java layer to expose a byte-buffer endpoint to allow it to be passed through to ORT without a copy on the JNI side.
You mean asking ORT to load from a file path passed as a string, instead of a byte array, in createSession?
Yes, as ORT will do the load itself in that case so Java never has a copy of the model. |
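A minimal sketch of that approach, assuming the model bundled in res/raw is first copied to app-private storage (the file name and resource-id parameter here are illustrative):

```kotlin
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import android.content.Context
import java.io.File

// Copy the bundled model out of res/raw once, then pass ORT the file path
// so the model bytes never have to live on the Java heap.
fun createSessionFromPath(context: Context, rawResId: Int): OrtSession {
    val modelFile = File(context.filesDir, "whisper_base_cpu_int8_model.onnx")
    if (!modelFile.exists()) {
        context.resources.openRawResource(rawResId).use { input ->
            modelFile.outputStream().use { output -> input.copyTo(output) }
        }
    }
    val env = OrtEnvironment.getEnvironment()
    // The String overload of createSession makes ORT open and parse the
    // file natively, with no Java-side copy of the model bytes.
    return env.createSession(modelFile.absolutePath, OrtSession.SessionOptions())
}
```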
I'm facing an interesting error here. Do I need to change the way the audio file is loaded into ORT as an input?
I looked through the example code and that appears to make float tensors, so I'm a bit confused as to how this works in the first place if the model is expecting a uint8 input.
My bad, I was using a model that was generated with the --no_audio_decoder option disabled, so it expected uint8 audio bytes rather than float tensors. I was able to make it work after following your suggestion.
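For anyone following along, this is roughly what the float-tensor input path looks like (the 1 x N shape is an assumption for illustration; check the actual model's input metadata):

```kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import java.nio.FloatBuffer

// Wrap decoded float PCM samples in a float tensor, as the example code
// does for a model exported with --no_audio_decoder. A model exported
// without that flag instead expects the raw audio bytes as uint8.
fun makeAudioInput(env: OrtEnvironment, pcm: FloatArray): OnnxTensor {
    val shape = longArrayOf(1, pcm.size.toLong())
    return OnnxTensor.createTensor(env, FloatBuffer.wrap(pcm), shape)
}
```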
Describe the issue
I am using the android local example sourced from https://github.com/microsoft/onnxruntime-inference-examples/tree/main/mobile/examples/whisper/local/android
I created the whisper-base onnx model using the steps mentioned here: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/mobile/examples/whisper/local/android#generate-the-model
I created the whisper-base_cpu_int8.onnx model with --no_audio_decoder enabled.
I am using an aarch64 Android mobile device running Android 12 with 2 GB of RAM.
When I use the Android app example with the whisper_tiny_cpu_int8_model.onnx provided in the example by default, it works without issues. The default model is around 74 MB.
When I replace this model with whisper_base_cpu_int8_model.onnx, which has a size of 140 MB, it fails with an out-of-memory error as it tries to use Java heap memory instead of native memory.
All I did was change the model; I did not change the code anywhere. While running the app I still have 500 MB of free RAM available, and I still get the out-of-memory error.
The error log is below:
How can I proceed to debug this? I should not be getting any memory error here. The part of the app that should use the native heap is using the Java heap. How can I undo this?
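One way to confirm at runtime which heap is actually growing is to log both heaps around the session creation call; a small sketch using standard Android/JVM calls:

```kotlin
import android.os.Debug
import android.util.Log

// Log Java-heap vs native-heap usage before and after creating the
// inference session to see which one grows when the model is loaded.
fun logHeapUsage(tag: String) {
    val runtime = Runtime.getRuntime()
    val javaUsedMb = (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024)
    val nativeUsedMb = Debug.getNativeHeapAllocatedSize() / (1024 * 1024)
    Log.d(tag, "java heap: $javaUsedMb MB, native heap: $nativeUsedMb MB")
}
```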
To reproduce
The model is here: https://drive.google.com/file/d/1mrtEbq4PGcfppTVfGK2wMqRfniBivtB5/view?usp=sharing
Build the application in Android Studio using https://github.com/microsoft/onnxruntime-inference-examples/tree/main/mobile/examples/whisper/local/android, replace the model in the android/app/src/main/res/raw directory from tiny to base, rebuild, and execute on the device.
It will give an OutOfMemory error.
Urgency
urgent
Platform
Android
OS Version
Android 12
ONNX Runtime Installation
Released Package
Compiler Version (if 'Built from Source')
No response
Package Name (if 'Released Package')
onnxruntime-android
ONNX Runtime Version or Commit ID
1.17.0
ONNX Runtime API
Java/Kotlin
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response