[Mobile] Segmentation fault after repeated inference #21082
Comments
Hard to say without a stack trace with symbol names. ORT will do most allocations during model initialization and the first inference. After that it uses a cache for memory, so segfaults would typically be an out-of-memory scenario or bad input (e.g. an input tensor is freed while ORT is still using it). If you're building from source, can you build a debug version? You may need to ensure the Android build doesn't strip the binary of symbols, as it typically does. Does the issue happen if you run on the Android emulator? It would be easier to debug if it did. Another option would be to copy onnxruntime_perf_test to the phone using adb (use /data/local/tmp), along with the model, and run it. You can specify the number of iterations or the amount of time to run for, and it can generate dummy input data.
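For reference, a sketch of the adb steps described above. The paths are the suggested /data/local/tmp; the flag names (`-I` for generated dummy inputs, `-r` for repetition count) are from memory, so verify them with `onnxruntime_perf_test -h` on your build:

```shell
# Copy the perf test binary and model to a writable, executable location on the device
adb push onnxruntime_perf_test /data/local/tmp/
adb push model.onnx /data/local/tmp/
adb shell chmod +x /data/local/tmp/onnxruntime_perf_test

# Run many iterations with generated dummy input data to try to trigger the crash
# (-I generates dummy inputs, -r sets the iteration count; check -h to confirm)
adb shell /data/local/tmp/onnxruntime_perf_test -I -r 1000 /data/local/tmp/model.onnx
```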
Hi @skottmckay, thanks for your response. I have created an MRE in the form of a demo app that has the bug. Please check out this repo. The bug is reproducible on the Android emulator: it crashes anywhere in the range of 100-1000 inference runs, which should only take a few minutes to reach. Does this help in debugging? I would also like to provide a stack trace of the crash, but I don't know how to get that on the native layer. Any pointers you can give me for that? In any case, I appreciate the help :)
This issue looks the same as: , which I solved by including the generated header files.
@laurenspriem is it reproducible by running onnxruntime_perf_test in a shell on the emulator? If so, that would rule out the issue being in the Flutter plugin you're using (which we don't own). It may be possible to get symbols using ndk-stack: https://developer.android.com/ndk/guides/ndk-stack.html
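A sketch of symbolizing the crash with ndk-stack, assuming you still have the unstripped libonnxruntime.so from your build (the symbol directory path below is the ndk-build convention and just an illustration; point `-sym` at wherever your unstripped .so files live):

```shell
# Capture the tombstone/backtrace that Android prints to logcat on the SIGSEGV
adb logcat > crash.txt

# Translate the raw frame addresses into symbol names using the unstripped libraries
ndk-stack -sym ./obj/local/arm64-v8a -dump crash.txt
```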
I am trying to run onnxruntime_perf_test in the emulator as you suggested. However, it stops and gives me the following output:

/onnxruntime/onnxruntime/test/onnx/TestCase.cc:705 OnnxTestCase::OnnxTestCase(const std::string &, std::unique_ptr<TestModelInfo>, double, double) test case dir doesn't exist

Any clue what is going wrong?
Are you running with the option that generates dummy input data (mentioned above)? Otherwise you need to create a test case directory with input data in serialized protobuf files, which is the same input format that onnx_test_runner requires.
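For reference, the test case directory layout that onnx_test_runner expects looks roughly like this (directory and file names below are illustrative; the `test_data_set_N` convention with serialized TensorProto `.pb` files is the standard ONNX test data format):

```
my_test_case/
├── model.onnx
└── test_data_set_0/
    ├── input_0.pb       # serialized onnx TensorProto for the first input
    └── output_0.pb      # expected output, also a serialized TensorProto
```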
Thanks for the help! In the end the issue indeed seemed to be in the package we were using and not in ONNX Runtime. We have since switched to using ONNX Runtime for mobile directly, through Flutter Platform Channels, which has resolved the issue.
Describe the issue
I am getting a segmentation fault (SIGSEGV) after repeated inference runs on mobile, which crashes the app. The issue only comes up after more than 300 inference runs, but beyond that point it comes up consistently. For context, I am using ORT in a Flutter app through FFI.
Error logs
To reproduce
The issue is reproducible by letting the app continuously run inference. Since this is happening in the app, it's a bit hard to give a clear and easy MRE. If nothing comes up from the error logs alone I'll try to create a dummy app that reproduces the issue and share the code for it here.
Urgency
I don't know how urgent this issue is to ORT, but for our app it's quite urgent.
Platform
Android
OS Version
Android 14 (and other versions)
ONNX Runtime Installation
Built from Source
Compiler Version (if 'Built from Source')
No response
Package Name (if 'Released Package')
None
ONNX Runtime Version or Commit ID
v1.15.0
ONNX Runtime API
C++/C
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response