ONNXRuntime Segmentation Fault Crash on Inference (iOS and Mac) #18632
Comments
This is not generally expected - we have automated tests that do this. Could you please provide some example code and a model to reproduce the issue? And to confirm, you've observed the issue on both macOS and iOS? |
Thanks a lot for such a quick reply. Indeed, I'm experiencing this issue on both iOS and Mac. The input tensors are constructed using the ORTValue class (source code in ort_value.mm); there are three input objects. They are then fed to ORTSession's -runWithInputs:outputNames:runOptions:error: method. The three input tensors are created from NSData and would be equivalent to the following PyTorch tensors (just constructed using the iOS API with NSData): torch.tensor([[2, 23, 23, 23, 15, 15, 15, 22, 22, 22, 21, 21, 21, 12, 12, 12, 20, 20, So I feed those three tensors to the model using ONNX Runtime. Viewing the model in Netron, I can see it has three input arguments (each of the tensors above gets passed in). Netron shows: name: arg0 name: arg1 name: arg2 --
I can send a model if it is needed. Again, it doesn't always crash. Sometimes I get an output tensor back, but on most app runs I get the segmentation fault and a crash. The input data is always exactly the same in my current testing (those three tensors from above). Thanks again for taking a look into this. |
To add: I haven't experienced any issues when running the input through the Python API: result = ort_session.run(output_names=['outputName'], It seems to work every time from Python. I tried building onnxruntime from source (Mac version) and using the debug version, but in the debug build I can't even get the .onnx model to load (so I can't run inference on it). I'm not sure if I did something wrong when compiling from source. Is there a way I can disable the logger? It seems I can change the logging level, but is there a way to turn logging off entirely? I'm thinking the logger might be using a dangling pointer. |
If the same ONNX model is consistently working with the Python API, perhaps the model does not have an issue. Can you share your code that calls the ORT Objective-C API? |
This should be possible. If you provide more details about the error, we can look into it.
I don't think logging can be disabled completely without code changes. Setting the least verbose logging level should effectively turn it off. |
Sure. Arrays of NSNumbers are converted into NSData, which is then fed to the ORTValue class to create the tensors:
Then they are put in a NSDictionary and fed to the model:
The arrays of numbers used to make the tensors are the same as the PyTorch values I initially posted. Sometimes it works, but most of the time it crashes. I haven't bothered trying to change the input until I can figure out why it's crashing intermittently. I tried a few workarounds, like adding extra strong references to the ORTValue objects until after the -runWithInputs:outputNames:runOptions:error: method returns. I also tried passing copies of my NSData objects to the tensors when creating them. Neither worked. |
I installed the source code and ran the build script on my Mac, which generated an onnxruntime.xcodeproj with lots of targets. I then added the onnxruntime.xcodeproj as a subproject to a sample Xcode project of mine to test my model against the source code (so that I could hopefully set a breakpoint in the function that is crashing to get a clearer view). It built a couple of times, but I couldn't get my model to load: I kept getting error code 6, which doesn't happen in the release version. I was going to try again, but now the onnxruntime.xcodeproj keeps crashing the Xcode build system. I don't know enough about the CMake build system to quickly sort through this, but if there were a guide on how to configure an Xcode project to build from the source code, I'd be able to run against the source code directly and provide better info (since the CocoaPod ships a prebuilt xcframework, I can't step into it and reason about the crashing code). |
Thanks for providing some of the code. Was also curious about how the ORTEnv and ORTSession are created and in what scope. It would be helpful if you could provide a complete (and hopefully minimal) example program that reproduces the issue. The prebuilt xcframework is a release build. IIRC, creating a pod with a debug build runs into some CocoaPods size limit issue. You could try building a debug version of onnxruntime.xcframework. There's a helper script: You can pass And this is the build settings file used for onnxruntime-c: |
Could you also provide the model you used? With that, we can better attempt to reproduce it on our end. FYI, we've had another report of a crash with a similar call stack. |
One thing to verify - be sure that the ORTEnv has not been destroyed at the point where ORTSession run is called. An ORTEnv's lifetime needs to outlast that of any ORTSession initialized with it. We can improve the API by having ORTSession keep a strong pointer to the ORTEnv. |
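To make the lifetime rule and the proposed fix concrete, here is a minimal C++ sketch. It does not use the real ONNX Runtime API: `Env`, `Session`, and `make_session` are hypothetical stand-ins invented for illustration. It only shows the ownership pattern suggested above, where the session holds a strong (shared) reference to its environment so the environment cannot be destroyed while the session is still in use.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an environment object (NOT the real OrtEnv/ORTEnv).
// It flips an external flag so its lifetime can be observed from outside.
struct Env {
    explicit Env(bool* alive) : alive_(alive) { *alive_ = true; }
    ~Env() { *alive_ = false; }
    bool* alive_;
};

// Hypothetical stand-in for a session that depends on its environment.
// Holding a shared_ptr means the Env lives at least as long as the Session.
class Session {
public:
    Session(std::shared_ptr<Env> env, std::string model_path)
        : env_(std::move(env)), model_path_(std::move(model_path)) {}

    // Returns true if the environment is still alive when "inference" runs.
    bool run() const { return env_ != nullptr && *env_->alive_; }

private:
    std::shared_ptr<Env> env_;   // strong reference keeps the Env alive
    std::string model_path_;     // unused here; kept to mirror a real initializer
};

// Mirrors the pattern from the report: the Env is a local variable and only
// the Session escapes. With a raw pointer or weak reference, the Env would be
// destroyed when this function returns; with shared ownership it survives.
Session make_session(bool* env_alive) {
    auto env = std::make_shared<Env>(env_alive);
    return Session(env, "model.onnx");  // hypothetical path, never opened
}
```

With this design, a caller that only keeps the `Session` returned by `make_session` can still call `run()` safely, and the `Env` is released automatically once the last session referencing it is destroyed.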
I ended up dropping down to the C API because it seemed like it could be related to something in the C++ / ObjC wrappers, and I haven't run into any issues from C. This was a bit tedious since I have to manually manage memory for everything in C, but everything is working.
That appears to be what was going on 🤦‍♂️. After adding a strong property that holds the ORTEnv I haven't run into the crash (I haven't tested extensively, but that seems to be the cause).
I definitely think this should be improved and that the ORTSession should take ownership of the ORTEnv, considering that the ORTSession outliving the ORTEnv causes a crash. By convention, the public initializer -initWithEnv:modelPath:sessionOptions:error: implies that ORTSession does take ownership of the ORTEnv (in the same way that an external strong reference to the path argument wouldn't be expected to be required). I didn't even think to hold an extra strong reference to the ORTEnv in my own code! IMO, if the ORTEnv needs to be accessed from outside the ORTSession after creation, it should be exposed as a readonly property on ORTSession (though I don't see anything particularly interesting in the public header that would cause a client to want or need to access it after the session is created, so maybe it should just be private).
Now I feel kind of silly spending all that time writing C code when all I had to do was add a strong reference to the ORTEnv! Now I have to decide whether to use my C code or the provided ObjC++ wrapper. Thanks a lot for taking the time to respond. The info was very helpful and is appreciated. |
Glad you got it working now. The documentation of ORTEnv is a bit lacking, but it is in fact required to be kept around while calling other ORT APIs. This is also true for OrtEnv in the C/C++ API, which is what the Objective-C API is implemented with. I have a PR which hopefully improves this: #18738 Thanks for reporting the issue. |
I don't think this was completely solved with #18738. From this investigation, this looks like a concurrency issue. |
Describe the issue
I'm using ONNXRuntime (the .xcframework from the CocoaPod on iOS). I also tested using the released Mac .dylib and am running into the same issue.
The model takes dynamic shape input, although in my testing I'm always feeding the ONNXRuntime API the exact same tensors. On some launches it works, and on others it crashes as soon as I run an ORTSession. I can't figure out why.
Here's the crash:
To reproduce
Additional Info: My model supports dynamic input shape but I'm feeding it the same tensors every time in my testing currently. On some app launches it works. On other app launches it crashes when I run on the ORTSession.
Thanks for taking the time to look at this.
Urgency
No response
Platform
Mac
OS Version
Sonoma 14.1
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.2
ONNX Runtime API
C++
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response