
ONNXRuntime Segmentation Fault Crash on Inference (iOS and Mac) #18632

Open
SeymourNickelson opened this issue Nov 29, 2023 · 13 comments
Labels
platform:mobile issues related to ONNX Runtime mobile; typically submitted using template

Comments

@SeymourNickelson

SeymourNickelson commented Nov 29, 2023

Describe the issue

I'm using ONNXRuntime (the .xcframework from the CocoaPod on iOS). I also tested with the released Mac .dylib and ran into the same issue.

The model takes dynamic-shape input, although in my testing I'm feeding it the same input over and over (the exact same tensors). So I'm always feeding the ONNXRuntime API the same input right now. On some launches it works; on others it crashes as soon as I run an ORTSession. I can't figure out why.

Here's the crash:

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0000000680080001
Exception Codes: 0x0000000000000001, 0x0000000680080001
VM Region Info: 0x680080001 is not in any region.  Bytes after previous region: 16643522562  Bytes before following region: 39727923199
      REGION TYPE                 START - END      [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      MALLOC_NANO              280000000-2a0000000 [512.0M] rw-/rwx SM=COW  
--->  GAP OF 0xd20000000 BYTES
      commpage (reserved)      fc0000000-1000000000 [  1.0G] ---/--- SM=NUL  ...(unallocated)
Termination Reason: SIGNAL 11 Segmentation fault: 11
Terminating Process: exc handler [11560]

Triggered by Thread:  0

Thread 0 name:   Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0   ONNXRuntime              	       0x1056cbf90 onnxruntime::logging::LoggingManager::Log(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, onnxruntime::logging::Capture const&) const + 132
1   ONNXRuntime              	       0x1056cb630 onnxruntime::logging::Capture::~Capture() + 40
2   ONNXRuntime              	       0x105845648 onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) + 888
3   ONNXRuntime              	       0x10581a5d8 onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) + 48
4   ONNXRuntime              	       0x1058735b4 onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) + 220
5   ONNXRuntime              	       0x105845f6c onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<int const, 18446744073709551615ul>, std::__1::vector<OrtValue, std::__1::allocator<OrtValue>>&, std::__1::unordered_map<unsigned long, std::__1::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::__1::hash<unsigned long>, std::__1::equal_to<unsigned long>, std::__1::allocator<std::__1::pair<unsigned long const, std::__1::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>>>> const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection const*, bool const&, bool, bool) + 796
6   ONNXRuntime              	       0x10588dfc4 onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, gsl::span<OrtValue const, 18446744073709551615ul>, std::__1::vector<OrtValue, std::__1::allocator<OrtValue>>&, std::__1::unordered_map<unsigned long, std::__1::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::__1::hash<unsigned long>, std::__1::equal_to<unsigned long>, std::__1::allocator<std::__1::pair<unsigned long const, std::__1::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>>>> const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollection*, bool, onnxruntime::Stream*) + 940
7   ONNXRuntime              	       0x10588da88 onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::__1::vector<OrtValue, std::__1::allocator<OrtValue>>&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, onnxruntime::DeviceStreamCollectionHolder&, bool, onnxruntime::Stream*) + 748
8   ONNXRuntime              	       0x10588f010 onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, gsl::span<OrtValue const, 18446744073709551615ul>, std::__1::vector<OrtValue, std::__1::allocator<OrtValue>>&, ExecutionMode, OrtRunOptions const&, onnxruntime::DeviceStreamCollectionHolder&, onnxruntime::logging::Logger const&) + 44
9   ONNXRuntime              	       0x105eca6e0 onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, 18446744073709551615ul>, gsl::span<OrtValue const, 18446744073709551615ul>, gsl::span<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, 18446744073709551615ul>, std::__1::vector<OrtValue, std::__1::allocator<OrtValue>>*, std::__1::vector<OrtDevice, std::__1::allocator<OrtDevice>> const*) + 3304
10  ONNXRuntime              	       0x105ecbf0c onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue const* const, 18446744073709551615ul>, gsl::span<char const* const, 18446744073709551615ul>, gsl::span<OrtValue*, 18446744073709551615ul>) + 988
11  ONNXRuntime              	       0x105ee9a80 OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) + 104
12  ONNXRuntime              	       0x105500420 -[ORTSession runWithInputs:outputNames:runOptions:error:] + 1764

To reproduce

  1. Run an ORTSession using ONNXRuntime on iOS.
  2. Crash (sometimes)

Additional info: my model supports dynamic input shapes, but I'm currently feeding it the same tensors every time in my testing. On some app launches it works; on others it crashes when I run the ORTSession.

Thanks for taking the time to look at this.

Urgency

No response

Platform

Mac

OS Version

Sonoma 14.1

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.2

ONNX Runtime API

C++

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@github-actions github-actions bot added the platform:mobile issues related to ONNX Runtime mobile; typically submitted using template label Nov 29, 2023
@edgchen1
Contributor

  1. Run an ORTSession using ONNXRuntime on iOS.
  2. Crash (sometimes)

This is not generally expected - we have automated tests that do this.

Could you please provide some example code and a model to reproduce the issue?

And to confirm, you've observed the issue on both macOS and iOS?

@SeymourNickelson
Author

Thanks a lot for such a quick reply. Indeed I'm experiencing this issue on both iOS and Mac.

The input tensors are constructed using the ORTValue class (source code in ort_value.mm); there are three input objects. They are then fed to ORTSession's -runWithInputs:outputNames:runOptions:error:.

The three input tensors are created from NSData which would be equivalent to the following PyTorch tensors (just constructed using the iOS API with NSData):

torch.tensor([[2, 23, 23, 23, 15, 15, 15, 22, 22, 22, 21, 21, 21, 12, 12, 12, 20, 20,
               20, 16, 16, 16, 33, 33, 33, 16, 16, 16, 21, 21, 21, 14, 14, 14, 6, 0,
               0, 0],
              [2, 16, 16, 16, 20, 20, 20, 23, 23, 23, 22, 22, 22, 26, 26, 26, 16, 16,
               16, 20, 20, 20, 23, 23, 23, 8, 8, 8, 9, 9, 9, 19, 19, 19, 12, 12,
               12, 6]])
torch.tensor([35, 38])
torch.tensor([2, 2])

So I feed those three tensors to the model using ONNX Runtime. Viewing the model in Netron, I can see it has three input arguments (each of the tensors above gets passed in). Netron shows:

name: arg0
tensor: int64[arg0_dim_0,arg0_dim_1]

name: arg1
tensor: int64[arg1_dim_0]

name: arg2
tensor: int64[arg2_dim_0]

--
I briefly searched the onnxruntime source code (I'm not compiling from source, since I'm using the prebuilt xcframework from the CocoaPod), but the crash report seems to indicate it's crashing here (maybe while logging):

onnxruntime::Status ExecuteKernel(StreamExecutionContext& ctx,
                                  NodeIndex idx,
                                  size_t stream_idx,
                                  const bool& terminate_flag,
                                  SessionScope& session_scope) {
  auto* p_kernel = ctx.GetSessionState().GetKernel(idx);
  if (p_kernel->KernelDef().OpName() == "YieldOp") {
    // Do not execute YieldOp (it is a no-op anyway).
    // Decrement the reference count of tensors that are not needed beyond this point.
    // REVIEW(codemzs): The current model assumes the intermediate tensors that are exported
    // as graph outputs are owned by ORT, the risk of caller freeing the tensor or manipulating tensor
    // memory lingers while the tensor is used downstream after the export.
    ctx.RecycleNodeInputs(idx);
    return Status::OK();
  }
  // TODO: set terminate flag from run_option
  OpKernelContextInternal kernel_ctx(ctx.GetSessionState(),
                                     ctx.GetExecutionFrame(),
                                     *p_kernel,
                                     ctx.GetLogger(),
                                     terminate_flag,
                                     ctx.GetDeviceStream(stream_idx));
  onnxruntime::Status status;
  auto& logger = ctx.GetLogger();
  if (p_kernel->IsAsync()) {
    ORT_THROW("Async Kernel Support is not implemented yet.");
  } else {
    KernelScope kernel_scope(session_scope, kernel_ctx, *p_kernel);
    ORT_TRY {
#ifdef ENABLE_TRAINING
      // AllocateInputsContiguously - is only required for NCCL kernels
      // can be moved under USE_NCCL
      if (p_kernel->KernelDef().AllocateInputsContiguously()) {
        ORT_RETURN_IF_ERROR(utils::VerifyInputTensorsAllocatedContiguously(&kernel_ctx));
      }

      // This is most probably deprecated code and is causing unnecessary complexity.
      // Can be removed.
      // Cache lookup. Currently we only cache single-output nodes,
      // to keep memory overhead impact in check. Hence we only look in cache
      // if the current node has one output.
      bool reuse_cached_value = false;
      std::string cached_arg_name;
      auto& cache = ctx.GetOrtValueCache();
      if (cache != nullptr) {
        if (p_kernel->Node().OutputDefs().size() == 1) {
          cached_arg_name = p_kernel->Node().OutputDefs()[0]->Name();
          if (cache.get()->count(cached_arg_name)) {  // found arg in cache_
            VLOGS(logger, 1) << "Found OrtValue in cache for arg: " << cached_arg_name;
            reuse_cached_value = true;
          }
        }
      }
      if (!reuse_cached_value) {
        status = p_kernel->Compute(&kernel_ctx);
      } else {
        status = kernel_ctx.SetOutputMLValue(0, cache.get()->at(cached_arg_name));
      }
#else
      status = p_kernel->Compute(&kernel_ctx);
#endif
    }
    ORT_CATCH(const std::exception& ex) {
      ORT_HANDLE_EXCEPTION([&]() {
        status = ORT_MAKE_STATUS(ONNXRUNTIME, RUNTIME_EXCEPTION, ex.what());
      });
    }
  }
  if (!status.IsOK()) {
    std::ostringstream ss;
    const auto& node = p_kernel->Node();
    ss << "Non-zero status code returned while running " << node.OpType() << " node. Name:'" << node.Name()
       << "' Status Message: " << status.ErrorMessage();
    // If the computation failed, we still can record the memory consumption
#if !defined(ORT_MINIMAL_BUILD) && defined(ORT_MEMORY_PROFILE)
    ctx.GetSessionState().GetMemoryProfiler()->CreateEvents(
        "dynamic activations_" + std::to_string(ctx.GetSessionState().GetMemoryProfiler()->GetMemoryInfo().GetIteration()),
        ctx.GetSessionState().GetMemoryProfiler()->GetAndIncreasePid(), MemoryInfo::MapType::DynamicActivation, "", 0);
#endif
    const auto msg_string = ss.str();
    LOGS(logger, ERROR) << msg_string;
    return Status(status.Category(), status.Code(), msg_string);
  }
  ctx.RecycleNodeInputs(idx);
  LOGS(logger, VERBOSE) << "stream " << stream_idx << " launch kernel with idx " << idx;
  return Status::OK();
}

I can send a model if needed. Again, it doesn't always crash; sometimes I get an output tensor back, but on most app runs I get the segmentation fault and a crash. The input data is always exactly the same in my current testing (those three tensors above). Thanks again for taking a look into this.

@SeymourNickelson
Author

To add I haven't experienced any issues when running the input through the Python API:

result = ort_session.run(
    output_names=['outputName'],
    input_feed={"arg0": tensorOne, "arg1": tensorTwo, "arg2": tensorThree})

Seems to work every time from Python.
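For reference, a minimal sketch of the Python-side check above, assuming NumPy is available. It builds the same three inputs as int64 arrays with shapes [2, 38], [2], and [2]; the session construction and "model.onnx" path are assumptions, so the run call is left commented:

```python
# Hedged sketch: the three inputs from the PyTorch snippet, as int64
# NumPy arrays (the dtype ONNX Runtime expects for these int64 inputs).
import numpy as np

tensor_one = np.array(
    [[2, 23, 23, 23, 15, 15, 15, 22, 22, 22, 21, 21, 21, 12, 12, 12, 20, 20,
      20, 16, 16, 16, 33, 33, 33, 16, 16, 16, 21, 21, 21, 14, 14, 14, 6, 0,
      0, 0],
     [2, 16, 16, 16, 20, 20, 20, 23, 23, 23, 22, 22, 22, 26, 26, 26, 16, 16,
      16, 20, 20, 20, 23, 23, 23, 8, 8, 8, 9, 9, 9, 19, 19, 19, 12, 12,
      12, 6]], dtype=np.int64)
tensor_two = np.array([35, 38], dtype=np.int64)
tensor_three = np.array([2, 2], dtype=np.int64)

# With onnxruntime installed, the check would look like:
# import onnxruntime as ort
# ort_session = ort.InferenceSession("model.onnx")  # path is an assumption
# result = ort_session.run(
#     output_names=["outputName"],
#     input_feed={"arg0": tensor_one, "arg1": tensor_two, "arg2": tensor_three})
```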

I tried building onnxruntime from source (the Mac version) and using the debug build, but in the debug build I can't even get the .onnx model to load (so I can't run inference on it). I'm not sure if I did something wrong when compiling from source.

Is there a way I can disable the logger? It seems I can play with the logging level, but is there a way to turn logging off entirely? I'm thinking the logger might be using a dangling pointer.

@edgchen1
Contributor

If the same ONNX model is consistently working with the Python API, perhaps the model does not have an issue.

Can you share your code that calls the ORT Objective-C API?

@edgchen1
Contributor

I tried building the onnxruntime from source (Mac version) and using the debug version but in the debug build I can't even get the .onnx model to load (so I can't run inference on it). Not sure if I did something wrong when compiling from source.

This should be possible. If you provide more details about the error, we can look into it.

Is there a way I can disable the logger? I can play with the logging level it seems but is there a way to turn logging off?

I don't think logging can be disabled completely without code changes. Setting the least verbose logging level should effectively turn it off.
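As a hedged sketch of the equivalent knob in the Python API: session-level logging verbosity is controlled by `SessionOptions.log_severity_level`, where 0 = VERBOSE through 4 = FATAL, so setting 4 keeps only fatal messages. The onnxruntime calls are commented since the model path is an assumption:

```python
# ONNX Runtime log severity levels (the numeric scale is shared across
# language bindings): 0 = VERBOSE, 1 = INFO, 2 = WARNING, 3 = ERROR, 4 = FATAL.
SEVERITY = {"VERBOSE": 0, "INFO": 1, "WARNING": 2, "ERROR": 3, "FATAL": 4}

# With onnxruntime installed, the least verbose setting would be:
# import onnxruntime as ort
# so = ort.SessionOptions()
# so.log_severity_level = SEVERITY["FATAL"]  # effectively silences logging
# session = ort.InferenceSession("model.onnx", sess_options=so)
```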

@SeymourNickelson
Author

SeymourNickelson commented Nov 30, 2023

If the same ONNX model is consistently working with the Python API, perhaps the model does not have an issue.

Can you share your code that calls the ORT Objective-C API?

Sure. The tensors are created from arrays of NSNumbers into NSData, then fed to the ORTValue class:

-(ORTValue *)tensorFromSequenceArray:(NSArray<NSNumber *> *)numbers shape:(NSArray<NSNumber *> *)shape
{
    NSUInteger capacity = numbers.count * sizeof(int64_t);
    NSMutableData *theData = [NSMutableData dataWithCapacity:capacity];
    for (NSNumber *number in numbers)
    {
        int64_t as64 = number.longLongValue;  // longLongValue guarantees 64 bits
        [theData appendBytes:&as64 length:sizeof(int64_t)];
    }

    NSError *error = nil;
    ORTValue *theValue = [[ORTValue alloc] initWithTensorData:theData
                                                  elementType:ORTTensorElementDataTypeInt64
                                                        shape:shape
                                                        error:&error];
    if (theValue != nil)
    {
        return theValue;
    }
    else
    {
        // handle error...
        return nil;
    }
}
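The byte layout that method produces can be sanity-checked with a small Python sketch (hypothetical helper, not part of either API): each number becomes one little-endian signed 64-bit value, which matches what appendBytes: copies on ARM64 (a little-endian platform):

```python
# Hedged sketch: mirror the NSData construction in Python to confirm that
# a tensor of N elements yields N * sizeof(int64_t) bytes, one
# little-endian signed 64-bit integer per element.
import struct

def tensor_bytes(numbers):
    data = bytearray()
    for n in numbers:
        data += struct.pack("<q", n)  # "<q" = little-endian signed int64
    return bytes(data)

payload = tensor_bytes([35, 38])  # same values as the second input tensor
# 2 elements * 8 bytes == 16 bytes total
```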

Then they are put in an NSDictionary and fed to the model:

ORTValue *tensorOne = [self tensorFromSequenceArray:sequence shape:tensorOneShape];
ORTValue *tensorTwo = [self tensorFromSequenceArray:sequenceTwo shape:tensorTwoShape];
ORTValue *tensorThree = [self tensorFromSequenceArray:sequenceThree shape:tensorThreeShape];

NSDictionary *inputs = @{@"arg0": tensorOne,
                         @"arg1": tensorTwo,
                         @"arg2": tensorThree};

NSError *error = nil;
ORTRunOptions *runOptions = [[ORTRunOptions alloc] initWithError:&error];

// TODO: handle possible runOptions error.
NSAssert(runOptions != nil, @"Failed to make run options.");

NSDictionary *outputs = [self.model runWithInputs:inputs
                                      outputNames:[NSSet setWithObject:@"outputName"]
                                       runOptions:runOptions
                                            error:&error];

The arrays of numbers used to make the tensors are the same as the PyTorch values I initially posted. Sometimes it works, but most of the time it crashes.

I haven't bothered trying to change the input until I can figure out why it's crashing intermittently. I tried a few workarounds, like holding extra strong references to the ORTValue objects until after -runWithInputs:outputNames:runOptions:error: returns, and passing copies of my NSData objects to the tensors when creating them. Neither worked.

@SeymourNickelson
Author

SeymourNickelson commented Nov 30, 2023

This should be possible. If you provide more details about the error, we can look into it.

I cloned the source code and ran the build script on my Mac, which generated an onnxruntime.xcodeproj with lots of targets. I then added onnxruntime.xcodeproj as a subproject of a sample Xcode project of mine, so I could test my model against the source (and hopefully just set a breakpoint in the crashing function to get a clearer view).

It built a couple of times, but I couldn't get my model to load; I kept getting error code 6, which doesn't happen with the release version.

I was going to try again, but now the onnxruntime.xcodeproj keeps crashing the Xcode build system. I don't know enough about the CMake build system to sort through this quickly, but if there were a guide on configuring an Xcode project to build from source, I'd be able to run against the source directly and provide better info (since the CocoaPod ships a prebuilt xcframework, I can't step into it and reason about the crashing code).

@edgchen1
Contributor

edgchen1 commented Dec 1, 2023

Thanks for providing some of the code. I'm also curious how the ORTEnv and ORTSession are created, and in what scope. It would be helpful if you could provide a complete (and hopefully minimal) example program that reproduces the issue.

The prebuilt xcframework is a release build. IIRC, creating a pod with a debug build runs into some CocoaPods size limit issue. You could try building a debug version of onnxruntime.xcframework.

There's a helper script:
https://github.com/microsoft/onnxruntime/blob/v1.16.2/tools/ci_build/github/apple/build_ios_framework.py

You can pass --config Debug for a debug build.

And this is the build settings file used for onnxruntime-c:
https://github.com/microsoft/onnxruntime/blob/v1.16.2/tools/ci_build/github/apple/default_full_ios_framework_build_settings.json

@edgchen1
Contributor

edgchen1 commented Dec 6, 2023

Could you also provide the model you used? With that, we can better attempt to reproduce it on our end.

FYI, we've had another report of a crash with a similar call stack.

@edgchen1
Contributor

edgchen1 commented Dec 7, 2023

One thing to verify - be sure that the ORTEnv has not been destroyed at the point where ORTSession run is called. An ORTEnv's lifetime needs to eclipse that of any ORTSession initialized with it.

We can improve the API by having ORTSession keep a strong pointer to the ORTEnv.
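The failure mode is easiest to see outside of ONNX Runtime: if a session-like object holds only a weak reference to its environment, the environment can be deallocated while the session still appears usable. A minimal Python analogy (hypothetical classes, not ORT API):

```python
# Hedged analogy for the ORTEnv/ORTSession lifetime bug (not ORT API):
# the session holds only a weak reference, so once the caller drops its
# strong reference, the environment disappears out from under the session.
import weakref

class Env:
    pass

class Session:
    def __init__(self, env):
        self._env = weakref.ref(env)  # weak: does not keep env alive

    def run(self):
        env = self._env()
        if env is None:  # env was collected; in C++ this would be a use-after-free
            raise RuntimeError("environment destroyed before run()")
        return "ok"

env = Env()
session = Session(env)
del env  # caller drops its only strong reference
# session.run() now fails, mirroring the crash above; holding env in a
# strong property for the session's lifetime avoids it.
```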

@SeymourNickelson
Author

I ended up dropping down to the C API, because it seemed like the problem could be in the C++/Objective-C wrappers, and I haven't run into any issues from C. It was a bit tedious, since I have to manage memory manually for everything in C, but everything works.

One thing to verify - be sure that the ORTEnv has not been destroyed at the point where ORTSession run is called. An ORTEnv's lifetime needs to eclipse that of any ORTSession initialized with it.

We can improve the API by having ORTSession keep a strong pointer to the ORTEnv.

That appears to be what was going on 🤦‍♂️. After adding a strong property that holds the ORTEnv, I haven't run into the crash (I haven't tested extensively, but that seems to have been the cause).

I definitely think this should be improved: ORTSession should take ownership of the ORTEnv, considering that a crash results if the session outlives the env. By convention, the public initializer -initWithEnv:modelPath:sessionOptions:error: implies that ORTSession takes ownership of the ORTEnv (just as an external strong reference to the path argument wouldn't be expected to be required). It didn't even occur to me to hold an extra strong reference to the ORTEnv from my own code!

IMO, if the ORTEnv needs to be accessed from outside the ORTSession after creation, it should be exposed as a readonly property on ORTSession (though, after the ORTEnv is created, I don't see anything in the public header that would make a client want or need to access it, so maybe it should just be private).

I feel a bit silly for spending all that time writing C code when all I had to do was hold a strong reference to the ORTEnv! Now I have to decide whether to keep my C code or use the provided Objective-C++ wrapper.

Thanks a lot for taking the time to respond. Very helpful info and is appreciated.

@edgchen1
Contributor

edgchen1 commented Dec 7, 2023

Glad you got it working now.

The documentation of ORTEnv is a bit lacking, but it is in fact required to be kept around while calling other ORT APIs. This is also true for OrtEnv in the C/C++ API, which is what the Objective-C API is implemented with.

I have a PR which hopefully improves this: #18738

Thanks for reporting the issue.

@esclear

esclear commented Nov 13, 2024

I don't think this was completely solved by #18738.
I've been able to hit the segfaults with onnxruntime 1.17.1, which should include the changes from that PR.
See pykeio/ort#315 for some context and a reproducer.

From that investigation, this looks like a concurrency issue.
I'm not too sure what the concurrency guarantees/requirements are in onnxruntime, though.
