
Segfault on session creation with custom MockedOrtAllocator in MlasSgemmCopyPackB #17867

Open
SolomidHero opened this issue Oct 10, 2023 · 3 comments
Labels: stale (issues that have not been addressed in a while; categorized by a bot)


SolomidHero commented Oct 10, 2023

Describe the issue

Hi! I wanted to use a custom allocator to better understand where memory allocations are requested, but I couldn't even create a session with my model. I found that the crash happens in the MatMul operator (possibly in others too).
I tried replacing operator new with malloc/free and also logging every allocation/deallocation. All allocations succeeded, but the crash still happens.
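Since every allocation is already being logged, it may help to log the alignment of each returned pointer as well, because SIMD kernels can assume their buffers are aligned even when allocation itself succeeds. A minimal sketch; `IsAligned` and `LogAllocation` are hypothetical helpers meant to be called from a custom allocator's Alloc callback, not ONNX Runtime APIs:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Returns true if `p` is a multiple of `alignment` (must be a power of two).
bool IsAligned(const void* p, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical logging hook: call it from a custom allocator's Alloc callback
// to record the size and the 16- and 64-byte alignment of each block.
void LogAllocation(const void* p, std::size_t size) {
  std::printf("alloc %zu bytes at %p (16B-aligned: %s, 64B-aligned: %s)\n",
              size, p, IsAligned(p, 16) ? "yes" : "no",
              IsAligned(p, 64) ? "yes" : "no");
}
```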

Here is my debugging log:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001051db424 libonnxruntime.1.14.1.dylib`MlasSgemmCopyPackB(float*, float const*, unsigned long, unsigned long, unsigned long) + 708
libonnxruntime.1.14.1.dylib`MlasSgemmCopyPackB:
->  0x1051db424 <+708>: movaps %xmm0, -0x18(%rax)
    0x1051db428 <+712>: movaps %xmm0, -0x28(%rax)
    0x1051db42c <+716>: movaps %xmm0, -0x38(%rax)
    0x1051db430 <+720>: movaps %xmm0, -0x48(%rax)
Target 0: (BenchOnnxModel) stopped.

bt:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001051db424 libonnxruntime.1.14.1.dylib`MlasSgemmCopyPackB(float*, float const*, unsigned long, unsigned long, unsigned long) + 708
    frame #1: 0x00000001051de6c2 libonnxruntime.1.14.1.dylib`MlasGemmPackB(CBLAS_TRANSPOSE, unsigned long, unsigned long, float const*, unsigned long, void*) + 130
    frame #2: 0x0000000104b8db7b libonnxruntime.1.14.1.dylib`onnxruntime::GemmPackBFp32(std::__1::shared_ptr<onnxruntime::IAllocator>&, onnxruntime::Tensor const&, bool, std::__1::unique_ptr<void, onnxruntime::BufferDeleter>&, unsigned long&, onnxruntime::TensorShape&) + 331
    frame #3: 0x0000000104b950af libonnxruntime.1.14.1.dylib`onnxruntime::MatMul<float>::PrePack(onnxruntime::Tensor const&, int, std::__1::shared_ptr<onnxruntime::IAllocator>, bool&, onnxruntime::PrePackedWeights*) + 95
    frame #4: 0x0000000105089014 libonnxruntime.1.14.1.dylib`onnxruntime::SessionState::PrepackConstantInitializedTensors(onnxruntime::InlinedHashMap<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, unsigned long> > >&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, OrtValue const*, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, OrtValue const*> > > const&)::$_1::operator()(bool) const + 1348
    frame #5: 0x0000000105088a8f libonnxruntime.1.14.1.dylib`onnxruntime::SessionState::PrepackConstantInitializedTensors(onnxruntime::InlinedHashMap<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, unsigned long> > >&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, OrtValue const*, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, OrtValue const*> > > const&) + 127
    frame #6: 0x000000010508e734 libonnxruntime.1.14.1.dylib`onnxruntime::SessionState::FinalizeSessionStateImpl(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, onnxruntime::KernelRegistryManager const&, onnxruntime::Node const*, onnxruntime::SessionOptions const&, bool, onnxruntime::InlinedHashMap<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, unsigned long, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, unsigned long> > >&, onnxruntime::InlinedHashMap<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, OrtMemoryInfo, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, OrtMemoryInfo> > > const&, bool) + 2596
    frame #7: 0x000000010508cf83 libonnxruntime.1.14.1.dylib`onnxruntime::SessionState::FinalizeSessionState(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, onnxruntime::KernelRegistryManager const&, bool, bool) + 1555
    frame #8: 0x00000001047c9335 libonnxruntime.1.14.1.dylib`onnxruntime::InferenceSession::Initialize() + 8949
    frame #9: 0x00000001047fe46f libonnxruntime.1.14.1.dylib`(anonymous namespace)::InitializeSession(OrtSessionOptions const*, std::__1::unique_ptr<onnxruntime::InferenceSession, std::__1::default_delete<onnxruntime::InferenceSession> >&, OrtPrepackedWeightsContainer*) + 927
    frame #10: 0x00000001047fe6cd libonnxruntime.1.14.1.dylib`OrtApis::CreateSessionFromArray(OrtEnv const*, void const*, unsigned long, OrtSessionOptions const*, OrtSession**) + 93
    frame #11: 0x000000010003f3df BenchOnnxModel`ORTSessionRunner::ORTSessionRunner(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) [inlined] Ort::Session::Session(this=0x0000000102400060, env=0x0000000102400118, model_data=0x00000001044f2f00, model_data_length=123, options=0x00000001024244d8) at onnxruntime_cxx_inline.h:947:16 [opt]
    frame #12: 0x000000010003f3b4 BenchOnnxModel`ORTSessionRunner::ORTSessionRunner(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) [inlined] Ort::Session::Session(this=0x0000000102400060, env=0x0000000102400118, model_data=0x00000001044f2f00, model_data_length=123, options=0x00000001024244d8) at onnxruntime_cxx_inline.h:946:122 [opt]
    frame #13: 0x000000010003f3b4 BenchOnnxModel`ORTSessionRunner::ORTSessionRunner(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) [inlined] std::__1::__unique_if<Ort::Session>::__unique_single std::__1::make_unique<Ort::Session, Ort::Env&, void const*, unsigned long, DefaultORTSessionOptions&>(__args=0x0000000102400118, __args=0x00000001024244d8) at unique_ptr.h:728:32 [opt]
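The faulting instruction in the log is `movaps`, which on x86 requires its memory operand to be 16-byte aligned and raises a general-protection fault (reported by macOS as `EXC_I386_GPFLT`) otherwise. Together with the backtrace (the fault is inside `MlasSgemmCopyPackB`, reached while `GemmPackBFp32` prepacks a MatMul weight using the session's allocator), this suggests, though does not prove, that the buffer returned by the custom allocator is misaligned rather than that allocation failed. One way a tracking allocator can produce such a buffer is by storing bookkeeping in a header in front of the block: `malloc` returns a 16-byte-aligned pointer on typical 64-bit platforms, but the pointer handed to the caller is shifted by the header size. That would also be consistent with switching operator new to malloc/free making no difference. The sketch below illustrates the pattern under that assumption; whether `MockedOrtAllocator` actually does this should be checked in `test_allocator.h`, and `HeaderPrefixedAlloc`, `HeaderPrefixedFree`, and `MisalignmentMod16` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical tracking allocator: stores the requested size in a header
// placed directly in front of the pointer returned to the caller.
void* HeaderPrefixedAlloc(std::size_t size) {
  void* raw = std::malloc(sizeof(std::size_t) + size);
  if (raw == nullptr) return nullptr;
  std::memcpy(raw, &size, sizeof(size));
  // malloc's result is suitably aligned, but this offset pointer is only
  // 8-byte aligned on typical 64-bit targets (sizeof(size_t) == 8).
  return static_cast<unsigned char*>(raw) + sizeof(std::size_t);
}

void HeaderPrefixedFree(void* p) {
  if (p == nullptr) return;
  std::free(static_cast<unsigned char*>(p) - sizeof(std::size_t));
}

// Offset of `p` past the previous 16-byte boundary; movaps needs this to be 0.
std::size_t MisalignmentMod16(const void* p) {
  return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % 16);
}
```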

To reproduce

  1. Use MockedOrtAllocator from test_allocator.h.
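  2. Create the allocator, env, memory info, and session:
#include <memory>
#include <onnxruntime_cxx_api.h>

auto& api = Ort::GetApi();
env = std::make_unique<Ort::Env>(ORT_LOGGING_LEVEL_WARNING);
// CustomAllocator: the MockedOrtAllocator from test_allocator.h (step 1)
custom_alloc = std::make_unique<CustomAllocator>();
// RegisterAllocator returns an OrtStatus*; throw instead of silently dropping errors
Ort::ThrowOnError(api.RegisterAllocator(*env, custom_alloc.get()));

// options_ holds the session options listed in step 2'
session_ = std::make_unique<Ort::Session>(*env, "onnxruntime/test/testdata/matmul_1.onnx", options_); // fails for some models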

2'. I used the following session options:

options_.SetInterOpNumThreads(1);
options_.SetIntraOpNumThreads(1);
options_.DisableCpuMemArena();
options_.DisableMemPattern();

// use the allocator registered on the env instead of a per-session one
options_.AddConfigEntry("session.use_env_allocators", "1");

options_.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);
options_.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
  3. Compile and run with onnxruntime/test/testdata/matmul_1.onnx.

This happens with both the 1.14.1 and 1.15.1 ONNX Runtime macOS releases.

Urgency

medium priority
Related issues from my experience with onnxruntime:
#16032
#10270

Platform

Mac

OS Version

12.3.1

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.14.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

SolomidHero changed the title from "Segmentation fault with custom MockedOrtAllocator in MlasSgemmCopyPackB" to "Segfault on session creation with custom MockedOrtAllocator in MlasSgemmCopyPackB" on Oct 10, 2023
snnn (Member) commented Oct 11, 2023

Thanks for pointing it out. The code you pointed to is out of sync with https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/allocator.cc
See how AllocatorDefaultAlloc is implemented there. However, I cannot copy that implementation into MockedOrtAllocator because MlasGetPreferredBufferAlignment is internal.
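Because `MlasGetPreferredBufferAlignment` is not part of the public API, one workaround for a custom allocator is to over-align. A minimal sketch, assuming a hypothetical alignment of 64 bytes (more than the 16 bytes `movaps` needs; the value is not taken from ONNX Runtime) in place of the internal one. `TrackedAlignedAlloc` and `TrackedAlignedFree` are hypothetical names that could back the Alloc/Free callbacks of an `OrtAllocator`. The size header still exists for bookkeeping, but it occupies a whole alignment-sized slot, so the returned pointer keeps the alignment of the underlying `posix_memalign` block.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <stdlib.h>  // posix_memalign (POSIX)

// Assumed alignment: the real value comes from the internal
// MlasGetPreferredBufferAlignment(), so 64 is a conservative stand-in.
constexpr std::size_t kAlignment = 64;

// Bytes currently handed out, like the bookkeeping a tracking allocator keeps.
std::atomic<std::size_t> g_bytes_in_use{0};

// Returns `size` bytes aligned to kAlignment. The size header lives in a
// whole kAlignment-sized slot in front of the block, so the caller's pointer
// stays aligned.
void* TrackedAlignedAlloc(std::size_t size) {
  if (size == 0) return nullptr;
  void* base = nullptr;
  if (posix_memalign(&base, kAlignment, kAlignment + size) != 0) return nullptr;
  std::memcpy(base, &size, sizeof(size));
  g_bytes_in_use.fetch_add(size);
  return static_cast<unsigned char*>(base) + kAlignment;
}

void TrackedAlignedFree(void* p) {
  if (p == nullptr) return;
  unsigned char* base = static_cast<unsigned char*>(p) - kAlignment;
  std::size_t size = 0;
  std::memcpy(&size, base, sizeof(size));
  g_bytes_in_use.fetch_sub(size);
  std::free(base);  // memory from posix_memalign is released with free()
}
```

In an allocator struct derived from `OrtAllocator`, the Alloc and Free callbacks would simply forward to these two functions; the alignment constant is the only part that would need revisiting if the internal value turns out to be larger.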

Contributor

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label (issues that have not been addressed in a while; categorized by a bot) on Nov 11, 2023
snnn (Member) commented Jan 12, 2024

Do not close it yet.
