[Web] Memory access out of bounds / alignment fault #21355

Closed
cmario opened this issue Jul 15, 2024 · 3 comments
Labels
platform:web issues related to ONNX Runtime web; typically submitted using template

Comments

@cmario

cmario commented Jul 15, 2024

Describe the issue

Hello,

I am exploring the use of ONNX, with a particular focus on the ORT model format for web applications. I developed a basic WASM module to perform inference using a UNET-like semantic segmentation model. However, the inference process throws an exception, which I have detailed below. Please note that the same code runs without issues outside of the WASM module.

I built ONNX Runtime for web with the following command:

./build.sh --config Release --build_wasm_static_lib --minimal_build --skip_tests --disable_wasm_exception_catching --disable_rtti

I built the WASM module with the following command:

emcc -g myModule.cpp -o myModule.js -I<opencv headers> -I<onnxruntime headers> -L<opencv lib> -L<onnxruntime lib> -lopencv_core -lopencv_imgproc -lonnxruntime_webassembly -s INITIAL_MEMORY=256MB -s EXPORTED_FUNCTIONS="['_processImage', '_malloc', '_free']" -s EXPORTED_RUNTIME_METHODS="['ccall', 'cwrap']" -s SAFE_HEAP=0 --bind

When running the inference I get the following error:

2024-07-15 18:48:24.453600 [I:onnxruntime:, inference_session.cc:514 TraceSessionOptions] Session Options {  execution_mode:0 execution_order:DEFAULT enable_profiling:0 optimized_model_filepath: enable_mem_pattern:1 enable_mem_reuse:1 enable_cpu_mem_arena:1 profile_file_prefix:onnxruntime_profile_ session_logid: session_log_severity_level:-1 session_log_verbosity_level:0 max_num_graph_transformation_steps:10 graph_optimization_level:2 intra_op_param:OrtThreadPoolParams { thread_pool_size: 1 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str:  set_denormal_as_zero: 0 } inter_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str:  set_denormal_as_zero: 0 } use_per_session_threads:1 thread_pool_allow_spinning:1 use_deterministic_compute:0 config_options: {  } }
2024-07-15 18:48:24.454500 [I:onnxruntime:, inference_session.cc:414 operator()] Flush-to-zero and denormal-as-zero are off
2024-07-15 18:48:24.454600 [I:onnxruntime:, inference_session.cc:422 ConstructorCommon] Creating and using per session threadpools since use_per_session_threads_ is true
2024-07-15 18:48:24.454800 [I:onnxruntime:, inference_session.cc:440 ConstructorCommon] Dynamic block base set to 0
2024-07-15 18:48:24.462000 [I:onnxruntime:, inference_session.cc:1583 Initialize] Initializing session.
2024-07-15 18:48:24.462100 [I:onnxruntime:, inference_session.cc:1620 Initialize] Adding default CPU execution provider.
2024-07-15 18:48:24.485000 [V:onnxruntime:, session_state.cc:126 CreateGraphInfo] SaveMLValueNameIndexMapping
2024-07-15 18:48:24.485500 [V:onnxruntime:, session_state.cc:172 CreateGraphInfo] Done saving OrtValue mappings.
2024-07-15 18:48:24.488600 [I:onnxruntime:, session_state_utils.cc:201 SaveInitializedTensors] Saving initialized tensors.
2024-07-15 18:48:24.489500 [I:onnxruntime:, session_state_utils.cc:345 SaveInitializedTensors] Done saving initialized tensors
2024-07-15 18:48:24.491300 [I:onnxruntime:, inference_session.cc:1969 Initialize] Session successfully initialized.

With SAFE_HEAP=0:

RuntimeError: memory access out of bounds
    at myModule.wasm.MlasSgemmOperation(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, unsigned long, unsigned long, unsigned long, float, float const*, unsigned long, float const*, unsigned long, float, float*, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[9520]:0x24bf4a)
    at myModule.wasm.MlasConvOperation(MLAS_CONV_PARAMETERS const*, float const*, float const*, float const*, float*, float*, unsigned long, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[10013]:0x2921cc)
    at myModule.wasm.MlasConv(MLAS_CONV_PARAMETERS const*, float const*, float const*, float const*, float*, float*, onnxruntime::concurrency::ThreadPool*) (http://localhost:8000/myModule.wasm:wasm-function[9978]:0x28c58a)
    at myModule.wasm.onnxruntime::Conv<float>::Compute(onnxruntime::OpKernelContext*) const (http://localhost:8000/myModule.wasm:wasm-function[9965]:0x288f4c)
    at myModule.wasm.onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (http://localhost:8000/myModule.wasm:wasm-function[7213]:0x1a3e82)
    at myModule.wasm.onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[7221]:0x1a6931)
    at myModule.wasm.onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 4294967295ul>, gsl::span<OrtValue const, 4294967295ul>, gsl::span<int const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>&, std::__2::unordered_map<unsigned long, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::__2::hash<unsigned long>, std::__2::equal_to<unsigned long>, std::__2::allocator<std::__2::pair<unsigned long const, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>>>> const&, onnxruntime::logging::Logger const&, bool const&, bool, bool) (http://localhost:8000/myModule.wasm:wasm-function[6719]:0x152035)
    at myModule.wasm.onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, gsl::span<OrtValue const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>&, std::__2::unordered_map<unsigned long, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::__2::hash<unsigned long>, std::__2::equal_to<unsigned long>, std::__2::allocator<std::__2::pair<unsigned long const, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>>>> const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool, onnxruntime::Stream*) (http://localhost:8000/myModule.wasm:wasm-function[6718]:0x14f436)
    at myModule.wasm.onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const, 4294967295ul>, gsl::span<OrtValue const, 4294967295ul>, gsl::span<std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>> const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>*, std::__2::vector<OrtDevice, std::__2::allocator<OrtDevice>> const*) (http://localhost:8000/myModule.wasm:wasm-function[17805]:0x6f648b)
    at myModule.wasm.onnxruntime::InferenceSession::Run(OrtRunOptions const&, gsl::span<char const* const, 4294967295ul>, gsl::span<OrtValue const* const, 4294967295ul>, gsl::span<char const* const, 4294967295ul>, gsl::span<OrtValue*, 4294967295ul>) (http://localhost:8000/myModule.wasm:wasm-function[5392]:0xf1758)

With SAFE_HEAP=1:

RuntimeError: Aborted(alignment fault)
    at abort (http://localhost:8000/myModule.js:625:41)
    at alignfault (http://localhost:8000/myModule.js:354:3)
    at myModule.wasm (http://localhost:8000/myModule.wasm:wasm-function[17477]:0x862651)
    at myModule.wasm.MlasConvIm2Col(MLAS_CONV_PARAMETERS const*, float const*, float*, unsigned long, unsigned long, unsigned long, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[9072]:0x2e0492)
    at myModule.wasm.MlasConvOperation(MLAS_CONV_PARAMETERS const*, float const*, float const*, float const*, float*, float*, unsigned long, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[9074]:0x2e119c)
    at myModule.wasm.MlasConv(MLAS_CONV_PARAMETERS const*, float const*, float const*, float const*, float*, float*, onnxruntime::concurrency::ThreadPool*) (http://localhost:8000/myModule.wasm:wasm-function[9039]:0x2da858)
    at myModule.wasm.onnxruntime::Conv<float>::Compute(onnxruntime::OpKernelContext*) const (http://localhost:8000/myModule.wasm:wasm-function[9026]:0x2d67b0)
    at myModule.wasm.onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (http://localhost:8000/myModule.wasm:wasm-function[6274]:0x1c14f5)
    at myModule.wasm.onnxruntime::RunSince(unsigned long, onnxruntime::StreamExecutionContext&, onnxruntime::SessionScope&, bool const&, unsigned long) (http://localhost:8000/myModule.wasm:wasm-function[6282]:0x1c49bf)
    at myModule.wasm.onnxruntime::ExecuteThePlan(onnxruntime::SessionState const&, gsl::span<int const, 4294967295ul>, gsl::span<OrtValue const, 4294967295ul>, gsl::span<int const, 4294967295ul>, std::__2::vector<OrtValue, std::__2::allocator<OrtValue>>&, std::__2::unordered_map<unsigned long, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>, std::__2::hash<unsigned long>, std::__2::equal_to<unsigned long>, std::__2::allocator<std::__2::pair<unsigned long const, std::__2::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtDevice const&, OrtValue&, bool&)>>>> const&, onnxruntime::logging::Logger const&, bool const&, bool, bool) (http://localhost:8000/myModule.wasm:wasm-function[5780]:0x15fa3a)

Best regards,
Mario
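
Since the SAFE_HEAP=1 trace above reports an alignment fault, one quick diagnostic is to check that caller-supplied buffers (for example `inputData.data()` before it is wrapped in an `Ort::Value`) are aligned for `float`. The helper below is a minimal sketch; `isAligned` is a hypothetical name, not an ONNX Runtime or Emscripten API, and a passing check only rules out this one cause.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when `ptr` is aligned to `alignment` bytes.
// Precondition: `alignment` is a non-zero power of two.
// Hypothetical helper for diagnostics only.
inline bool isAligned(const void* ptr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

A call such as `isAligned(inputData.data(), alignof(float))` just before `CreateTensor` would show whether the input buffer itself is misaligned; if it is not, the fault likely originates elsewhere.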

To reproduce

Here is the code I used to test the ORT model:

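#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#include <emscripten.h>
#include <onnxruntime_cxx_api.h>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Note: base64_decode() and base64_model are defined elsewhere and are not shown in this issue.

extern "C" {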
EMSCRIPTEN_KEEPALIVE
void processImage(const uint8_t* inputImageData, size_t inputImageDataSize, uint8_t* outputImageData, int width, int height) {
    cv::Mat image(height, width, CV_8UC4, const_cast<uint8_t*>(inputImageData));
    cv::Mat rgbImage;
    cv::cvtColor(image, rgbImage, cv::COLOR_BGRA2RGB);
    cv::Mat resizedImage;
    cv::resize(rgbImage, resizedImage, cv::Size(256, 256), 0, 0, cv::INTER_AREA);
    cv::Mat f32Image;
    resizedImage.convertTo(f32Image, CV_32F, 1.0 / 255);
    //
    std::vector<float> inputData;
    inputData.assign((float *) f32Image.datastart, (float *) f32Image.dataend);
    //
    Ort::SessionOptions session_options;
    session_options.SetIntraOpNumThreads(1);
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
    // Decode the base64 model
    std::vector<uint8_t> model_data = base64_decode(base64_model);
    // Load the model from memory
    Ort::Env env(ORT_LOGGING_LEVEL_VERBOSE, "test");
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    Ort::AllocatorWithDefaultOptions allocator;
    Ort::Session session(env, model_data.data(), model_data.size(), session_options);
    // input tensor
    std::vector<int64_t> inputShape = {1, 256, 256, 3};
    Ort::Value inputTensor = Ort::Value::CreateTensor<float>(memory_info, inputData.data(), inputData.size(),
                                                             inputShape.data(), inputShape.size());
    // output tensor
    std::vector<float> outputData(256 * 256 * 4);
    std::vector<int64_t> outputShape = {1, 256, 256, 1};
    Ort::Value outputTensor = Ort::Value::CreateTensor<float>(memory_info,
                                                              outputData.data(), outputData.size(),
                                                              outputShape.data(), outputShape.size());

    auto input_name_alloc = session.GetInputNameAllocated(0, allocator);
    const char *input_name = input_name_alloc.get();
    auto output_name_alloc = session.GetOutputNameAllocated(0, allocator);
    const char *output_name = output_name_alloc.get();

    // Run inference
    session.Run(Ort::RunOptions{nullptr}, &input_name, &inputTensor, 1, &output_name, &outputTensor, 1);

    // Process output tensor
    auto *float_array = outputTensor.GetTensorMutableData<float>();

    // Convert the output tensor to cv::Mat
    cv::Mat outputImg(256, 256, CV_32FC1, float_array);
    outputImg.convertTo(outputImg, CV_8UC1, 255.0);
    cv::resize(outputImg, outputImg, cv::Size(width, height));
    // Copy the output image to the outputImageData buffer
    std::memcpy(outputImageData, outputImg.data, width * height);
}
}
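
The function above never reads `inputImageDataSize`, so a caller passing a buffer smaller than `width * height * 4` bytes would make the `cv::Mat` wrapper read past the end of the buffer. The sketch below shows one way to validate the size up front; `packedImageBytes` and `isValidRgbaBuffer` are hypothetical helper names, and the 4 bytes per pixel assumption comes from the `CV_8UC4` type used in the code.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed for a tightly packed image with `channels` 8-bit channels.
// Returns 0 for non-positive dimensions, which this sketch treats as invalid.
inline std::size_t packedImageBytes(int width, int height, int channels) {
    if (width <= 0 || height <= 0 || channels <= 0) {
        return 0;
    }
    return static_cast<std::size_t>(width) *
           static_cast<std::size_t>(height) *
           static_cast<std::size_t>(channels);
}

// True when the caller-supplied RGBA buffer is non-null and exactly
// width * height * 4 bytes long.
inline bool isValidRgbaBuffer(const std::uint8_t* data, std::size_t size,
                              int width, int height) {
    const std::size_t expected = packedImageBytes(width, height, 4);
    return data != nullptr && expected != 0 && size == expected;
}
```

`processImage` could return early when `isValidRgbaBuffer(inputImageData, inputImageDataSize, width, height)` is false, before any OpenCV or ONNX Runtime call runs.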

Urgency

No response

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

v1.17.1

Execution Provider

'wasm'/'cpu' (WebAssembly CPU)

@cmario cmario added the platform:web issues related to ONNX Runtime web; typically submitted using template label Jul 15, 2024
@YoniGBinahAi

YoniGBinahAi commented Aug 4, 2024

I have the same issue with version 1.16.3 and emsdk 3.1.44.
My build command is:
./build.sh --config Debug --enable_wasm_simd --emsdk_version=3.1.44 --build_wasm_static_lib --enable_wasm_exception_throwing_override --enable_wasm_threads --enable_wasm_api_exception_catching --skip_tests

Error:
[screenshot of the error; image not captured]

@YoniGBinahAi

@cmario - we noticed that the function MlasSgemmOperation consumes a lot of stack memory. Since wasm allocates only 5MB for the stack by default, it fails there. You can try adding the following flag to see if it solves your issue; it helped us:
-s TOTAL_STACK=10MB
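
To confirm that stack exhaustion is the cause before raising the limit, the free stack can be logged just before `session.Run()`. The sketch below assumes the `emscripten_stack_get_free()` function from `<emscripten/stack.h>`; `freeStackBytes`, `hasStackHeadroom`, and `logStackHeadroom` are hypothetical helper names, and the budget passed in is a caller-chosen guess rather than a number published by ONNX Runtime. As far as I know, newer Emscripten releases also spell this linker setting `STACK_SIZE`, so it is worth checking the release notes for the emsdk version in use.

```cpp
#include <cstddef>
#include <cstdio>
#ifdef __EMSCRIPTEN__
#include <emscripten/stack.h>  // emscripten_stack_get_free()
#endif

// Free stack bytes at the call site. Only meaningful under Emscripten;
// on other toolchains this sketch returns 0 because no portable query exists.
inline std::size_t freeStackBytes() {
#ifdef __EMSCRIPTEN__
    return emscripten_stack_get_free();
#else
    return 0;
#endif
}

// True when `freeBytes` covers `requiredBytes`.
// `requiredBytes` is a caller-chosen budget (an assumption, not a measured value).
inline bool hasStackHeadroom(std::size_t freeBytes, std::size_t requiredBytes) {
    return freeBytes >= requiredBytes;
}

// Prints the current headroom and reports whether it meets the budget.
inline bool logStackHeadroom(const char* label, std::size_t requiredBytes) {
    const std::size_t freeBytes = freeStackBytes();
    std::printf("[%s] free stack: %zu bytes, budget: %zu bytes\n",
                label, freeBytes, requiredBytes);
    return hasStackHeadroom(freeBytes, requiredBytes);
}
```

Calling `logStackHeadroom("before Run", 1024 * 1024)` right before inference, once with and once without the larger stack setting, gives a rough picture of how much room the convolution kernels have to work with.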

@cmario
Author

cmario commented Aug 9, 2024

@YoniGBinahAi Thank you very much for your feedback. Increasing the total stack to 10MB resolves the issue.

@cmario cmario closed this as completed Sep 18, 2024
2 participants