[Performance] Mapfile support for certain external data files is not working #21195

Open
ivberg opened this issue Jun 27, 2024 · 3 comments
Labels
core runtime issues related to core runtime performance issues related to performance regressions platform:windows issues related to the Windows platform

Comments

@ivberg
Contributor

ivberg commented Jun 27, 2024

Describe the issue

We are trying to get mapfile (memory-mapped file) support working with external data files. The model loads and runs correctly, but while debugging we noticed that memory mapping is failing with an error inside ORT code.

#19089
https://github.com/onnx/onnx/blob/main/docs/ExternalData.md

Callstack where the mapfile fails due to alignment issues:
00 ps_onnxruntime!onnxruntime::WindowsEnv::MapFileIntoMemory+0xa90 [D:\a_work\1\s\onnxruntime\onnxruntime\core\platform\windows\env.cc @ 449] // Failure here
01 ps_onnxruntime!onnxruntime::utils::GetFileContent+0x12c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\tensorprotoutils.cc @ 899]
02 ps_onnxruntime!onnxruntime::utils::GetExtDataFromTensorProto+0x484 [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\tensorprotoutils.cc @ 1015] // The buffer size, length is coming from here
03 ps_onnxruntime!onnxruntime::session_state_utils::ExtDataTensorProtoToTensor+0x8c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state_utils.cc @ 73]
04 ps_onnxruntime!onnxruntime::session_state_utils::DeserializeTensorProto+0x37c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state_utils.cc @ 126]
05 ps_onnxruntime!onnxruntime::session_state_utils::SaveInitializedTensors+0x1208 [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state_utils.cc @ 310]
06 ps_onnxruntime!onnxruntime::SessionState::FinalizeSessionStateImpl+0x76c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state.cc @ 1476]
07 ps_onnxruntime!onnxruntime::SessionState::FinalizeSessionState+0x1b4 [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state.cc @ 1189]
08 ps_onnxruntime!onnxruntime::InferenceSession::Initialize+0x2178 [D:\a_work\1\s\onnxruntime\onnxruntime\core\session\inference_session.cc @ 2015]
09 ps_onnxruntime!`anonymous namespace'::InitializeSession+0x250 [D:\a_work\1\s\onnxruntime\onnxruntime\core\session\onnxruntime_c_api.cc @ 763]
0a ps_onnxruntime!OrtApis::CreateSession+0xa0 [D:\a_work\1\s\onnxruntime\onnxruntime\core\session\onnxruntime_c_api.cc @ 779]

Instead of mapping the file, we hit the error "mapped offset must be a multiple of the allocation granularity" from ORT, and the error is swallowed. I say swallowed because, as another stack shows, we then take the error path and read the whole file into a buffer as a fallback.
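For illustration, the usual way to satisfy this kind of constraint is to round the requested offset down to the allocation granularity, map slightly more than was asked for, and then skip the extra leading bytes. The sketch below shows only that arithmetic. It is not ORT's implementation; `AlignedMapping` and `AlignForMapping` are hypothetical names, and the granularity is passed in as a parameter (on Windows it would come from `SYSTEM_INFO::dwAllocationGranularity`, which is commonly 64 KiB).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: describes how an unaligned (offset, length)
// request can be served by a mapping that starts on an aligned offset.
struct AlignedMapping {
  std::uint64_t mapped_offset;  // offset to pass to the mapping call; a multiple of granularity
  std::size_t mapped_length;    // bytes to map, counted from mapped_offset
  std::size_t delta;            // bytes to skip from the mapped base to reach the requested data
};

// Rounds `offset` down to a multiple of `granularity` and grows the length
// by the same amount so the originally requested range stays covered.
// Assumption: `granularity` is non-zero.
inline AlignedMapping AlignForMapping(std::uint64_t offset,
                                      std::size_t length,
                                      std::uint64_t granularity) {
  const std::uint64_t mapped_offset = offset - (offset % granularity);
  const std::size_t delta = static_cast<std::size_t>(offset - mapped_offset);
  return AlignedMapping{mapped_offset, length + delta, delta};
}
```

With this layout the caller would read the tensor bytes starting at `mapped_base + delta`, and unmap using the original mapped base rather than the adjusted pointer.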

To reproduce

Get a model with an external data file, e.g. model.onnx and model.onnx.data. Not all files will reproduce the issue, because it depends on how the data offsets in the file align with the target OS's requirements.

Load the model with:
const ORTCHAR_T* filemodelpath = ORT_TSTR("model.onnx");
Ort::Session session(env, filemodelpath, session_options);

// The model seems to load fine and works with external data file

Urgency

Fairly urgent

For now we are trying a workaround with AddExternalInitializersFromFilesInMemory.

Platform

Windows

OS Version

23H2

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

55f7f9d

ONNX Runtime API

C++

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

@github-actions github-actions bot added platform:windows issues related to the Windows platform quantization issues related to quantization labels Jun 27, 2024
@pranavsharma pranavsharma added core runtime issues related to core runtime and removed quantization issues related to quantization labels Jun 27, 2024
@sophies927 sophies927 added the performance issues related to performance regressions label Jun 27, 2024
@pranavsharma
Contributor

Can you attach a sample model? And does this happen on ARM64 only?

@ivberg
Contributor Author

ivberg commented Jul 3, 2024

We are looking into whether we can share the model directly. It seems the alignment issue could happen on multiple platforms; I just happen to be testing on ARM64.

@justinchuby
Copy link
Contributor

External data produced by the ONNX exporter in PyTorch 2.5 will be aligned.
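As a rough illustration of what "aligned" means for an external data file, the sketch below plans a start offset for each tensor so that every offset is a multiple of a chosen alignment, with padding implied between tensors. This is not the PyTorch exporter's code; `AlignUp` and `PlanAlignedOffsets` are hypothetical names, and the alignment value is an assumption supplied by the caller (for example 65536 to match a typical Windows allocation granularity).

```cpp
#include <cstdint>
#include <vector>

// Rounds `value` up to the next multiple of `alignment`.
// Assumption: `alignment` is non-zero.
inline std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
  const std::uint64_t remainder = value % alignment;
  return remainder == 0 ? value : value + (alignment - remainder);
}

// Given tensor byte sizes in write order, returns the file offset at which
// each tensor would start so that every offset is a multiple of `alignment`.
// The gap between the end of one tensor and the next offset is padding.
inline std::vector<std::uint64_t> PlanAlignedOffsets(
    const std::vector<std::uint64_t>& tensor_sizes,
    std::uint64_t alignment) {
  std::vector<std::uint64_t> offsets;
  offsets.reserve(tensor_sizes.size());
  std::uint64_t cursor = 0;  // first free byte in the data file
  for (const std::uint64_t size : tensor_sizes) {
    cursor = AlignUp(cursor, alignment);
    offsets.push_back(cursor);
    cursor += size;
  }
  return offsets;
}
```

A file laid out this way can be mapped at each tensor's recorded offset without any adjustment, at the cost of some padding bytes between tensors.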

@satyajandhyala satyajandhyala unpinned this issue Sep 20, 2024
5 participants
@pranavsharma @justinchuby @ivberg @sophies927 and others