[DML EP] out_of_range exception in Dml::GraphDescBuilder::BuildGraphDesc #17516

Open
kazssym opened this issue Sep 12, 2023 · 6 comments · May be fixed by #17581
Labels
ep:DML issues related to the DirectML execution provider

Comments

@kazssym
Contributor

kazssym commented Sep 12, 2023

I got an out_of_range exception at the line below while trying to run a benchmark with the onnxruntime.transformers.models.stable_diffusion.benchmark module.

const auto& outputNodeAndIndex = nameToNodeAndIndexMap.at(graphOutput->Name());

It looks like graphNodeCreateInfo, defined below, was not filled with valid information, even though subGraphOutputArgNames has an element.

Is it expected to be filled by the factory function?
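
For reference, the failure mode can be reproduced in isolation. This is a minimal sketch, not the ONNX Runtime code: the NodeAndIndex fields, the NameMap alias, and the LookupThrows helper are all assumptions made for illustration. It only shows that std::unordered_map::at throws std::out_of_range when the requested key was never inserted, which is what the quoted line would do if the graph output name is missing from the map.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the NodeAndIndex struct that is local to
// BuildGraphDesc; the real field names are not shown in this thread.
struct NodeAndIndex {
    unsigned nodeIndex;
    unsigned targetIndex;
};

using NameMap = std::unordered_map<std::string, NodeAndIndex>;

// Mirrors the failing pattern: at() throws std::out_of_range when the
// key is absent, instead of returning a default-constructed value.
bool LookupThrows(const NameMap& map, const std::string& name) {
    try {
        (void)map.at(name);
        return false;  // key was present
    } catch (const std::out_of_range&) {
        return true;   // key was never inserted
    }
}
```

Calling LookupThrows with a name that was inserted returns false, and with any other name it returns true, so an exception at that line suggests the output name was never registered rather than that the map is corrupted.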

@github-actions github-actions bot added the ep:DML issues related to the DirectML execution provider label Sep 12, 2023
@sumitsays
Contributor

sumitsays commented Sep 14, 2023

@kazssym: DmlGraphNodeCreateInfo should not be null or empty, and it should also hold the operator graph (kernel information) for a given node.
nameToNodeAndIndexMap contains the node in the graph from which each graph output is emitted, so it is strange that it throws an out_of_range exception. That is not expected.
Could you share the complete call stack, and also a small test model so that we can investigate further?

@kazssym
Contributor Author

kazssym commented Sep 16, 2023

The code here seems to be expected to fill graphNodeCreateInfo, but it does not. nameToNodeAndIndexMap is not updated either.
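
To make the suspicion concrete, here is a minimal sketch under stated assumptions. It is not the actual GraphDescBuilder logic: SketchNode, SketchNodeAndIndex, the hasOperatorDesc flag, and FindUnmappedOutputs are hypothetical names. It models a builder that registers output names only for nodes whose create-info was filled, and then reports which graph outputs would be missing from the map before any at() call is made.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified node; the real ONNX Runtime types are far richer.
struct SketchNode {
    std::string name;
    bool hasOperatorDesc;  // false models a node whose create-info was never filled
    std::vector<std::string> outputNames;
};

// Hypothetical equivalent of the (node, output index) pair stored in the map.
struct SketchNodeAndIndex {
    std::size_t nodeIndex;
    std::size_t outputIndex;
};

// Returns the graph outputs that no registered node produces.
std::vector<std::string> FindUnmappedOutputs(
    const std::vector<SketchNode>& nodes,
    const std::vector<std::string>& graphOutputs) {
    std::unordered_map<std::string, SketchNodeAndIndex> nameToNodeAndIndex;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (!nodes[i].hasOperatorDesc) {
            continue;  // outputs of this node never reach the map
        }
        for (std::size_t j = 0; j < nodes[i].outputNames.size(); ++j) {
            nameToNodeAndIndex[nodes[i].outputNames[j]] = {i, j};
        }
    }

    std::vector<std::string> missing;
    for (const auto& name : graphOutputs) {
        // find() does not throw, so every missing name can be collected.
        if (nameToNodeAndIndex.find(name) == nameToNodeAndIndex.end()) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

A check like this, run before the lookup, would name the offending output instead of surfacing only a bare out_of_range exception.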

@kazssym
Contributor Author

kazssym commented Sep 16, 2023

> @kazssym: DmlGraphNodeCreateInfo should not be null or empty, and it should also hold the operator graph (kernel information) for a given node. nameToNodeAndIndexMap contains the node in the graph from which each graph output is emitted, so it is strange that it throws an out_of_range exception. That is not expected. Could you share the complete call stack, and also a small test model so that we can investigate further?

Here is a call stack.

KernelBase.dll!00007fffe415531c() (Unknown Source:0)
vcruntime140d.dll!00007fffd708b760() (Unknown Source:0)
msvcp140d.dll!00007fffa7c95459() (Unknown Source:0)
onnxruntime_pybind11_state.pyd!std::unordered_map<std::string,`Dml::GraphDescBuilder::BuildGraphDesc'::`2'::NodeAndIndex,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,`Dml::GraphDescBuilder::BuildGraphDesc'::`2'::NodeAndIndex>>>::at(const std::string & _Keyval) Line 448 (c:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\unordered_map:448)
onnxruntime_pybind11_state.pyd!Dml::GraphDescBuilder::BuildGraphDesc(const unsigned char * isConstGpuGraphInput, const unsigned __int64 isConstGpuGraphInputCount, const std::unordered_map<std::string,std::pair<onnx::TensorProto const *,bool>,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::pair<onnx::TensorProto const *,bool>>>> & isInitializerTransferable, const onnxruntime::Graph & graph, const onnxruntime::IndexedSubGraph & indexedSubGraph, const std::unordered_map<std::string,Dml::GraphNodeProperties,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,Dml::GraphNodeProperties>>> & graphNodePropertyMap, IDMLDevice * device, const void * executionHandle) Line 376 (e:\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\GraphDescBuilder.cpp:376)
onnxruntime_pybind11_state.pyd!Dml::DmlGraphFusionTransformer::ApplyImplHelper(onnxruntime::Graph & graph, bool & modified, int graph_level, const onnxruntime::logging::Logger & logger, const std::unordered_map<std::string,onnxruntime::NodeArg const *,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,onnxruntime::NodeArg const *>>> & implicitInputDefs) Line 195 (e:\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionTransformer.cpp:195)
onnxruntime_pybind11_state.pyd!Dml::DmlGraphFusionTransformer::ApplyImpl(onnxruntime::Graph & graph, bool & modified, int graph_level, const onnxruntime::logging::Logger & logger) Line 42 (e:\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionTransformer.cpp:42)
onnxruntime_pybind11_state.pyd!onnxruntime::GraphTransformer::Apply(onnxruntime::Graph & graph, bool & modified, const onnxruntime::logging::Logger & logger) Line 14 (e:\onnxruntime\onnxruntime\core\optimizer\graph_transformer.cc:14)
onnxruntime_pybind11_state.pyd!onnxruntime::GraphTransformerManager::ApplyTransformers(onnxruntime::Graph & graph, onnxruntime::TransformerLevel level, const onnxruntime::logging::Logger & logger) Line 36 (e:\onnxruntime\onnxruntime\core\optimizer\graph_transformer_mgr.cc:36)
onnxruntime_pybind11_state.pyd!onnxruntime::InferenceSession::TransformGraph(onnxruntime::Graph & graph, bool saving_model_in_ort_format) Line 1054 (e:\onnxruntime\onnxruntime\core\session\inference_session.cc:1054)
onnxruntime_pybind11_state.pyd!onnxruntime::InferenceSession::Initialize() Line 1564 (e:\onnxruntime\onnxruntime\core\session\inference_session.cc:1564)
onnxruntime_pybind11_state.pyd!onnxruntime::python::InitializeSession(onnxruntime::InferenceSession * sess, std::function<void __cdecl(onnxruntime::InferenceSession *,std::vector<std::string,std::allocator<std::string>> const &,std::unordered_map<std::string,std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>>> const &)> ep_registration_fn, const std::vector<std::string,std::allocator<std::string>> & provider_types, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> & provider_options, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> & disabled_optimizer_names) Line 1050 (e:\onnxruntime\onnxruntime\python\onnxruntime_pybind_state.cc:1050)
onnxruntime_pybind11_state.pyd!onnxruntime::python::addObjectMethods::__l2::<lambda>(onnxruntime::python::PyInferenceSession * sess, const std::vector<std::string,std::allocator<std::string>> & provider_types, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> & provider_options, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> & disabled_optimizer_names) Line 1717 (e:\onnxruntime\onnxruntime\python\onnxruntime_pybind_state.cc:1717)
onnxruntime_pybind11_state.pyd!pybind11::detail::argument_loader<onnxruntime::python::PyInferenceSession *,std::vector<std::string,std::allocator<std::string>> const &,std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> const &,std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> const &>::call_impl<void,void <lambda>(onnxruntime::python::PyInferenceSession *, const std::vector<std::string,std::allocator<std::string>> &, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> &, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> &) &,0,1,2,3,pybind11::detail::void_type>(onnxruntime::python::addObjectMethods::__l2::void <lambda>(onnxruntime::python::PyInferenceSession *, const std::vector<std::string,std::allocator<std::string>> &, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> &, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> &) & f, std::integer_sequence<unsigned __int64,0,1,2,3> __formal, pybind11::detail::void_type && __formal) Line 1440 (e:\onnxruntime\build\Windows\Debug\_deps\pybind11_project-src\include\pybind11\cast.h:1440)
onnxruntime_pybind11_state.pyd!pybind11::detail::argument_loader<onnxruntime::python::PyInferenceSession *,std::vector<std::string,std::allocator<std::string>> const &,std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> const &,std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> const &>::call<void,pybind11::detail::void_type,void <lambda>(onnxruntime::python::PyInferenceSession *, const std::vector<std::string,std::allocator<std::string>> &, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> &, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> &) &>(onnxruntime::python::addObjectMethods::__l2::void <lambda>(onnxruntime::python::PyInferenceSession *, const std::vector<std::string,std::allocator<std::string>> &, const std::vector<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>,std::allocator<std::unordered_map<std::string,std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::pair<std::string const ,std::string>>>>> &, const std::unordered_set<std::string,std::hash<std::string>,std::equal_to<std::string>,std::allocator<std::string>> &) & f) Line 1415 (e:\onnxruntime\build\Windows\Debug\_deps\pybind11_project-src\include\pybind11\cast.h:1415)
onnxruntime_pybind11_state.pyd!pybind11::cpp_function::initialize::__l2::<lambda>(pybind11::detail::function_call & call) Line 249 (e:\onnxruntime\build\Windows\Debug\_deps\pybind11_project-src\include\pybind11\pybind11.h:249)
onnxruntime_pybind11_state.pyd!pybind11::handle <lambda>(pybind11::detail::function_call &)::<lambda_invoker_cdecl>(pybind11::detail::function_call & call) Line 167 (e:\onnxruntime\build\Windows\Debug\_deps\pybind11_project-src\include\pybind11\pybind11.h:167)
onnxruntime_pybind11_state.pyd!pybind11::cpp_function::dispatcher(_object * self, _object * args_in, _object * kwargs_in) Line 929 (e:\onnxruntime\build\Windows\Debug\_deps\pybind11_project-src\include\pybind11\pybind11.h:929)
python310.dll!00007fff30229eea() (Unknown Source:0)
python310.dll!00007fff3026ffbb() (Unknown Source:0)

@kazssym
Contributor Author

kazssym commented Sep 16, 2023

Does DmlOperatorMemcpy::DmlOperatorMemcpy never call SetDmlOperatorDesc?

@sumitsays
Contributor

@kazssym Thank you for sharing the call stack. It does look like nameToNodeAndIndexMap might not have an entry for an operator. As you noted above, the operator might be Memcpy.
Can you please share exactly which version of the Stable Diffusion model you are benchmarking? Alternatively, could you share the script you are running, so that I can run it on my end to reproduce the issue?

@kazssym
Contributor Author

kazssym commented Sep 21, 2023

> @kazssym Thank you for sharing the call stack. It does look like nameToNodeAndIndexMap might not have an entry for an operator. As you noted above, the operator might be Memcpy. Can you please share exactly which version of the Stable Diffusion model you are benchmarking? Alternatively, could you share the script you are running, so that I can run it on my end to reproduce the issue?

I am running the following command with https://huggingface.co/kazssym/stable-diffusion-2-1-optimized-fp16 on main...kazssym:onnxruntime:dml-transformers-testing.

python -m onnxruntime.transformers.models.stable_diffusion.benchmark --provider dml --version 2.1 --pipeline stable-diffusion-2-1-optimized-fp16 --height 768 --width 768
