[Build] Can't run dml with nodejs after building #23027

Open
gutembergsa opened this issue Dec 5, 2024 · 3 comments
Labels
api:Javascript (issues related to the Javascript API)
build (build issues; typically submitted using template)
ep:DML (issues related to the DirectML execution provider)

Comments


gutembergsa commented Dec 5, 2024

Describe the issue

I'm following the docs here:
https://onnxruntime.ai/docs/get-started/with-javascript/node.html
and here:
https://onnxruntime.ai/docs/build/inferencing.html

I have successfully built the library with DML support:
.\build.bat --config RelWithDebInfo --build_shared_lib --build_nodejs --parallel --use_dml
and installed the resulting Node.js package from source into my project.

But when I try to use it with the DML execution provider:

import { InferenceSession } from 'onnxruntime-node';

const session = await InferenceSession.create(MODEL_PATH_ONNX, {
    executionProviders: ['dml'],
});

it gives a warning:

2024-12-05 15:05:45.5713727 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-12-05 15:05:45.5772899 [W:onnxruntime:, session_state.cc:1170 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
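The second warning line suggests rerunning with verbose output to see node assignments. A minimal sketch of the session options for such a rerun, assuming onnxruntime-node's `SessionOptions` accepts `logSeverityLevel` on the 0 (verbose) to 4 (fatal) scale. `buildDmlSessionOptions` is a hypothetical helper, and listing `'cpu'` after `'dml'` only makes the CPU fallback explicit (ORT appears to add the CPU provider by default anyway, per the "Adding default CPU execution provider" line in the log further down).

```javascript
// Hypothetical helper: builds the options object passed to InferenceSession.create().
// Field names are assumed to follow onnxruntime-common's SessionOptions;
// the severity scale is assumed to be 0 (verbose), 1 (info), 2 (warning), 3 (error), 4 (fatal).
function buildDmlSessionOptions({ verbose = false } = {}) {
  return {
    // DML first; CPU listed explicitly as the fallback for nodes DML does not take.
    executionProviders: ['dml', 'cpu'],
    // 0 = verbose (expected to include per-node EP placement), 2 = warnings and above.
    logSeverityLevel: verbose ? 0 : 2,
  };
}

// Usage (requires the locally built onnxruntime-node package, ESM module):
// import { InferenceSession } from 'onnxruntime-node';
// const session = await InferenceSession.create(
//   MODEL_PATH_ONNX,
//   buildDmlSessionOptions({ verbose: true }),
// );
```

The verbose run shown later in this issue was captured at `session_log_severity_level:1` (info), so node placements may only appear once the level is lowered to 0.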

Urgency

No response

Target platform

Node.js

Build script

.\build.bat --config RelWithDebInfo --build_shared_lib --build_nodejs --parallel --use_dml
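Before digging into node placement, it may be worth confirming that the package the project actually loads is this local build and that it reports a DML backend. A minimal sketch, assuming the installed onnxruntime-node exports `listSupportedBackends()` returning objects shaped like `{ name, bundled }` (recent versions appear to; this is not verified for this particular build). `hasBackend` is a hypothetical helper kept pure so it can be checked without the native binding.

```javascript
// Hypothetical helper: given the array returned by onnxruntime-node's
// listSupportedBackends() (assumed shape: [{ name: 'cpu', bundled: true }, ...]),
// report whether a backend with the requested name is present (case-insensitive).
function hasBackend(backends, name) {
  if (!Array.isArray(backends)) {
    throw new TypeError('expected an array of backend descriptors');
  }
  const wanted = name.toLowerCase();
  return backends.some(
    (b) => b !== null && typeof b === 'object' &&
      typeof b.name === 'string' && b.name.toLowerCase() === wanted,
  );
}

// Usage (requires the locally built package; API assumed, see above):
// const { listSupportedBackends } = require('onnxruntime-node');
// console.log(listSupportedBackends());
// if (!hasBackend(listSupportedBackends(), 'dml')) {
//   console.error('This onnxruntime-node build does not report a DML backend.');
// }
```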

Error / output

Running InferenceSession.create with verbose logging produces:

2024-12-05 15:16:50.4441563 [I:onnxruntime:, inference_session.cc:589 onnxruntime::InferenceSession::TraceSessionOptions] Session Options {  execution_mode:0 execution_order:DEFAULT enable_profiling:0 optimized_model_filepath: enable_mem_pattern:1 enable_mem_reuse:1 enable_cpu_mem_arena:1 profile_file_prefix:onnxruntime_profile_ session_logid: session_log_severity_level:1 session_log_verbosity_level:0 max_num_graph_transformation_steps:10 graph_optimization_level:3 intra_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str:  set_denormal_as_zero: 0 } inter_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str:  set_denormal_as_zero: 0 } use_per_session_threads:1 thread_pool_allow_spinning:1 use_deterministic_compute:0 config_options: {  } }
2024-12-05 15:16:50.4592092 [I:onnxruntime:, inference_session.cc:409 onnxruntime::InferenceSession::ConstructorCommon::<lambda_1d598fe7d56d7ba80443fa35e896765d>::operator ()] Flush-to-zero and denormal-as-zero are off
2024-12-05 15:16:50.4627197 [I:onnxruntime:, inference_session.cc:417 onnxruntime::InferenceSession::ConstructorCommon] Creating and using per session threadpools since use_per_session_threads_ is true
2024-12-05 15:16:50.4658864 [I:onnxruntime:, inference_session.cc:435 onnxruntime::InferenceSession::ConstructorCommon] Dynamic block base set to 0
2024-12-05 15:16:50.5311172 [I:onnxruntime:, inference_session.cc:778 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using DmlExecutionProvider. So disabling it for this session since it uses DmlExecutionProvider.
2024-12-05 15:16:50.5364908 [I:onnxruntime:, inference_session.cc:1699 onnxruntime::InferenceSession::Initialize] Initializing session.
2024-12-05 15:16:50.5393459 [I:onnxruntime:, inference_session.cc:1736 onnxruntime::InferenceSession::Initialize] Adding default CPU execution provider.
2024-12-05 15:16:50.5444007 [I:onnxruntime:, graph_partitioner.cc:898 onnxruntime::GraphPartitioner::InlineFunctionsAOT] This model does not have any local functions defined. AOT Inlining is not performed
2024-12-05 15:16:50.5487751 [I:onnxruntime:, graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply] GraphTransformer EnsureUniqueDQForNodeUnit modified: 0 with status: OK
2024-12-05 15:16:50.5549602 [I:onnxruntime:, graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply] GraphTransformer Level1_RuleBasedTransformer modified: 1 with status: OK
2024-12-05 15:16:50.5752169 [I:onnxruntime:, graph.cc:4288 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_fold_opt__3295'. It is no longer used by any node.
...
2024-12-05 15:16:52.7119472 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/generate_detections/while_1_loop:0 for DmlExecutionProvider
2024-12-05 15:16:52.7156664 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/generate_detections/while_1_loop:2 for DmlExecutionProvider
2024-12-05 15:16:52.7199232 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/generate_detections/while_2_loop:0 for DmlExecutionProvider
2024-12-05 15:16:52.7240848 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/generate_detections/while_2_loop:2 for DmlExecutionProvider
2024-12-05 15:16:52.7288577 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/generate_detections/while_3_loop:0 for DmlExecutionProvider
2024-12-05 15:16:52.7329562 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/generate_detections/while_3_loop:2 for DmlExecutionProvider
2024-12-05 15:16:52.7368946 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/generate_detections/while_loop:0 for DmlExecutionProvider
2024-12-05 15:16:52.7412758 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/generate_detections/while_loop:2 for DmlExecutionProvider
2024-12-05 15:16:52.7453307 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/map/while_loop:2 for DmlExecutionProvider
2024-12-05 15:16:52.7490572 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/map/while_loop:3 for DmlExecutionProvider
2024-12-05 15:16:52.7528187 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/map/while_loop:4 for DmlExecutionProvider
2024-12-05 15:16:52.7568982 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/map/while_loop:5 for DmlExecutionProvider
2024-12-05 15:16:52.7605508 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/map/while_loop:6 for DmlExecutionProvider
2024-12-05 15:16:52.7641952 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after StatefulPartitionedCall/map/while_loop:7 for DmlExecutionProvider
2024-12-05 15:16:52.7679200 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after image_info for DmlExecutionProvider
2024-12-05 15:16:52.7712092 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before StatefulPartitionedCall/generate_detections/Pad:0 for DmlExecutionProvider
2024-12-05 15:16:52.7756271 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before StatefulPartitionedCall/generate_detections/Pad_1:0 for DmlExecutionProvider
2024-12-05 15:16:52.7793218 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before StatefulPartitionedCall/generate_detections/Pad_2:0 for DmlExecutionProvider
2024-12-05 15:16:52.7831779 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before StatefulPartitionedCall/generate_detections/Pad_3:0 for DmlExecutionProvider
2024-12-05 15:16:52.7885289 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_placeholder:0 for DmlExecutionProvider
2024-12-05 15:16:52.7920273 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_placeholder_1:0 for DmlExecutionProvider
2024-12-05 15:16:52.7955939 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_placeholder_3:0 for DmlExecutionProvider
2024-12-05 15:16:52.7993707 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections/while/while_1_loop:0 for DmlExecutionProvider
2024-12-05 15:16:52.8029713 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections/while/while_loop:1 for DmlExecutionProvider
2024-12-05 15:16:52.8071067 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before Add__519:0 for DmlExecutionProvider
2024-12-05 15:16:52.8101919 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before __inference_generate_detections_while_while_cond_171100_45634_generate_detections/while/while/Less:0 for DmlExecutionProvider
2024-12-05 15:16:52.8150966 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while/Slice/begin_Concat__512:0 for DmlExecutionProvider
2024-12-05 15:16:52.8189319 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while/Slice:0 for DmlExecutionProvider
2024-12-05 15:16:52.8231356 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while/Sum:0 for DmlExecutionProvider
2024-12-05 15:16:52.8265563 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while/mul_1:0 for DmlExecutionProvider
2024-12-05 15:16:52.8304020 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before Add__157:0 for DmlExecutionProvider
2024-12-05 15:16:52.8333262 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while/while/Slice/begin_Concat__150:0 for DmlExecutionProvider
2024-12-05 15:16:52.8386073 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_2_placeholder:0 for DmlExecutionProvider
2024-12-05 15:16:52.8420232 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_2_placeholder_1:0 for DmlExecutionProvider
2024-12-05 15:16:52.8457427 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_2_placeholder_3:0 for DmlExecutionProvider
2024-12-05 15:16:52.8494566 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections/while_2/while_1_loop:0 for DmlExecutionProvider
2024-12-05 15:16:52.8536411 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections/while_2/while_loop:1 for DmlExecutionProvider
2024-12-05 15:16:52.8573515 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before Add__563:0 for DmlExecutionProvider
2024-12-05 15:16:52.8604409 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before __inference_generate_detections_while_2_while_cond_172116_53870_generate_detections/while_2/while/Less:0 for DmlExecutionProvider
2024-12-05 15:16:52.8654399 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_2/Slice/begin_Concat__556:0 for DmlExecutionProvider
2024-12-05 15:16:52.8698820 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_2/Slice:0 for DmlExecutionProvider
2024-12-05 15:16:52.8733884 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_2/Sum:0 for DmlExecutionProvider
2024-12-05 15:16:52.8769575 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_2/mul_1:0 for DmlExecutionProvider
2024-12-05 15:16:52.8812935 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before Add__195:0 for DmlExecutionProvider
2024-12-05 15:16:52.8847887 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_2/while/Slice/begin_Concat__188:0 for DmlExecutionProvider
2024-12-05 15:16:52.8897462 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_3_placeholder:0 for DmlExecutionProvider
2024-12-05 15:16:52.8932178 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_3_placeholder_1:0 for DmlExecutionProvider
2024-12-05 15:16:52.8972001 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_3_placeholder_3:0 for DmlExecutionProvider
2024-12-05 15:16:52.9015392 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections/while_3/while_1_loop:0 for DmlExecutionProvider
2024-12-05 15:16:52.9053076 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections/while_3/while_loop:1 for DmlExecutionProvider
2024-12-05 15:16:52.9091887 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before Add__431:0 for DmlExecutionProvider
2024-12-05 15:16:52.9124806 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before __inference_generate_detections_while_3_while_cond_172624_2628_generate_detections/while_3/while/Less:0 for DmlExecutionProvider
2024-12-05 15:16:52.9182496 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_3/Slice/begin_Concat__424:0 for DmlExecutionProvider
2024-12-05 15:16:52.9222850 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_3/Slice:0 for DmlExecutionProvider
2024-12-05 15:16:52.9260183 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_3/Sum:0 for DmlExecutionProvider
2024-12-05 15:16:52.9297354 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_3/mul_1:0 for DmlExecutionProvider
2024-12-05 15:16:52.9347622 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before Add__333:0 for DmlExecutionProvider
2024-12-05 15:16:52.9377729 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_3/while/Slice/begin_Concat__326:0 for DmlExecutionProvider
2024-12-05 15:16:52.9428784 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_1_placeholder:0 for DmlExecutionProvider
2024-12-05 15:16:52.9469844 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_1_placeholder_1:0 for DmlExecutionProvider
2024-12-05 15:16:52.9508237 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections_while_1_placeholder_3:0 for DmlExecutionProvider
2024-12-05 15:16:52.9544934 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections/while_1/while_1_loop:0 for DmlExecutionProvider
2024-12-05 15:16:52.9581274 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyFromHost after generate_detections/while_1/while_loop:1 for DmlExecutionProvider
2024-12-05 15:16:52.9617172 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before Add__475:0 for DmlExecutionProvider
2024-12-05 15:16:52.9657815 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before __inference_generate_detections_while_1_while_cond_171608_11487_generate_detections/while_1/while/Less:0 for DmlExecutionProvider
2024-12-05 15:16:52.9706676 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_1/Slice/begin_Concat__468:0 for DmlExecutionProvider
2024-12-05 15:16:52.9745162 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_1/Slice:0 for DmlExecutionProvider
2024-12-05 15:16:52.9787338 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_1/Sum:0 for DmlExecutionProvider
2024-12-05 15:16:52.9823171 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_1/mul_1:0 for DmlExecutionProvider
2024-12-05 15:16:52.9864150 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before Add__375:0 for DmlExecutionProvider
2024-12-05 15:16:52.9892516 [I:onnxruntime:, transformer_memcpy.cc:322 onnxruntime::TransformerMemcpyImpl::AddCopyNode] Add MemcpyToHost before generate_detections/while_1/while/Slice/begin_Concat__368:0 for DmlExecutionProvider
2024-12-05 15:16:52.9931800 [I:onnxruntime:, graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply] GraphTransformer MemcpyTransformer modified: 1 with status: OK
2024-12-05 15:16:53.0141249 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-12-05 15:16:53.0195616 [W:onnxruntime:, session_state.cc:1170 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-12-05 15:16:53.0289675 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.1121011 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.2428538 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.2739797 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.2772704 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.2811078 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.2853881 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__594:0 is not used by any node.
2024-12-05 15:16:53.2895935 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.2931059 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.3052349 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.3104494 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_generate_detections_while_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.3149239 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__1745:0 is not used by any node.
2024-12-05 15:16:53.3184577 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.3224143 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.3272936 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.3328468 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_while_generate_detections_while_while_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.3380169 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__521:0 is not used by any node.
2024-12-05 15:16:53.3415204 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.3444767 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.3480699 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.3525591 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_while_1_generate_detections_while_while_1_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.3573492 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__534:0 is not used by any node.
2024-12-05 15:16:53.3608484 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_while_1_placeholder_1:0 is not used by any node.
2024-12-05 15:16:53.3655909 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.3696751 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.3808391 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.3857688 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_2_generate_detections_while_2_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.3903926 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__1714:0 is not used by any node.
2024-12-05 15:16:53.3939417 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.3975182 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.4034818 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.4091794 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_2_while_generate_detections_while_2_while_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.4138265 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__565:0 is not used by any node.
2024-12-05 15:16:53.4179456 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.4208732 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.4246109 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.4289276 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_2_while_1_generate_detections_while_2_while_1_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.4345519 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__578:0 is not used by any node.
2024-12-05 15:16:53.4382117 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_2_while_1_placeholder_1:0 is not used by any node.
2024-12-05 15:16:53.4432255 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.4467179 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.4585570 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.4630514 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_3_generate_detections_while_3_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.4680000 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__1683:0 is not used by any node.
2024-12-05 15:16:53.4717996 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.4758253 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.4816721 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.4874794 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_3_while_generate_detections_while_3_while_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.4924787 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__433:0 is not used by any node.
2024-12-05 15:16:53.4970818 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.5001705 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.5040750 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.5085078 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_3_while_1_generate_detections_while_3_while_1_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.5143306 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__446:0 is not used by any node.
2024-12-05 15:16:53.5180634 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_3_while_1_placeholder_1:0 is not used by any node.
2024-12-05 15:16:53.5230883 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.5278779 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.5386613 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.5438737 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_1_generate_detections_while_1_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.5490591 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__1652:0 is not used by any node.
2024-12-05 15:16:53.5532416 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.5568735 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.5624692 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.5681418 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_1_while_generate_detections_while_1_while_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.5738492 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__477:0 is not used by any node.
2024-12-05 15:16:53.5777121 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2024-12-05 15:16:53.5809621 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2024-12-05 15:16:53.5849547 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2024-12-05 15:16:53.5893835 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_1_while_1_generate_detections_while_1_while_1_loop_counter:0 is not used by any node.
2024-12-05 15:16:53.5946188 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name cond__490:0 is not used by any node.
2024-12-05 15:16:53.5985357 [I:onnxruntime:, session_state_utils.cc:546 onnxruntime::session_state_utils::SaveInputOutputNamesToNodeMapping] Subgraph input with name generate_detections_while_1_while_1_placeholder_1:0 is not used by any node.
2024-12-05 15:16:53.6034668 [I:onnxruntime:, inference_session.cc:2141 onnxruntime::InferenceSession::Initialize] Session successfully initialized.

It looks like DmlExecutionProvider was enabled, but the library fell back to the CPU execution provider:

2024-12-05 15:16:50.5311172 [I:onnxruntime:, inference_session.cc:778 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using DmlExecutionProvider. So disabling it for this session since it uses DmlExecutionProvider.
2024-12-05 15:16:50.5364908 [I:onnxruntime:, inference_session.cc:1699 onnxruntime::InferenceSession::Initialize] Initializing session.
2024-12-05 15:16:50.5393459 [I:onnxruntime:, inference_session.cc:1736 onnxruntime::InferenceSession::Initialize] Adding default CPU execution provider.

What could be wrong here?

Visual Studio Version

No response

GCC / Compiler Version

No response

@gutembergsa gutembergsa added the build build issues; typically submitted using template label Dec 5, 2024
@github-actions github-actions bot added api:Javascript issues related to the Javascript API ep:DML issues related to the DirectML execution provider labels Dec 5, 2024
@fdwr
Contributor

fdwr commented Dec 5, 2024

Can't run dml with nodejs after building
...
Some nodes were not assigned to the preferred execution providers
...
Session successfully initialized.

I see it running, but some nodes fall back to the CPU. Note that across the entire ONNX operator set, there will almost always be some operators that fall back from a primary EP to the CPU EP (as the CPU is the catch-all EP, and some EPs have requirements that make it impossible to support all operators, such as those that dynamically allocate variable-sized output during execution based on the input tensor values, like NonZero). The operator kernel coverage is found here: https://github.com/microsoft/onnxruntime/blob/main/docs/OperatorKernels.md#dmlexecutionprovider
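To make the fallback behavior concrete, here is a deliberately simplified model of priority-based assignment: each node goes to the first EP in the preference list that claims its op type, and the CPU EP accepts everything. This is not ORT's real partitioner, and the DML op set below is a hypothetical, illustrative subset (it treats NonZero as unsupported only to mirror the example above); the real coverage is in OperatorKernels.md.

```javascript
// Simplified model of priority-based node assignment (NOT ORT's real partitioner).
// The supported-op sets are hypothetical and only for illustration.
const PROVIDERS = [
  { name: 'DmlExecutionProvider', supports: new Set(['Conv', 'Relu', 'Add', 'MatMul']) },
  // The CPU EP is the catch-all, so it accepts every op type (null = "anything").
  { name: 'CPUExecutionProvider', supports: null },
];

// Assign each node to the first provider, in preference order, that supports it.
function assignNodes(nodes, providers) {
  return nodes.map((node) => {
    const ep = providers.find((p) => p.supports === null || p.supports.has(node.opType));
    return { ...node, ep: ep.name };
  });
}

// A tiny hypothetical graph with one op the preferred EP does not claim.
const graph = [
  { name: 'conv1', opType: 'Conv' },
  { name: 'relu1', opType: 'Relu' },
  { name: 'nz', opType: 'NonZero' },
  { name: 'add1', opType: 'Add' },
];

for (const n of assignNodes(graph, PROVIDERS)) {
  console.log(`${n.name} (${n.opType}) -> ${n.ep}`);
}
```

In this toy model only `nz` lands on `CPUExecutionProvider`; everything else stays on the preferred EP, which is the situation the "Some nodes were not assigned to the preferred execution providers" warning describes.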

Just a hunch: looking at the name __inference_generate_detections_while_while_cond_171100_45634_generate_detections, it appears the DML EP doesn't get selected within the loop. The DML EP doesn't explicitly support control-flow operators like If, Scan, and Loop, and I'm not certain whether the CPU EP ferries those subnodes to it either. Is it possible to flatten those loops in the ONNX model?
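One way to check that hunch is a diagnostic run with verbose logging, since the original warning notes that verbose output on a non-minimal build shows node assignments. The sketch below only builds the options object; it assumes the onnxruntime-node binding honors the `logSeverityLevel` session option, where 0 is the verbose level on ORT's 0 to 4 scale. The helper name `buildDmlDiagnosticOptions` is hypothetical.

```javascript
// Hypothetical helper: session options for a one-off diagnostic run.
// Assumes onnxruntime-node passes these SessionOptions fields through to ORT.
function buildDmlDiagnosticOptions() {
  return {
    executionProviders: ['dml'],
    // ORT already disables the memory pattern for DML (see the log in the report);
    // setting it here just matches that behavior explicitly.
    enableMemPattern: false,
    executionMode: 'sequential',
    // 0 = verbose on ORT's severity scale (0 verbose ... 4 fatal).
    logSeverityLevel: 0,
  };
}
```

Usage would be along the lines of `await ort.InferenceSession.create(MODEL_PATH_ONNX, buildDmlDiagnosticOptions())` with `ort = require('onnxruntime-node')`, then checking which EP the verbose output reports for the nodes inside the `generate_detections_while` subgraphs.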

@gutembergsa
Author

Well, to be honest, I don't know how to do that. I only recently started with TensorFlow; this model is an object detection model created with the basic example from the TF docs: https://www.tensorflow.org/tfmodels/vision/object_detection

Then I converted it to ONNX with:
!python -m tf2onnx.convert --saved-model ./exported_model/ --output ./model.onnx --opset 14 --verbose

But thanks for the hint. I will look into what could be done! Also, after more tests, I noticed that DML is working, but performance is worse than with the CPU.

@fdwr
Contributor

fdwr commented Dec 6, 2024

I noticed that DML is working but is making the performance worse than CPU usage.

That implies heterogeneous execution with memory copies between devices. If the graph ends up with multiple partitions, then it is actually slower bouncing back and forth between GPU<->CPU<->GPU<->CPU<->GPU... (hence the warning above).
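A rough way to see why interleaving hurts: if you list each node's EP in execution order, every boundary between contiguous runs typically means tensors must be copied between devices. The sketch below is a simplification (real partitioning works on the graph, not a flat list), and `summarizePartitions` is a hypothetical helper, but it shows how the same number of CPU nodes can cost very different numbers of transitions depending on placement.

```javascript
// Simplified illustration: group a flat, execution-ordered list of EP
// assignments into contiguous partitions and count the boundaries between them.
function summarizePartitions(assignments) {
  const partitions = [];
  for (const ep of assignments) {
    const last = partitions[partitions.length - 1];
    if (last && last.ep === ep) {
      last.count += 1; // same device as the previous node: extend the partition
    } else {
      partitions.push({ ep, count: 1 }); // device changed: start a new partition
    }
  }
  // Each boundary between adjacent partitions is a likely GPU<->CPU copy.
  return { partitions, transitions: Math.max(partitions.length - 1, 0) };
}

const interleaved = ['dml', 'dml', 'cpu', 'dml', 'cpu', 'dml'];
const grouped = ['dml', 'dml', 'dml', 'dml', 'cpu', 'cpu'];

console.log(summarizePartitions(interleaved).transitions); // 4
console.log(summarizePartitions(grouped).transitions); // 1
```

Both lists have two CPU nodes, but the interleaved one crosses the device boundary four times instead of once.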

Skimming https://www.tensorflow.org/tfmodels/vision/object_detection, I see it uses ResNet50 as a backbone. Is there an existing ONNX model in the model zoo that works for your needs, like this ResNet50 file? Does it exhibit the perf characteristics you expect?
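To compare perf characteristics fairly, it helps to time several runs per EP and skip the first few, since early runs can include one-time setup costs. The helper below is a minimal, hypothetical sketch (`medianLatencyMs` is not part of any ORT API); it only assumes Node's global `performance.now()`.

```javascript
// Hypothetical helper: run an async function repeatedly and return the median
// wall-clock latency in milliseconds, excluding warm-up runs.
async function medianLatencyMs(runOnce, { warmup = 3, iterations = 20 } = {}) {
  for (let i = 0; i < warmup; i++) {
    await runOnce(); // warm-up: results discarded
  }
  const samples = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await runOnce();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  // Median: middle sample for odd counts, mean of the two middle samples for even counts.
  return samples.length % 2 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2;
}
```

For example, one could create one session with `executionProviders: ['cpu']` and another with `['dml']`, then compare `await medianLatencyMs(() => session.run(feeds))` for each, where `feeds` is a hypothetical input map matching the model's input names. Using the median rather than the mean keeps a single slow outlier from skewing the comparison.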
