Run all Nodes on GPU/DML with DML-EP #21013

Jose17-ml · 2024-06-12T09:40:48Z

Describe the feature request

I tried to run Optimum models with the DML EP on my Windows PC, for example optimum/vit-base-patch16-224 from Hugging Face:

model = ORTModelForImageClassification.from_pretrained(model_name, provider="DmlExecutionProvider")

onnx 1.16.1
onnxruntime 1.18.0
onnxruntime-directml 1.18.0
optimum 1.20.0

I see that nodes are distributed between the CPU EP and the DML EP. I also noticed that different instances of the same operator are placed on both DML and CPU.

From the verbose logs:

2024-06-05 11:11:22.1833502 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [DmlExecutionProvider]. Number of nodes: 335

2024-06-05 11:11:22.2061078 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_25)

2024-06-05 11:11:22.8286509 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [CPUExecutionProvider]. Number of nodes: 9
2024-06-05 11:11:22.8322004 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_7)

Take the Concat operator as an example. I believe it is supported on DML (Concat_25 is placed on DML), so why is the Concat_7 instance placed on CPU?

Why are a few node instances placed on CPU even though DML supports those operators?

Concat is only an example; in the full log I see the same behavior with other operators such as Gather, Squeeze, and Unsqueeze.
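To list every operator type that is split across providers, the verbose log can be summarized with a short script. This is a minimal sketch that assumes the log keeps the format quoted above (a "Node(s) placed on [...]" header followed by one "OpType (node_name)" line per node). The names `placement_by_op` and `split_ops` are hypothetical helpers, not part of onnxruntime.

```python
import re
from collections import defaultdict

# Header line, e.g. "... Node(s) placed on [DmlExecutionProvider]. Number of nodes: 335"
HEADER = re.compile(r"Node\(s\) placed on \[(\w+)\]\. Number of nodes: (\d+)")
# Node line, e.g. "... VerifyEachNodeIsAssignedToAnEp] Concat (Concat_25)"
NODE = re.compile(r"VerifyEachNodeIsAssignedToAnEp\]\s+(\S+) \((.*)\)\s*$")


def placement_by_op(log_text):
    """Map op type -> provider -> list of node names (hypothetical helper)."""
    placement = defaultdict(lambda: defaultdict(list))
    provider = None
    for line in log_text.splitlines():
        header = HEADER.search(line)
        if header:
            # Every node line that follows belongs to this provider.
            provider = header.group(1)
            continue
        node = NODE.search(line)
        if node and provider is not None:
            op_type, name = node.groups()
            placement[op_type][provider].append(name)
    return placement


def split_ops(placement):
    """Return op types that have instances on more than one provider."""
    return sorted(op for op, by_ep in placement.items() if len(by_ep) > 1)


# The four log lines quoted in this issue, used as sample input.
SAMPLE = """\
2024-06-05 11:11:22.1833502 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [DmlExecutionProvider]. Number of nodes: 335
2024-06-05 11:11:22.2061078 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_25)
2024-06-05 11:11:22.8286509 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [CPUExecutionProvider]. Number of nodes: 9
2024-06-05 11:11:22.8322004 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_7)
"""

placement = placement_by_op(SAMPLE)
print(split_ops(placement))  # ['Concat']
```

Running the same functions over the full log should show whether Gather, Squeeze, and Unsqueeze are split in the same way, and which node names ended up on CPU.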

With the provider="DmlExecutionProvider" option, I expect all nodes to be placed on DML, except where DML has no native support for a particular operator. In the case above, however, DML support exists for all the nodes that were placed on CPU.

How can I force all nodes to be placed on DML? If nodes are distributed between CPU and DML, I expect some overhead from data transfer between the two.

Thanks,

Describe scenario use case

Trying to run the Hugging Face Optimum model on GPU/DML with all nodes on DML.

sophies927 · 2024-06-13T21:17:54Z

@smk2007 can you take a look?

Jose17-ml · 2024-06-18T07:25:34Z

Hi @smk2007, did you get a chance to look into the above issue?

Jose17-ml · 2024-06-24T01:15:39Z

Hi,

Can someone please check this and update?

Thanks

Jose17-ml added the "feature request" label (request for unsupported feature or enhancement) on Jun 12, 2024

github-actions bot added the "ep:DML" (issues related to the DirectML execution provider), "model:transformer" (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.), and "platform:windows" (issues related to the Windows platform) labels on Jun 12, 2024
