Run all Nodes on GPU/DML with DML-EP #21013
Labels
ep:DML
issues related to the DirectML execution provider
feature request
request for unsupported feature or enhancement
model:transformer
issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.
platform:windows
issues related to the Windows platform
Describe the feature request
I tried to run Optimum models with the DML EP on my Windows PC, for example optimum/vit-base-patch16-224 from Hugging Face:
model = ORTModelForImageClassification.from_pretrained(model_name, provider="DmlExecutionProvider")
onnx 1.16.1
onnxruntime 1.18.0
onnxruntime-directml 1.18.0
optimum 1.20.0
I see that the nodes are distributed between the CPU EP and the DML EP. I also noticed that different instances of the same operator are placed on both DML and CPU.
From the verbose logs:
2024-06-05 11:11:22.1833502 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [DmlExecutionProvider]. Number of nodes: 335
2024-06-05 11:11:22.2061078 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_25)
2024-06-05 11:11:22.8286509 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [CPUExecutionProvider]. Number of nodes: 9
2024-06-05 11:11:22.8322004 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_7)
Take the "Concat" operator as an example. I believe it is supported on DML (Concat_25 is placed on DML), so why is the Concat_7 instance placed on the CPU?
Why are a few node instances placed on the CPU even though DML supports those operators?
Concat is only one example; in the full log I see the same behavior with other operators such as Gather, Squeeze, and Unsqueeze.
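To make the split easier to see, here is a minimal sketch that tallies operator types per execution provider from the verbose log. It assumes the log lines have the same shape as the excerpt above (a "Node(s) placed on [...]" header followed by one "OpType (node_name)" line per node). The helper names tally_placement and split_op_types are hypothetical and not part of onnxruntime.

```python
import re
from collections import Counter, defaultdict

# Header line, e.g. "Node(s) placed on [DmlExecutionProvider]. Number of nodes: 335"
HEADER_RE = re.compile(r"Node\(s\) placed on \[(\w+)\]\. Number of nodes: (\d+)")
# Node line, e.g. "... VerifyEachNodeIsAssignedToAnEp] Concat (Concat_25)"
NODE_RE = re.compile(r"VerifyEachNodeIsAssignedToAnEp\]\s*([\w.:]+) \((.*)\)\s*$")


def tally_placement(log_text):
    """Return {execution_provider: Counter({op_type: count})} parsed from the log."""
    per_ep = defaultdict(Counter)
    current_ep = None
    for line in log_text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            # Every node line that follows belongs to this provider.
            current_ep = header.group(1)
            continue
        node = NODE_RE.search(line)
        if node and current_ep is not None:
            per_ep[current_ep][node.group(1)] += 1
    return per_ep


def split_op_types(per_ep, cpu="CPUExecutionProvider", gpu="DmlExecutionProvider"):
    """Operator types that have instances on both providers."""
    return sorted(set(per_ep[cpu]) & set(per_ep[gpu]))


# The four log lines quoted in this issue (the full log is not reproduced here).
SAMPLE = """\
2024-06-05 11:11:22.1833502 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [DmlExecutionProvider]. Number of nodes: 335
2024-06-05 11:11:22.2061078 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_25)
2024-06-05 11:11:22.8286509 [V:onnxruntime:, session_state.cc:1152 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node(s) placed on [CPUExecutionProvider]. Number of nodes: 9
2024-06-05 11:11:22.8322004 [V:onnxruntime:, session_state.cc:1154 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Concat (Concat_7)
"""

placement = tally_placement(SAMPLE)
print(split_op_types(placement))
```

Running this over the full verbose log instead of SAMPLE should list every operator type that appears on both providers, provided the full log follows the same line format.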
With the provider="DmlExecutionProvider" option, I expect all nodes to be placed on DML only (the exception being operators that DML does not natively support). But in the case above, every node that was placed on the CPU is an operator that DML does support.
How can I force all nodes to be placed on DML? If the nodes are split between CPU and DML, I expect some overhead from data transfers between the two.
Thanks,
Describe scenario use case
Trying to run the Hugging Face Optimum model on GPU/DML with all nodes on DML.