CUDAExecutionProvider doesn't seem to be used during inference of a transformers model exported to ONNX Runtime GPU #22325
Labels
- ep:CUDA (issues related to the CUDA execution provider)
- model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.)
- stale (issues that have not been addressed in a while; categorized by a bot)
Describe the issue
We are seeing an issue with a Transformer model that was exported using torch.onnx.export and then optimized with optimum's ORTOptimizer. Inference appears to run only on the CPU rather than the GPU.
The model was exported on a CPU machine using ONNX 1.16.0. We see the following logs when starting the inference session.
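For reference, a minimal sketch of how the session creation and provider check might look; the model path `model.onnx` and the use of the `onnxruntime-gpu` Python package are assumptions for illustration, not taken from the report:

```python
import onnxruntime as ort

# Confirm that the CUDA EP is available in the installed build at all.
# A CPU-only onnxruntime wheel will not list "CUDAExecutionProvider" here.
print(ort.get_available_providers())

# Request the CUDA EP explicitly, with CPU as a fallback.
# "model.onnx" is a placeholder path for the exported/optimized model.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Shows which providers the session actually registered; if only
# "CPUExecutionProvider" appears, inference will run on the CPU.
print(sess.get_providers())
```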