[CUDA][Performance] Inference time varies greatly during session run #21966
Labels
ep:CUDA
issues related to the CUDA execution provider
performance
issues related to performance regressions
stale
issues that have not been addressed in a while; categorized by a bot
Describe the issue
During a session run with CUDAExecutionProvider, I noticed that the inference time varies greatly depending on whether or not I am using other applications on my computer.
For example, when I run my code with no other programs open, inference time is around 490 ms, but if I open, say, a chat app and start typing a message, the inference time jumps from 490 ms up to 1600 ms. If I stop using the other app and go back to my code, the inference time returns to normal values.
This is shown in the log below, where I recorded the inference time for each iteration, since I run my model inside a loop:
GPU_2070_inference_time.txt
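Since I can't share the actual project code, here is a minimal sketch of the kind of timing loop that produced these numbers. The model path, input/output names, and input shape are placeholders, not the real ones:

```cpp
// Minimal sketch (not the actual project code): times each Run() call of an
// ONNX Runtime session that uses CUDAExecutionProvider.
#include <onnxruntime_cxx_api.h>
#include <chrono>
#include <iostream>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "latency_check");
    Ort::SessionOptions opts;
    OrtCUDAProviderOptions cuda_opts{};              // device 0, default settings
    opts.AppendExecutionProvider_CUDA(cuda_opts);
    Ort::Session session(env, L"model.onnx", opts);  // placeholder path

    // Placeholder input: adjust the name and shape to the real model.
    std::vector<int64_t> shape{1, 3, 224, 224};
    std::vector<float> data(1 * 3 * 224 * 224, 0.5f);
    Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem, data.data(), data.size(), shape.data(), shape.size());

    const char* in_names[]  = {"input"};
    const char* out_names[] = {"output"};

    for (int i = 0; i < 100; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        auto out = session.Run(Ort::RunOptions{nullptr},
                               in_names, &input, 1, out_names, 1);
        auto t1 = std::chrono::steady_clock::now();
        std::cout << "iter " << i << ": "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
                  << " ms\n";
    }
    return 0;
}
```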
I also recorded the memory usage during runtime to check whether it increases when I use other apps, but as you can see below, there are no signs of a memory leak:
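As a side note, GPU memory can be sampled while the loop runs roughly as in the sketch below; this assumes the NVML library that ships with the NVIDIA driver and is not necessarily the exact tooling used for the recording:

```cpp
// Sketch: poll current GPU memory usage via NVML (link against nvml.lib).
// Intended to be called periodically while the inference loop runs.
#include <nvml.h>
#include <cstdio>

void print_gpu_memory() {
    if (nvmlInit() != NVML_SUCCESS) return;
    nvmlDevice_t dev;
    nvmlMemory_t mem;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
        nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS) {
        std::printf("GPU memory used: %llu / %llu MiB\n",
                    mem.used / (1024ULL * 1024ULL),
                    mem.total / (1024ULL * 1024ULL));
    }
    nvmlShutdown();
}
```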
The test was done on an NVIDIA RTX 2070 Series GPU with 8 GB of GPU memory and 16 GB of RAM.
Has anyone encountered this type of behavior before? Is there a way to solve this?
I don't think it's normal for the inference time to be affected this much by a task as simple as typing a message.
To reproduce
Sadly, I'm not allowed to share the code or model as this is for a work project.
Urgency
Important
Platform
Windows
OS Version
10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.15.0, but I also tested 1.13.1, 1.16.0, and other versions
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
11.8
Model File
No response
Is this a quantized model?
No