
[CUDA][Performance] Inference time varies greatly during session run #21966

Open
roxanacincan opened this issue Sep 3, 2024 · 1 comment
Labels
ep:CUDA: issues related to the CUDA execution provider
performance: issues related to performance regressions
stale: issues that have not been addressed in a while; categorized by a bot

Comments

@roxanacincan

Describe the issue

During session runs with the CUDAExecutionProvider, I noticed that the inference time varies greatly depending on whether or not I am using other applications on my computer.

For example, when I run my code with no other programs running, inference time is around 490 ms, but if I open, say, a chat app and start typing a message, inference time jumps from 490 ms up to 1600 ms. If I stop using the other app and go back to my code, inference time returns to normal values.

This is shown in the log below, where I recorded the inference time for each iteration, since I use my model inside a loop:
GPU_2070_inference_time.txt
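
Since I can't share the actual code, here is a minimal sketch of what the measurement loop does. The model path, the 1x3x224x224 input shape, and the tensor names "input"/"output" are placeholders:

```cpp
// Minimal sketch of the per-iteration latency measurement with the CUDA EP.
// Model path, input shape, and tensor names are placeholders.
#include <array>
#include <chrono>
#include <iostream>
#include <vector>
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "latency-test");

  // Attach the CUDA execution provider (device 0).
  Ort::SessionOptions opts;
  OrtCUDAProviderOptions cuda_opts{};
  cuda_opts.device_id = 0;
  opts.AppendExecutionProvider_CUDA(cuda_opts);

  Ort::Session session(env, L"model.onnx", opts);

  // Dummy CPU input tensor; ORT copies it to the GPU internally.
  std::vector<float> input(1 * 3 * 224 * 224, 0.0f);
  std::array<int64_t, 4> shape{1, 3, 224, 224};
  auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value tensor = Ort::Value::CreateTensor<float>(
      mem, input.data(), input.size(), shape.data(), shape.size());

  const char* in_names[] = {"input"};
  const char* out_names[] = {"output"};

  // Time each Run() call; this is where the 490 ms -> 1600 ms jumps show up.
  for (int i = 0; i < 100; ++i) {
    auto t0 = std::chrono::steady_clock::now();
    auto outs = session.Run(Ort::RunOptions{nullptr},
                            in_names, &tensor, 1, out_names, 1);
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "iteration " << i << ": "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
              << " ms\n";
  }
  return 0;
}
```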

I also recorded the memory usage during runtime to check whether it increases when I use other apps, but as you can see below, there are no signs of a memory leak:
(attached chart: memory_usage)
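
The memory was polled roughly along these lines; a minimal NVML-based sketch, where device index 0 and the one-second sampling interval are assumptions:

```cpp
// Minimal sketch: sample GPU memory usage once per second via NVML.
// Build: link against nvml.lib (ships with the CUDA toolkit).
#include <chrono>
#include <cstdio>
#include <thread>
#include <nvml.h>

int main() {
  nvmlInit_v2();
  nvmlDevice_t dev;
  nvmlDeviceGetHandleByIndex_v2(0, &dev);

  for (int i = 0; i < 60; ++i) {
    nvmlMemory_t mem;
    nvmlDeviceGetMemoryInfo(dev, &mem);
    std::printf("used: %llu MiB / %llu MiB\n",
                mem.used / (1024ull * 1024ull),
                mem.total / (1024ull * 1024ull));
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }

  nvmlShutdown();
  return 0;
}
```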
The test was done on an NVIDIA RTX 2070, with 8 GB of GPU memory and 16 GB of RAM.

Has anyone encountered this kind of behavior before? Is there a way to solve it?
I don't think it's normal for inference time to be affected this much by a task as simple as typing a message.

To reproduce

Sadly, I'm not allowed to share the code or model as this is for a work project.

Urgency

Important

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.0, but I also tested 1.13.1, 1.16.0, and other versions

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

11.8

Model File

No response

Is this a quantized model?

No

roxanacincan added the performance label on Sep 3, 2024
roxanacincan changed the title from "[Performance] Inference time varies greatly during session run" to "[CUDA][Performance] Inference time varies greatly during session run" on Sep 4, 2024
github-actions bot added the ep:CUDA label on Sep 4, 2024

github-actions bot commented Oct 4, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Oct 4, 2024