
Should the module be unloaded from VRAM after its use? #325

Open
martindellavecchia opened this issue Oct 7, 2024 · 4 comments
Comments

@martindellavecchia

Which OS are you using?

  • OS: Ubuntu 24.04
  • Standalone Linux Install

I've noticed that after running a transcription the model remains in VRAM, making it impossible to do another transcription with a different model because there isn't enough VRAM left. Is there any way to offload the model after a certain period of inactivity?

Thanks.

@martindellavecchia added the bug (Something isn't working) label Oct 7, 2024
@jhj0517
Owner

jhj0517 commented Oct 9, 2024

Hi. If you're able to run large models, you should be able to use the other Whisper models as you like in the web UI.

The expected behavior when changing the Whisper model is to replace the currently loaded model with the new one, not to load it in addition.

But if you try to run the music removal model while transcribing at the same time, you might get CUDA errors with less than 12GB of VRAM.
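
(For reference, a minimal sketch of that swap-in-place behavior, assuming the PyTorch-backed openai-whisper package; the names below are illustrative, not the actual Whisper-WebUI internals.)

    import gc

    import torch
    import whisper  # openai-whisper; assumes the PyTorch backend is in use

    _current_model = None
    _current_name = None

    def switch_model(name: str, device: str = "cuda"):
        """Load `name`, replacing any previously loaded model instead of keeping both in VRAM."""
        global _current_model, _current_name
        if _current_model is not None and _current_name == name:
            return _current_model
        # Drop the reference to the old weights before loading the new ones.
        _current_model = None
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the driver
        _current_model = whisper.load_model(name, device=device)
        _current_name = name
        return _current_model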

@martindellavecchia
Author

VRAM-wise I should be OK: I have 12GB (a 3060), and my other AI workloads run on a different GPU.

I've noticed that other model managers such as Ollama offload models after a certain period of inactivity, or unload them when the user selects a different model.

For example, if I run a transcription with large-v2, don't like the result, and want to try large-v3, I have to shut down the web UI to offload the large-v2 model, because it stays in memory.
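
(A rough, hedged sketch of that Ollama-style idle unload, assuming a PyTorch-backed model; these helpers are hypothetical and not existing Whisper-WebUI behavior.)

    import gc
    import threading

    import torch

    IDLE_SECONDS = 300  # unload after 5 minutes without a transcription
    _model = None
    _unload_timer = None

    def _unload_model():
        """Drop the model reference and release cached GPU memory."""
        global _model
        _model = None
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

    def touch_model(model):
        """Call after each transcription; (re)arms the idle-unload timer."""
        global _model, _unload_timer
        _model = model
        if _unload_timer is not None:
            _unload_timer.cancel()
        _unload_timer = threading.Timer(IDLE_SECONDS, _unload_model)
        _unload_timer.daemon = True
        _unload_timer.start()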

@jhj0517 added the enhancement (New feature or request) and bug (Something isn't working) labels and removed the bug and enhancement labels Oct 9, 2024
@jhj0517
Owner

jhj0517 commented Oct 9, 2024

For example, if I run a transcription with large-v2, don't like the result, and want to try large-v3, I have to shut down the web UI to offload the large-v2 model, because it stays in memory.

This is weird and not expected behavior. If you're able to run large-v2, you should be able to run large-v3 by simply changing the model.

If each model runs entirely on its own GPU, this should not happen. Something is probably wrong with the setup, but I don't have multiple GPUs, so I can't reproduce or test it.

@martindellavecchia
Author

I'm not exactly sure what it is. After a transcription finishes, whether with large-v3 or any other model, there's a process still holding memory on the GPU:

Wed Oct 9 11:30:47 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   40C    P8             13W / 170W  |    6394MiB / 12288MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           79020      C   python3.10                             6384MiB |
+-----------------------------------------------------------------------------------------+
This is the python3.10 process used to run the web UI.

It's as if the model is never completely offloaded from VRAM.
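
(Part of what nvidia-smi reports can be PyTorch's caching allocator holding on to already-freed blocks rather than the model itself. Assuming the PyTorch backend, a quick check is to compare allocated vs. reserved memory and see whether empty_cache() shrinks the reserved figure; this is a diagnostic sketch, not project code.)

    import gc

    import torch

    def report_vram(tag: str) -> None:
        # Allocated = memory used by live tensors; reserved = what the caching
        # allocator is holding on to (roughly what nvidia-smi attributes to the process).
        alloc = torch.cuda.memory_allocated() / 2**20
        reserved = torch.cuda.memory_reserved() / 2**20
        print(f"{tag}: allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")

    report_vram("after transcription")
    gc.collect()
    torch.cuda.empty_cache()  # hand cached-but-unused blocks back to the driver
    report_vram("after empty_cache")
    # If 'allocated' stays in the GiB range, the model object is still referenced
    # somewhere and was never actually freed, which matches what nvidia-smi shows.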
