
Should the module be unloaded from VRAM after its use? #325

Open
martindellavecchia opened this issue Oct 7, 2024 · 4 comments
Comments

@martindellavecchia

Which OS are you using?

  • OS: Ubuntu 24.04
  • Standalone Linux Install

I've noticed that after running a transcription the model remains in VRAM, making it impossible to do another transcription with a different model because there isn't enough VRAM left. Is there any way to offload the model after a certain period of inactivity?

Thanks.

@martindellavecchia added the bug (Something isn't working) label Oct 7, 2024
@jhj0517
Owner

jhj0517 commented Oct 9, 2024

Hi. If you're able to run large models, you should be able to use the other Whisper models as you like in the web UI.

The expected behavior when changing the Whisper model is to replace the currently loaded model with the new one, not to load it in addition.

But if you try to run the music removal model while transcribing at the same time, you might get CUDA errors with less than 12GB of VRAM.
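
(For reference, a minimal sketch of that swap-in-place behavior, assuming the PyTorch-backed openai-whisper package; the names below are illustrative, not the actual Whisper-WebUI internals.)

    import gc

    import torch
    import whisper  # openai-whisper; assumes the PyTorch backend is in use

    _current_model = None
    _current_name = None

    def switch_model(name: str, device: str = "cuda"):
        """Load `name`, replacing any previously loaded model instead of keeping both in VRAM."""
        global _current_model, _current_name
        if _current_model is not None and _current_name == name:
            return _current_model
        # Drop the reference to the old weights before loading the new ones.
        _current_model = None
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the driver
        _current_model = whisper.load_model(name, device=device)
        _current_name = name
        return _current_model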

@martindellavecchia
Author

VRAM-wise I should be OK: I have 12GB (a 3060), and my other AI workloads run on a different GPU.

I've noticed that other model managers such as Ollama offload models after a certain period of inactivity, or unload them when the user selects a different model.

For example, if I run a transcription with large-v2, don't like the result, and want to try large-v3, I have to shut down the web UI to offload the large-v2 model, because it stays in memory.
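
(A rough, hedged sketch of that Ollama-style idle unload, assuming a PyTorch-backed model; these helpers are hypothetical and not existing Whisper-WebUI behavior.)

    import gc
    import threading

    import torch

    IDLE_SECONDS = 300  # unload after 5 minutes without a transcription
    _model = None
    _unload_timer = None

    def _unload_model():
        """Drop the model reference and release cached GPU memory."""
        global _model
        _model = None
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

    def touch_model(model):
        """Call after each transcription; (re)arms the idle-unload timer."""
        global _model, _unload_timer
        _model = model
        if _unload_timer is not None:
            _unload_timer.cancel()
        _unload_timer = threading.Timer(IDLE_SECONDS, _unload_model)
        _unload_timer.daemon = True
        _unload_timer.start()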

@jhj0517 added the enhancement (New feature or request) and bug (Something isn't working) labels and removed the bug and enhancement labels Oct 9, 2024
@jhj0517
Owner

jhj0517 commented Oct 9, 2024

For example, if I run a transcription with large-v2, don't like the result, and want to try large-v3, I have to shut down the web UI to offload the large-v2 model, because it stays in memory.

This is weird and not expected behavior. If you're able to run large-v2, you should be able to run large-v3 by simply changing the model.

If each model runs entirely on its own GPU, this should not happen. Something is probably wrong with the setup, but I don't have multiple GPUs, so I can't reproduce or test it.

@martindellavecchia
Author

I'm not exactly sure what it is. After a transcription finishes, whether with large-v3 or any other model, there's a process still holding memory on the GPU:

Wed Oct 9 11:30:47 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   40C    P8             13W / 170W  |    6394MiB / 12288MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           79020      C   python3.10                             6384MiB |
+-----------------------------------------------------------------------------------------+
This is the python3.10 process used to run the web UI.

It's as if the model is never completely offloaded from VRAM.
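
(Part of what nvidia-smi reports can be PyTorch's caching allocator holding on to already-freed blocks rather than the model itself. Assuming the PyTorch backend, a quick check is to compare allocated vs. reserved memory and see whether empty_cache() shrinks the reserved figure; this is a diagnostic sketch, not project code.)

    import gc

    import torch

    def report_vram(tag: str) -> None:
        # Allocated = memory used by live tensors; reserved = what the caching
        # allocator is holding on to (roughly what nvidia-smi attributes to the process).
        alloc = torch.cuda.memory_allocated() / 2**20
        reserved = torch.cuda.memory_reserved() / 2**20
        print(f"{tag}: allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")

    report_vram("after transcription")
    gc.collect()
    torch.cuda.empty_cache()  # hand cached-but-unused blocks back to the driver
    report_vram("after empty_cache")
    # If 'allocated' stays in the GiB range, the model object is still referenced
    # somewhere and was never actually freed, which matches what nvidia-smi shows.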
