issues with longer audio files and from_pretrained() #1562
benniekiss
started this conversation in
General
Replies: 1 comment 3 replies
Using a |
I've noticed that with audio files over 5 hours, about halfway through processing, CPU and GPU usage drops off, but the pipeline keeps running for roughly as long again. For example, for 3 hours of audio, processing takes about 2 minutes on the GPU, followed by about 2 minutes of low CPU usage and no GPU usage before the result is returned. Is there a specific model in the pipeline that doesn't use the GPU, or is some other step causing this processing bottleneck?
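One way to narrow down where the quiet phase comes from is to timestamp each stage the pipeline reports. The sketch below is not part of pyannote; `StageTimer` is a hypothetical helper, and it assumes the pipeline accepts a `hook` callable that is invoked with a step name as its first argument (recent pyannote.audio releases appear to work this way, but check your installed version).

```python
import time
from collections import OrderedDict


class StageTimer:
    """Hypothetical helper: records when each named stage was first and last reported."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        # step name -> [first time seen, last time seen]
        self.stages = OrderedDict()

    def __call__(self, step_name, step_artifact=None, **kwargs):
        # Extra keyword arguments (for example progress counters) are ignored.
        now = self.clock()
        if step_name not in self.stages:
            self.stages[step_name] = [now, now]
        else:
            self.stages[step_name][1] = now

    def report(self):
        """Return (step name, seconds between first and last report) pairs."""
        return [(name, last - first) for name, (first, last) in self.stages.items()]
```

If the assumption holds, passing an instance as `hook=timer` when calling the pipeline and printing `timer.report()` afterwards should show which reported stages are long. Wall-clock time that is not covered by any reported stage would point to a step that does not report progress.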
Another issue is that for audio longer than 8 hours, the process crashes in my Jupyter notebook running on an RTX 4090 through RunPod. The same pattern occurs: about halfway through (at about 4 minutes), CPU and GPU usage drop off, and then the process runs for another 4 to 6 minutes before crashing. Is there a limit to how much pyannote can process at once? Splitting the audio would be trivial, but I'm curious about pyannote's limits and would like to avoid dividing the audio further if possible, especially so that speaker_ids are retained.
I've had great success with audio less than 8 hours.
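If splitting does turn out to be necessary, a minimal sketch of the fallback is below. `chunk_windows` is a hypothetical helper, not a pyannote function, and the chunk and overlap lengths are arbitrary illustrative values. The overlap is there because speaker labels from separate runs are not guaranteed to match, so a shared region gives something to compare when reconciling labels between neighbouring chunks.

```python
def chunk_windows(total_seconds, chunk_seconds, overlap_seconds):
    """Return (start, end) windows in seconds that cover the whole recording.

    Consecutive windows share `overlap_seconds` of audio.
    """
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be larger than overlap_seconds")

    windows = []
    step = chunk_seconds - overlap_seconds
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break  # the last window reached the end of the recording
        start += step
    return windows


# Example: a 10-hour recording in 4-hour chunks with 10 minutes of overlap.
windows = chunk_windows(10 * 3600, 4 * 3600, 600)
```

Each window could then be cropped from the file and diarized on its own. Matching speakers across chunks still needs a separate step, which this sketch does not attempt.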