[Bug]: Can get stuck during musical sections. #170

Closed
MCCMikey opened this issue Jul 10, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@MCCMikey

What happened?

I supplied it a two-hour recording of a radio program.

For about 30 minutes of the recording it repeatedly output the line [Music] rather than transcribing the spoken words between tracks.

From about the 40-minute mark it resumed normal output, except for one point where it repeated the line

They're only living in a world where they can say goodbye.

about 40 times.
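Until the underlying repetition is addressed, a run of identical lines like this can be collapsed after the fact. The following is only a minimal post-processing sketch: `collapse_repeats` is a hypothetical helper, not part of Vibe or Whisper, and it hides the symptom rather than fixing the cause. It would also collapse legitimately repeated lines, such as consecutive `[Music]` markers.

```python
from typing import Iterable, List


def collapse_repeats(lines: Iterable[str], max_repeats: int = 1) -> List[str]:
    """Keep at most `max_repeats` copies of each run of identical consecutive lines.

    Lines are compared after normalizing whitespace and case, so
    "Goodbye.  " and "goodbye." count as the same line.
    """
    kept: List[str] = []
    previous_key = None
    run_length = 0
    for line in lines:
        key = " ".join(line.split()).casefold()
        if key == previous_key:
            run_length += 1
        else:
            previous_key = key
            run_length = 1
        # Only the first `max_repeats` lines of a run are kept.
        if run_length <= max_repeats:
            kept.append(line)
    return kept


if __name__ == "__main__":
    repeated = "They're only living in a world where they can say goodbye."
    transcript = ["Welcome back to the show."] + [repeated] * 40 + ["Next up, the news."]
    for line in collapse_repeats(transcript):
        print(line)
```

Only consecutive duplicates are removed, so a line that recurs later in the program (a station ident, for example) is left in place.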

I have to say though that it does a remarkably good job with such varied content. If this can be fixed I plan to ask this program to periodically verify that our presenters are playing the sponsor messages that they are meant to on our community radio station. I'm definitely going to pay for this app via the support link. I've been looking for something like this for ages.

2020-09-24 Transcription.docx

Steps to reproduce

Feed it the audio file https://drive.google.com/file/d/1nqtLWZTUEJjvVWGDUJRQTRdyJNGbCESM/view?usp=sharing and ask it to transcribe.

What OS are you seeing the problem on?

Windows

Relevant log output

No response

MCCMikey added the bug label Jul 10, 2024
@thewh1teagle
Owner

thewh1teagle commented Jul 11, 2024

> For about 30 minutes of the recording it repeatedly output the line [Music] rather than transcribing the spoken words between tracks.

I understand that you've encountered challenges transcribing audio with music and background noise. Unfortunately, the Whisper AI model isn't the best fit for this task, as discussed here.

I propose combining a VAD AI model (Voice Activity Detector) with a denoiser model (for noise filtering and speech enhancement). I'd love to hear what other developers think about this approach—please feel free to share your thoughts.

It's worth noting that this isn't a simple task, and I don't believe there's an existing solution for this worldwide, at least not in the non-commercial realm.

For the VAD, we can utilize the Silero VAD model with Sherpa-rs, and for speech enhancement, we can leverage DeepFilterNet.
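To make the proposed flow concrete, here is a rough sketch of the control flow only: detect speech segments, merge segments separated by short pauses, enhance each chunk, and send only those chunks to the transcriber. Everything here is an assumption for illustration. `detect_speech`, `enhance`, and `transcribe` are hypothetical callables standing in for Silero VAD, DeepFilterNet, and Whisper; none of them reflects the real API of those projects or of Sherpa-rs.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_segments(segments: Sequence[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Merge speech segments whose gap is at most `max_gap` seconds."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def transcribe_speech_only(
    samples: Sequence[float],
    sample_rate: int,
    detect_speech: Callable[[Sequence[float], int], List[Segment]],  # hypothetical VAD
    enhance: Callable[[Sequence[float]], Sequence[float]],           # hypothetical denoiser
    transcribe: Callable[[Sequence[float], int], str],               # hypothetical ASR
    max_gap: float = 0.5,
) -> List[Tuple[float, float, str]]:
    """Transcribe only the regions the VAD marks as speech.

    Returns (start_seconds, end_seconds, text) for each non-empty result,
    so timestamps still refer to the original recording.
    """
    results = []
    for start, end in merge_segments(detect_speech(samples, sample_rate), max_gap):
        chunk = samples[int(start * sample_rate):int(end * sample_rate)]
        text = transcribe(enhance(chunk), sample_rate).strip()
        if text:
            results.append((start, end, text))
    return results


if __name__ == "__main__":
    # Stub components so the sketch runs without any model installed.
    audio = [0.0] * 100  # 10 seconds at a toy rate of 10 samples per second
    fake_vad = lambda s, sr: [(1.0, 2.0), (2.2, 3.0), (6.0, 7.0)]
    passthrough = lambda chunk: chunk
    fake_asr = lambda chunk, sr: f"{len(chunk)} samples of speech"
    print(transcribe_speech_only(audio, 10, fake_vad, passthrough, fake_asr))
```

Passing the three stages in as callables keeps the sketch independent of any particular model, so each stage can be swapped or tested in isolation. Whether a VAD reliably separates speech from sung vocals in music is a separate question that this sketch does not address.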

> I have to say though that it does a remarkably good job with such varied content. If this can be fixed I plan to ask this program to periodically verify that our presenters are playing the sponsor messages that they are meant to on our community radio station. I'm definitely going to pay for this app via the support link. I've been looking for something like this for ages.

I'm glad you liked it! Thank you very much for your support in improving Vibe; it's greatly appreciated!

@thewh1teagle
Owner

Closing as duplicate of #402
