Subtitles generated for a 1.5 hour long video, the timeline is inaccurate #955
Comments
In addition, Buzz performs well when transcribing short videos.
Some ideas that may help in the short term are in #946. Work on a longer-term solution is in progress.
OK, thanks for your reply and advice.
Has this been done?
There is some progress in integrating
@guangxuanliu @ShakeWeLy There is a small update with a partial fix for the problem. It is available in the very latest development version from https://github.com/chidiwilliams/buzz/actions/workflows/ci.yml?query=branch%3Amain (log in to GitHub, select the latest build, and scroll down to the artifacts section to get the installation files). This version adds the ability to generate subtitles by combining transcripts with word-level timings: https://chidiwilliams.github.io/buzz/docs/usage/edit_and_resize
In my testing this gives more precise timings, and you have more options for how to combine and generate the subtitles. I tested this approach on a movie. To improve subtitle quality even further, you can try separating the voice track from the video or audio, so that speech recognition runs on cleaner audio without background noise. See this section for more information on GUI tools that let you separate voices from the audio: https://github.com/facebookresearch/demucs?tab=readme-ov-file#graphical-interface A future version of Buzz may include built-in voice separation.
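To illustrate the general idea of building subtitles from word-level timings, here is a minimal Python sketch. It is not Buzz's implementation: the `Word` and `Cue` types, the `group_words` function, and the `max_chars` and `max_gap` thresholds are all hypothetical names and values chosen for the example. It assumes you already have a list of words with start and end times in seconds, groups them into cues whenever a line gets too long or a pause is detected, and formats the result as SRT.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


@dataclass
class Cue:
    text: str
    start: float
    end: float


def _make_cue(words: List[Word]) -> Cue:
    # A cue starts with its first word and ends with its last word.
    return Cue(" ".join(w.text for w in words), words[0].start, words[-1].end)


def group_words(words: List[Word], max_chars: int = 42, max_gap: float = 0.8) -> List[Cue]:
    """Group words into cues (hypothetical thresholds, tune for your material).

    A new cue starts when adding the next word would exceed max_chars,
    or when the silence before the next word is longer than max_gap seconds.
    """
    cues: List[Cue] = []
    current: List[Word] = []
    for word in words:
        if current:
            length = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            gap = word.start - current[-1].end
            if length > max_chars or gap > max_gap:
                cues.append(_make_cue(current))
                current = []
        current.append(word)
    if current:
        cues.append(_make_cue(current))
    return cues


def srt_timestamp(seconds: float) -> str:
    # SRT uses HH:MM:SS,mmm with a comma before the milliseconds.
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: List[Cue]) -> str:
    # Each SRT block is: index, time range, text, then a blank line.
    blocks = [
        f"{i}\n{srt_timestamp(c.start)} --> {srt_timestamp(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

Because each cue takes its start and end directly from the first and last word it contains, timing does not depend on the boundaries of longer transcript segments, which is one plausible reason word-level timings give tighter subtitles on long recordings.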
@ShakeWeLy The very latest development version adds the ability to extract speech before the audio is transcribed. This should reduce background noise and make transcripts more accurate. Please test it and let us know if any inaccuracies remain.
When transcribing a 1.5-hour-long video, the generated subtitles have an inaccurate timeline and do not match the audio.
Even when using Whisper Large-v3, the situation remains the same.
What adjustments do I need to make so that the generated subtitles are more accurate?
Operating system: Windows 10
Software version: Buzz 1.1.0