Problem for App Store edition #818

Open
WeCanSee opened this issue Jul 1, 2024 · 4 comments
Labels
macOS Issues on macOS

Comments

@WeCanSee

WeCanSee commented Jul 1, 2024

I am using version 1.0.2 (137) on macOS 14.6 (23G5052d).
I loaded an MP3 file, about 44'30" long and 45 MB, and used the large model with CoreML.
When the transcription finished, I found some errors in the generated SRT file:

1. The timestamps in the subtitle file are wrong: they are only precise to the second, not to the millisecond. I can't use the file directly and had to adjust the timing manually.
[Screenshot: WX20240701-204927@2x]
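A minimal sketch for checking this on any SRT file, assuming the standard `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines. The function names and the sample text are hypothetical and are not taken from Buzz or from the reporter's file. It only reports whether every millisecond field is `000`, which is what second-level timestamps look like in practice.

```python
import re

# SRT cue timing line: "HH:MM:SS,mmm --> HH:MM:SS,mmm"
# (a "." separator is also accepted, since some tools write it that way).
TIMING_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def millisecond_fields(srt_text: str) -> list[int]:
    """Return the millisecond field of every start and end timestamp."""
    fields = []
    for match in TIMING_RE.finditer(srt_text):
        fields.append(int(match.group(4)))  # start time, milliseconds
        fields.append(int(match.group(8)))  # end time, milliseconds
    return fields


def is_second_level_only(srt_text: str) -> bool:
    """True if the text has timestamps and every millisecond field is 000."""
    fields = millisecond_fields(srt_text)
    return bool(fields) and all(ms == 0 for ms in fields)


# Hypothetical sample for illustration only.
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Hello there.

2
00:00:04,000 --> 00:00:07,000
Second line.
"""

if __name__ == "__main__":
    print(is_second_level_only(SAMPLE))
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `is_second_level_only`.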

2. Three passages were not transcribed correctly: in the SRT file they contain only a large amount of repeated text. No matter how many times I rerun the transcription, the result is the same. In total, about 7 minutes of audio in this file were not transcribed correctly.
[Screenshot]
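A rough sketch for locating such passages in an SRT file, assuming the repetition shows up as several consecutive cues with identical text. The function names and the `min_run` threshold are hypothetical choices for illustration. This only finds the affected time ranges; it does not repair the transcript.

```python
import re

TIMING_RE = re.compile(
    r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def parse_cues(srt_text):
    """Return a list of (timing_line, text) pairs, one per SRT cue."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            if TIMING_RE.search(line):
                # Everything after the timing line is the cue text.
                cues.append((line, " ".join(lines[i + 1:])))
                break
    return cues


def repeated_runs(cues, min_run=3):
    """Find runs of at least min_run consecutive cues with identical text.

    Returns a list of (first_timing, last_timing, text, count) tuples.
    """
    runs = []
    start = 0
    for i in range(1, len(cues) + 1):
        # Close the current run at the end of the list or when the text changes.
        if i == len(cues) or cues[i][1] != cues[start][1]:
            count = i - start
            if count >= min_run and cues[start][1]:
                runs.append((cues[start][0], cues[i - 1][0], cues[start][1], count))
            start = i
    return runs
```

The timing lines of each reported run give the span of audio that may be worth re-transcribing separately or with a different model.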

@raivisdejus
Collaborator

If you need millisecond precision, AI models like Whisper will not be able to provide it. This is a limitation of how they are built, and there is nothing we can do about it. Most likely no AI tool will get millisecond precision right.

When I needed millisecond precision, I used https://github.com/echogarden-project/echogarden. It uses a different algorithm to align the text to the audio, and its precision is much better. I prepared a text file with one sentence per line and used echogarden's "forced alignment" feature; the result was quite good. In general, for millisecond precision you need a "forced alignment" tool. Some other options are listed at https://github.com/topics/forced-alignment
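The "one sentence per line" preparation step could be sketched as follows. This is a naive splitter under stated assumptions: it only breaks after `.`, `!` or `?` followed by whitespace, so abbreviations such as "Dr." will be split incorrectly, and it does not handle languages that use other sentence punctuation. The function name is hypothetical and is not part of echogarden or Buzz.

```python
import re


def one_sentence_per_line(text: str) -> str:
    """Reflow free-running transcript text into one sentence per line."""
    # Collapse line breaks and repeated spaces into single spaces.
    collapsed = " ".join(text.split())
    # Split on whitespace that directly follows sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?])\s+", collapsed)
    return "\n".join(part.strip() for part in parts if part.strip())


if __name__ == "__main__":
    print(one_sentence_per_line("Hello world. How are\nyou? Fine!"))
```

The resulting text can be saved to a plain text file and passed to a forced-alignment tool together with the audio.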

Regarding the errors in the transcript, try the large-v2 or large-v3 model; those may improve accuracy. Also try "Faster Whisper"; it uses the large-v2 model when you transcribe with the "large" model selected.

@WeCanSee
Author

WeCanSee commented Jul 2, 2024

Thank you for such a detailed answer. One more question: I am using the App Store version, so how can I use the large-v3 model in the Buzz Captain app? Can I just download the large-v3 model file and use it in the app?

@raivisdejus
Collaborator

On the existing App Store version you may be able to use the Huggingface Whisper type with openai/whisper-large-v3 as the model. This Whisper type does not provide an option for word-level timestamps, but regular speech recognition should work.

Please note that the large-v3 model can hallucinate, that is, recognize words that are not in the speech. In this respect openai/whisper-large-v2 may be better, as it does not seem to have this problem.

Alternatively, you can try the latest development version from a GitHub Actions build. Log in to GitHub and look at the bottom of the Action page, for example https://github.com/chidiwilliams/buzz/actions/runs/9656460007

@raivisdejus
Collaborator

Please try the latest open source version as a temporary solution while the App Store version issue gets sorted out:

https://github.com/chidiwilliams/buzz/releases
