Problem for App Store edition #818

WeCanSee · 2024-07-01T12:56:10Z

I am using version 1.0.2 (137) on macOS 14.6 (23G5052d).
I uploaded an MP3 file, about 44'30" long and 45 MB, and used the large model with CoreML.
When the transcription finished, I found some errors in the generated SRT file:

1. The subtitle timestamps are only accurate to the second, not to the millisecond, so I can't use the file directly and had to adjust the timing manually.

2. Three passages were not transcribed correctly; in the SRT file they contain only a large amount of repetitive text. No matter how many times I rerun the transcription, the result is the same. In total, 7 minutes of audio in this file were not transcribed correctly.

raivisdejus · 2024-07-02T05:31:18Z

If you need millisecond precision, AI models like Whisper will not be able to deliver it. This is a limitation of how they are built, and there is nothing we can do about it. Most likely no AI tool will get millisecond precision right.

When I needed millisecond precision, I used https://github.com/echogarden-project/echogarden. It uses a different algorithm to align the text to the audio, and its precision is much better. I prepared a text file with one sentence per line and used echogarden's "forced alignment" feature; the result was quite good. In general, for millisecond precision you need some kind of "forced alignment" tool. Some other options are listed at https://github.com/topics/forced-alignment
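The "one sentence per line" preparation step could be sketched roughly as follows. This is only an illustration, not part of Buzz or echogarden: the function name `one_sentence_per_line` is hypothetical, and the splitting rule is deliberately naive (it breaks after `.`, `!` or `?` followed by whitespace, so abbreviations such as "e.g." will be split incorrectly and the output should be reviewed by hand).

```python
import re


def one_sentence_per_line(text: str) -> str:
    """Reflow free text so that each sentence sits on its own line."""
    # Collapse all whitespace, including existing line breaks, to single spaces.
    flat = " ".join(text.split())
    # Naive rule: split at whitespace that follows '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    # Drop empty pieces (for example when the input is empty).
    return "\n".join(s for s in sentences if s)


if __name__ == "__main__":
    sample = "Hello world. How are\nyou? Fine!"
    print(one_sentence_per_line(sample))
```

The resulting text can be saved to a file and passed, together with the audio, to whichever forced alignment tool you choose.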

Regarding the errors in the transcript, try the large-v2 or large-v3 model; those may improve accuracy. Also try "Faster Whisper": it uses the large-v2 model when you transcribe with the "large" model selected.
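To compare results between models, it can help to locate the repetitive passages automatically instead of scanning the SRT file by eye. The following is a minimal sketch, assuming a standard SRT layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then text). The helper names `parse_srt` and `find_repeated_runs` and the `min_run` threshold are hypothetical and not part of Buzz.

```python
import re

# Timing line of an SRT cue; some tools write '.' instead of ','.
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3}) --> (\d{2}:\d{2}:\d{2}[,.]\d{3})"
)


def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples, one per subtitle cue."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((match.group(1), match.group(2), text))
                break
    return cues


def find_repeated_runs(cues, min_run=3):
    """Return (start, end, text, count) for runs of identical consecutive cues."""
    runs = []
    i = 0
    while i < len(cues):
        j = i
        # Extend the run while the next cue has exactly the same text.
        while j + 1 < len(cues) and cues[j + 1][2] == cues[i][2]:
            j += 1
        count = j - i + 1
        if count >= min_run and cues[i][2]:
            runs.append((cues[i][0], cues[j][1], cues[i][2], count))
        i = j + 1
    return runs
```

Running `find_repeated_runs(parse_srt(open("output.srt", encoding="utf-8").read()))` on each model's output (the file name is a placeholder) gives the time ranges worth re-checking. It only catches exact consecutive repeats, so near-duplicates will be missed.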

WeCanSee · 2024-07-02T13:36:38Z

Thank you for such a detailed answer. One more question: I use the App Store version, so how can I use the large-v3 model in the Buzz Captions app? Can I just download the large-v3 model file and use it in the Buzz Captions app?

raivisdejus · 2024-07-02T15:03:21Z

On the existing App Store version, you may be able to use the Huggingface Whisper type with openai/whisper-large-v3 as the model. This Whisper type does not provide an option for word-level timestamps, but regular speech recognition should work.

Please note that the large-v3 model can hallucinate, that is, recognize words that are not in the speech. In this respect openai/whisper-large-v2 may be better, as it does not seem to have this problem.

Alternatively, you can try the latest development version from a GitHub Actions run. Log in to GitHub and look at the bottom of the Actions run page, for example https://github.com/chidiwilliams/buzz/actions/runs/9656460007

raivisdejus · 2024-11-24T14:40:54Z

Please try the latest open source version as a temporary solution while the App Store version issue gets sorted out:

https://github.com/chidiwilliams/buzz/releases

Rem-ux mentioned this issue on Jul 20, 2024: Incomplete transcription-Mac App Store version #851 (Open)

raivisdejus added the "macOS" label (Issues on macOS) on Jul 25, 2024
