Turbo-V3 #1025

Closed

freddierice opened this issue Oct 1, 2024 · 27 comments

Comments

@freddierice

I converted the new openai model weights to be used with faster-whisper. Still playing around with it, but in terms of speed it's about the same as distil-whisper.

https://huggingface.co/freddierice/openwhisper-turbo-large-v3-ct2/blob/main/README.md
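
For reference, a rough sketch of how such a conversion can be done with the CTranslate2 Transformers converter (assuming the transformers and ctranslate2 packages are installed; the output directory name is arbitrary):

from ctranslate2.converters import TransformersConverter

# Convert the upstream PyTorch checkpoint to CTranslate2 format for faster-whisper.
converter = TransformersConverter(
    "openai/whisper-large-v3-turbo",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("whisper-large-v3-turbo-ct2", quantization="float16")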

@nguyenhoanganh2002

I converted the new openai model weights to be used with faster-whisper. Still playing around with it, but in terms of speed it's about the same as distil-whisper.

https://huggingface.co/freddierice/openwhisper-turbo-large-v3-ct2/blob/main/README.md

Could you convert Whisper Turbo with the multilingual tokenizer?

@tjongsma

tjongsma commented Oct 1, 2024

Thanks for the quick conversion! I'm getting a tokenizer error:

Traceback (most recent call last):
  File "transcribe.py", line 660, in __init__
    self.hf_tokenizer = tokenizers.Tokenizer.from_file(tokenizer_file)
Exception: data did not match any variant of untagged enum ModelWrapper at line 264861 column 3

Any support would be appreciated :)
EDIT: The link below (https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2) works fine, so thank you!

@asr-lord

asr-lord commented Oct 1, 2024

I converted the new openai model weights to be used with faster-whisper. Still playing around with it, but in terms of speed it's about the same as distil-whisper.
https://huggingface.co/freddierice/openwhisper-turbo-large-v3-ct2/blob/main/README.md

Could you convert Whisper Turbo with the multilingual tokenizer?

It's done in: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2

@Nik-Kras

Nik-Kras commented Oct 1, 2024

Tested https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
Works very fast

@stevevaius2015

Could you show how we can test it? On a Google Colab notebook I got an error saying there is no model named "faster-whisper-large-v3-turbo-ct2" after pip install faster-whisper.

@asr-lord

asr-lord commented Oct 1, 2024

Could you show how we can test it? On a Google Colab notebook I got an error saying there is no model named "faster-whisper-large-v3-turbo-ct2" after pip install faster-whisper.

You have to download the model locally first:

from huggingface_hub import snapshot_download

repo_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
local_dir = "faster-whisper-large-v3-turbo-ct2"
snapshot_download(repo_id=repo_id, local_dir=local_dir, repo_type="model")
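
Once it is downloaded, loading it works the same as for any other faster-whisper model; a minimal usage sketch (the audio file name is just a placeholder):

from faster_whisper import WhisperModel

# Load the CTranslate2 model from the local directory created by snapshot_download.
model = WhisperModel("faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.wav", beam_size=5)
print("Detected language:", info.language)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))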

@Nik-Kras

Nik-Kras commented Oct 1, 2024

If you guys want to test the model as a real-time transcription tool, I have a simple demo with Gradio for this. I just updated the code to use "deepdml/faster-whisper-large-v3-turbo-ct2".

https://github.com/Nik-Kras/Live_ASR_Whisper_Gradio

@stevevaius2015

Could you show how we can test it? On a Google Colab notebook I got an error saying there is no model named "faster-whisper-large-v3-turbo-ct2" after pip install faster-whisper.

You have to download the model locally first:

from huggingface_hub import snapshot_download

repo_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
local_dir = "faster-whisper-large-v3-turbo-ct2"
snapshot_download(repo_id=repo_id, local_dir=local_dir, repo_type="model")

Thanks!

@milsun

milsun commented Oct 1, 2024

Any idea how I can run it faster using Apple silicon? I have an M2 Pro machine.

@asr-lord

asr-lord commented Oct 1, 2024

Any idea how I can run it faster using Apple silicon? I have an M2 Pro machine.

Have you tried faster-whisper? It seems that it's faster than any other framework.
https://medium.com/@GenerationAI/streaming-with-whisper-in-mlx-vs-faster-whisper-vs-insanely-fast-whisper-37cebcfc4d27

You could try: https://github.com/mustafaaljadery/lightning-whisper-mlx
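
If you want to stay on faster-whisper itself, note that CTranslate2 has no Metal/MPS backend, so on Apple silicon it runs on the CPU. A minimal sketch of that path (the audio file name and thread count are only placeholder values):

from faster_whisper import WhisperModel

# CTranslate2 has no GPU backend on Apple silicon, so run on CPU with int8 quantization.
# cpu_threads is a tuning knob; 8 is just an example value for an M2 Pro.
model = WhisperModel(
    "deepdml/faster-whisper-large-v3-turbo-ct2",
    device="cpu",
    compute_type="int8",
    cpu_threads=8,
)
segments, _ = model.transcribe("audio.wav")
print(" ".join(segment.text for segment in segments))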

@hockyy

hockyy commented Oct 1, 2024

lmao, the Cantonese output is not word-for-word with the large-v3-turbo one... so sad... :( I'll still use large-v3 💖
好嘅 -> 好的
是否 -> 係咪
The meanings are maintained... but come on

@Nik-Kras

Nik-Kras commented Oct 1, 2024

but come on

I don't know that language; could you give more details on your observation? What's wrong, and how does the result differ from large-v3?

@hockyy

hockyy commented Oct 1, 2024

but come on

I don't know that language; could you give more details on your observation? What's wrong, and how does the result differ from large-v3?

--language=yue
Used this model:
https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
vs the normal large-v3 model

Faster-Whisper-XXL.exe C:/Users/hocky/Downloads/Video/Video.mp4 --model {MODEL} --device CUDA --output_dir C:/Users/hocky/Downloads/Video --output_format srt --task transcribe --beam_size 10 --best_of 5 --verbose true --vad_filter true --vad_alt_method silero_v4 --standard_asia --language yue

on https://www.youtube.com/watch?v=sgRfqRFJlAg

So Cantonese has two variants: one is written Cantonese, which most subtitles are based on, and the other is spoken Cantonese, which is literally the spoken characters written down.

藏哥係咪未讀過書?
床哥是否未讀過書?

是 => written Cantonese variant (read: si)
係 => spoken Cantonese variant (read: hai)

In general, if you want to learn spoken Cantonese, you'll stick to the spoken version...
The difference is about 10-20%.

Bruh, I just realized it doesn't even transcribe the most basic terms properly: the famous "DLLM".

@mvoodarla

We now support the new whisper-large-v3-turbo on Sieve!

Use it via sieve/speech_transcriber: https://www.sievedata.com/functions/sieve/speech_transcriber
Use sieve/whisper directly: https://www.sievedata.com/functions/sieve/whisper

Just set speed_boost to True. The API guide is under the "Usage Guide" tab.

@Jiltseb
Contributor

Jiltseb commented Oct 1, 2024

@trungkienbkhn Will SYSTRAN be adding this in the HF repo to be used out of the box?

@thiswillbeyourgithub

@trungkienbkhn Will SYSTRAN be adding this in the HF repo to be used out of the box?

I think that would be necessary for lots of downstream projects like faster-whisper-server

@asr-lord

asr-lord commented Oct 2, 2024

but come on

I don't know that language; could you give more details on your observation? What's wrong, and how does the result differ from large-v3?

--language=yue Used this model: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2 vs the normal large-v3 model

Faster-Whisper-XXL.exe C:/Users/hocky/Downloads/Video/Video.mp4 --model {MODEL} --device CUDA --output_dir C:/Users/hocky/Downloads/Video --output_format srt --task transcribe --beam_size 10 --best_of 5 --verbose true --vad_filter true --vad_alt_method silero_v4 --standard_asia --language yue

on https://www.youtube.com/watch?v=sgRfqRFJlAg

So Cantonese has two variants: one is written Cantonese, which most subtitles are based on, and the other is spoken Cantonese, which is literally the spoken characters written down.

藏哥係咪未讀過書?
床哥是否未讀過書?

是 => written Cantonese variant (read: si) 係 => spoken Cantonese variant (read: hai)

In general, if you want to learn spoken Cantonese, you'll stick to the spoken version... The difference is about 10-20%.

Bruh, I just realized it doesn't even transcribe the most basic terms properly: the famous "DLLM".

You may find this discussion helpful:
openai/whisper#2363 (comment)
"Across languages, the turbo model performs similarly to large-v2, though it shows larger degradation on some languages like Thai and Cantonese."

@tjongsma

tjongsma commented Oct 4, 2024

If you guys want to test the model as a real-time transcription tool, I have a simple demo with Gradio for this. I just updated the code to use "deepdml/faster-whisper-large-v3-turbo-ct2".

https://github.com/Nik-Kras/Live_ASR_Whisper_Gradio

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

@Nik-Kras

Nik-Kras commented Oct 4, 2024

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only say that it works about 2-3x faster, based on my logs from the Gradio demo.

How bad are the evaluation results?

@asr-lord

asr-lord commented Oct 4, 2024

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only say that it works about 2-3x faster, based on my logs from the Gradio demo.

How do you use HQQ in faster-whisper? Could you share some sample code?
I only see how to use it with the Transformers library: https://github.com/mobiusml/hqq?tab=readme-ov-file#transformers-

@Nik-Kras

Nik-Kras commented Oct 4, 2024

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only say that it works about 2-3x faster, based on my logs from the Gradio demo.

How do you use HQQ in faster-whisper? Could you share some sample code? I only see how to use it with the Transformers library: https://github.com/mobiusml/hqq?tab=readme-ov-file#transformers-

Right, HQQ works with Transformers. But faster-whisper is just Whisper accelerated with CTranslate2, and turbo models converted to CT2 are already available on Hugging Face: deepdml/faster-whisper-large-v3-turbo-ct2

Also, HQQ is integrated into Transformers, so quantization should be as easy as passing an argument:

model_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    quantization_config=quant_config
)

https://huggingface.co/docs/transformers/main/quantization/hqq

I haven't tried it yet, so I don't know if that is going to work.
How about we have a chat about it outside of this GitHub issue? Send me a message on LinkedIn; the link is in my profile.

@Jiltseb
Contributor

Jiltseb commented Oct 4, 2024

No, there is no HQQ support for CTranslate2 yet.

faster-whisper uses Whisper models in CTranslate2 format, which is different from the PyTorch models on HF. Of course the converted models are hosted on HF so that they are easy to use in packages such as faster-whisper, but one cannot directly load a CTranslate2 checkpoint with AutoModelForSpeechSeq2Seq.

I have created a feature request in the past to support HQQ (with static cache and torch compilation): OpenNMT/CTranslate2#1717

The PR is still in progress, and it has some performance issues that need to be fixed.
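
To make the distinction concrete, a rough, untested sketch of the two separate paths (assuming the hqq package is installed; openai/whisper-large-v3-turbo is the upstream PyTorch checkpoint, not the CT2 conversion):

import torch
from faster_whisper import WhisperModel
from transformers import AutoModelForSpeechSeq2Seq, HqqConfig

# Path 1: HQQ quantization goes through Transformers and the PyTorch checkpoint.
hqq_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    quantization_config=HqqConfig(nbits=4, group_size=64),
)

# Path 2: the CTranslate2 checkpoint is only loadable through faster-whisper/CTranslate2.
ct2_model = WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")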

@tjongsma

tjongsma commented Oct 4, 2024

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only say that it works about 2-3x faster, based on my logs from the Gradio demo.

How bad are the evaluation results?

No standardized evaluation or anything; I'm just running it in my streaming application and seeing much worse results than medium (especially with it randomly not transcribing parts of the text). This is a CTranslate2 implementation, deepdml/faster-whisper-large-v3-turbo-ct2 to be exact. See my linked comment for the code.

@silvacarl2

Were you able to convert turbo to faster-whisper format?

@Jiltseb
Contributor

Jiltseb commented Oct 9, 2024

The Mobiuslabs fork now supports turbo out of the box and has additional fixes.

@jordimas
Contributor

Just to mention that I added support in https://github.com/Softcatala/whisper-ctranslate2 for anybody who wants to test the turbo model with the current faster-whisper version.

@freddierice
Author

There seems to be a lot of confusion in this thread. If you want to use turbo with the current faster-whisper, all you have to do is:

WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")

Closing this thread since there is no issue.
