Turbo-V3 #1025

Closed

freddierice opened this issue Oct 1, 2024 · 27 comments

Comments

@freddierice

I converted the new openai model weights to be used with faster-whisper. Still playing around with it, but in terms of speed it's about the same as distil-whisper.

https://huggingface.co/freddierice/openwhisper-turbo-large-v3-ct2/blob/main/README.md
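
For reference, a rough sketch of how such a conversion can be done with the CTranslate2 Transformers converter (assuming the transformers and ctranslate2 packages are installed; the output directory name is arbitrary):

from ctranslate2.converters import TransformersConverter

# Convert the upstream PyTorch checkpoint to CTranslate2 format for faster-whisper.
converter = TransformersConverter(
    "openai/whisper-large-v3-turbo",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("whisper-large-v3-turbo-ct2", quantization="float16")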

@nguyenhoanganh2002

I converted the new openai model weights to be used with faster-whisper. Still playing around with it, but in terms of speed it's about the same as distil-whisper.

https://huggingface.co/freddierice/openwhisper-turbo-large-v3-ct2/blob/main/README.md

Could you convert Whisper Turbo with the multilingual tokenizer?

@tjongsma

tjongsma commented Oct 1, 2024

Thanks for the quick conversion! I'm getting a tokenizer error:

Traceback (most recent call last):
  File "transcribe.py", line 660, in __init__
    self.hf_tokenizer = tokenizers.Tokenizer.from_file(tokenizer_file)
Exception: data did not match any variant of untagged enum ModelWrapper at line 264861 column 3

Any support would be appreciated :)
EDIT: The link below (https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2) works fine, so thank you!

@asr-lord

asr-lord commented Oct 1, 2024

I converted the new openai model weights to be used with faster-whisper. Still playing around with it, but in terms of speed it's about the same as distil-whisper.
https://huggingface.co/freddierice/openwhisper-turbo-large-v3-ct2/blob/main/README.md

Could you convert Whisper Turbo with the multilingual tokenizer?

It's done in: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2

@Nik-Kras

Nik-Kras commented Oct 1, 2024

Tested https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
Works very fast

@stevevaius2015

Could you show how we can test it? On a Google Colab notebook I got an error saying there is no model named "faster-whisper-large-v3-turbo-ct2" after pip install faster-whisper.

@asr-lord

asr-lord commented Oct 1, 2024

Could you show how we can test it? On a Google Colab notebook I got an error saying there is no model named "faster-whisper-large-v3-turbo-ct2" after pip install faster-whisper.

You have to download the model locally first:

from huggingface_hub import snapshot_download

repo_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
local_dir = "faster-whisper-large-v3-turbo-ct2"
snapshot_download(repo_id=repo_id, local_dir=local_dir, repo_type="model")
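
Once it is downloaded, loading it works the same as for any other faster-whisper model; a minimal usage sketch (the audio file name is just a placeholder):

from faster_whisper import WhisperModel

# Load the CTranslate2 model from the local directory created by snapshot_download.
model = WhisperModel("faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.wav", beam_size=5)
print("Detected language:", info.language)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))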

@Nik-Kras

Nik-Kras commented Oct 1, 2024

If you guys want to test the model as a real-time transcription tool, I have a simple demo with Gradio for this. I just updated the code to use "deepdml/faster-whisper-large-v3-turbo-ct2".

https://github.com/Nik-Kras/Live_ASR_Whisper_Gradio

@stevevaius2015

Could you show how we can test it? On a Google Colab notebook I got an error saying there is no model named "faster-whisper-large-v3-turbo-ct2" after pip install faster-whisper.

You have to download the model locally first:

from huggingface_hub import snapshot_download

repo_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
local_dir = "faster-whisper-large-v3-turbo-ct2"
snapshot_download(repo_id=repo_id, local_dir=local_dir, repo_type="model")

Thanks!

@milsun

milsun commented Oct 1, 2024

Any idea how I can run it faster using Apple silicon? I have an M2 Pro machine.

@asr-lord

asr-lord commented Oct 1, 2024

Any idea how I can run it faster using Apple silicon? I have an M2 Pro machine.

Have you tried faster-whisper? It seems that it's faster than any other framework.
https://medium.com/@GenerationAI/streaming-with-whisper-in-mlx-vs-faster-whisper-vs-insanely-fast-whisper-37cebcfc4d27

You could try: https://github.com/mustafaaljadery/lightning-whisper-mlx
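
If you want to stay on faster-whisper itself, note that CTranslate2 has no Metal/MPS backend, so on Apple silicon it runs on the CPU. A minimal sketch of that path (the audio file name and thread count are only placeholder values):

from faster_whisper import WhisperModel

# CTranslate2 has no GPU backend on Apple silicon, so run on CPU with int8 quantization.
# cpu_threads is a tuning knob; 8 is just an example value for an M2 Pro.
model = WhisperModel(
    "deepdml/faster-whisper-large-v3-turbo-ct2",
    device="cpu",
    compute_type="int8",
    cpu_threads=8,
)
segments, _ = model.transcribe("audio.wav")
print(" ".join(segment.text for segment in segments))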

@hockyy

hockyy commented Oct 1, 2024

lmao, the Cantonese output is not word-for-word with the large-v3-turbo one... so sad... :( I'll still use large-v3 💖
好嘅 -> 好的
是否 -> 係咪
The meanings are maintained... but come on

@Nik-Kras

Nik-Kras commented Oct 1, 2024

but come on

I don't know that language; could you give more details on your observation? What's wrong, and how does the result differ from large-v3?

@hockyy

hockyy commented Oct 1, 2024

but come on

I don't know that language; could you give more details on your observation? What's wrong, and how does the result differ from large-v3?

--language=yue
Used this model:
https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
vs the normal large-v3 model

Faster-Whisper-XXL.exe C:/Users/hocky/Downloads/Video/Video.mp4 --model {MODEL} --device CUDA --output_dir C:/Users/hocky/Downloads/Video --output_format srt --task transcribe --beam_size 10 --best_of 5 --verbose true --vad_filter true --vad_alt_method silero_v4 --standard_asia --language yue

on https://www.youtube.com/watch?v=sgRfqRFJlAg

So Cantonese has two variants: one is written Cantonese, which most subtitles are based on, and the other is spoken Cantonese, which is literally the spoken characters written down.

藏哥係咪未讀過書?
床哥是否未讀過書?

是 => written Cantonese variant (read: si)
係 => spoken Cantonese variant (read: hai)

In general, if you want to learn spoken Cantonese, you'll stick to the spoken version...
The difference is about 10-20%.

Bruh, I just realized it doesn't even transcribe the most basic terms properly: the famous "DLLM".

@mvoodarla

We now support the new whisper-large-v3-turbo on Sieve!

Use it via sieve/speech_transcriber: https://www.sievedata.com/functions/sieve/speech_transcriber
Use sieve/whisper directly: https://www.sievedata.com/functions/sieve/whisper

Just set speed_boost to True. The API guide is under the "Usage Guide" tab.

@Jiltseb
Contributor

Jiltseb commented Oct 1, 2024

@trungkienbkhn Will SYSTRAN be adding this in the HF repo to be used out of the box?

@thiswillbeyourgithub

@trungkienbkhn Will SYSTRAN be adding this in the HF repo to be used out of the box?

I think that would be necessary for lots of downstream projects like faster-whisper-server

@asr-lord

asr-lord commented Oct 2, 2024

but come on

I don't know that language; could you give more details on your observation? What's wrong, and how does the result differ from large-v3?

--language=yue Used this model: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2 vs the normal large-v3 model

Faster-Whisper-XXL.exe C:/Users/hocky/Downloads/Video/Video.mp4 --model {MODEL} --device CUDA --output_dir C:/Users/hocky/Downloads/Video --output_format srt --task transcribe --beam_size 10 --best_of 5 --verbose true --vad_filter true --vad_alt_method silero_v4 --standard_asia --language yue

on https://www.youtube.com/watch?v=sgRfqRFJlAg

So Cantonese has two variants: one is written Cantonese, which most subtitles are based on, and the other is spoken Cantonese, which is literally the spoken characters written down.

藏哥係咪未讀過書?
床哥是否未讀過書?

是 => written Cantonese variant (read: si) 係 => spoken Cantonese variant (read: hai)

In general, if you want to learn spoken Cantonese, you'll stick to the spoken version... The difference is about 10-20%.

Bruh, I just realized it doesn't even transcribe the most basic terms properly: the famous "DLLM".

You may find this discussion helpful:
openai/whisper#2363 (comment)
"Across languages, the turbo model performs similarly to large-v2, though it shows larger degradation on some languages like Thai and Cantonese."

@tjongsma

tjongsma commented Oct 4, 2024

If you guys want to test the model as a real-time transcription tool, I have a simple demo with Gradio for this. I just updated the code to use "deepdml/faster-whisper-large-v3-turbo-ct2".

https://github.com/Nik-Kras/Live_ASR_Whisper_Gradio

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

@Nik-Kras

Nik-Kras commented Oct 4, 2024

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only say that it works about 2-3x faster, based on my logs from the Gradio demo.

How bad are the evaluation results?

@asr-lord

asr-lord commented Oct 4, 2024

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only say that it works about 2-3x faster, based on my logs from the Gradio demo.

How do you use HQQ in faster-whisper? Could you share some sample code?
I only see how to use it with the Transformers library: https://github.com/mobiusml/hqq?tab=readme-ov-file#transformers-

@Nik-Kras

Nik-Kras commented Oct 4, 2024

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only say that it works about 2-3x faster, based on my logs from the Gradio demo.

How do you use HQQ in faster-whisper? Could you share some sample code? I only see how to use it with the Transformers library: https://github.com/mobiusml/hqq?tab=readme-ov-file#transformers-

Right, HQQ works with Transformers. But faster-whisper is just Whisper accelerated with CTranslate2, and turbo models converted to CT2 are already available on Hugging Face: deepdml/faster-whisper-large-v3-turbo-ct2

Also, HQQ is integrated into Transformers, so quantization should be as easy as passing an argument:

model_id = "deepdml/faster-whisper-large-v3-turbo-ct2"
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    quantization_config=quant_config
)

https://huggingface.co/docs/transformers/main/quantization/hqq

I haven't tried it yet, so I don't know if that is going to work.
How about we have a chat about it outside of this GitHub issue? Send me a message on LinkedIn; the link is in my profile.

@Jiltseb
Contributor

Jiltseb commented Oct 4, 2024

No, there is no HQQ support for CTranslate2 yet.

faster-whisper uses Whisper models in CTranslate2 format, which is different from the PyTorch models on HF. Of course the converted models are hosted on HF so that they are easy to use in packages such as faster-whisper, but one cannot directly load a CTranslate2 checkpoint with AutoModelForSpeechSeq2Seq.

I have created a feature request in the past to support HQQ (with static cache and torch compilation): OpenNMT/CTranslate2#1717

The PR is still in progress, and it has some performance issues that need to be fixed.
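
To make the distinction concrete, a rough, untested sketch of the two separate paths (assuming the hqq package is installed; openai/whisper-large-v3-turbo is the upstream PyTorch checkpoint, not the CT2 conversion):

import torch
from faster_whisper import WhisperModel
from transformers import AutoModelForSpeechSeq2Seq, HqqConfig

# Path 1: HQQ quantization goes through Transformers and the PyTorch checkpoint.
hqq_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    quantization_config=HqqConfig(nbits=4, group_size=64),
)

# Path 2: the CTranslate2 checkpoint is only loadable through faster-whisper/CTranslate2.
ct2_model = WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")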

@tjongsma

tjongsma commented Oct 4, 2024

What's your experience with using v3-turbo for short audio clips? Mine has been that unfortunately it performs worse than other models, see #1030 (comment)

I only planned to evaluate turbo, turbo-CTranslate2, and turbo-HQQ. I can only say that it works about 2-3x faster, based on my logs from the Gradio demo.

How bad are the evaluation results?

No standardized evaluation or anything; I'm just running it in my streaming application and seeing much worse results than medium (especially with it randomly not transcribing parts of the text). This is a CTranslate2 implementation, deepdml/faster-whisper-large-v3-turbo-ct2 to be exact. See my linked comment for the code.

@silvacarl2

Were you able to convert turbo to faster-whisper format?

@Jiltseb
Contributor

Jiltseb commented Oct 9, 2024

The Mobiuslabs fork now supports turbo out of the box and has additional fixes.

@jordimas
Contributor

Just to mention that I added support in https://github.com/Softcatala/whisper-ctranslate2 for anybody who wants to test the turbo model with the current faster-whisper version.

@freddierice
Author

There seems to be a lot of confusion in this thread. If you want to use turbo with the current faster-whisper, all you have to do is:

WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")

Closing this thread since there is no issue.
