Commit cac26b6 (1 parent: c99e1b1). Showing 1 changed file with 52 additions and 0 deletions.
@@ -0,0 +1,52 @@
---
title: "Our model recommendation"
date: "2024-03-03"
author: "Kurain Benoy"
# authorImg: "jjthomas.png"
draft: true
description: "Our list of best practices for each language, to give you a better idea of which model to choose."
tags: [best-practises, model-selection, ASR]
---

## About Indic Subtitler

Indic Subtitler is an open source subtitling platform 💻 for transcribing video and audio in Indic languages and translating the subtitles using ML models.
It is powered by models like SeamlessM4T, faster-whisper, WhisperX and Vegam-whisper, which support about 12 Indic languages by default.

Since Indic Subtitler ships with this many models, this article recommends which
models work best for your use case and your language.

## About the models available in Indic Subtitler

1. SeamlessM4T (default)

Seamless is a family of AI models that enable more natural and authentic communication across languages. SeamlessM4T is a massively multilingual and multimodal machine translation model supporting around 100 languages. It serves as the foundation for SeamlessExpressive, a model that preserves elements of prosody and voice style across languages, and SeamlessStreaming, a model supporting simultaneous translation and streaming ASR for around 100 languages. SeamlessExpressive and SeamlessStreaming are combined into Seamless, a unified model featuring multilingual, real-time, and expressive translations.

2. faster-whisper

faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, a fast inference engine for Transformer models. This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. Efficiency can be further improved with 8-bit quantization on both CPU and GPU.

3. WhisperX

WhisperX provides fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization. Its features are:

- Batched inference for 70x realtime transcription using Whisper large-v2
- A faster-whisper backend that requires <8GB GPU memory for large-v2 with beam_size=5
- Accurate word-level timestamps using wav2vec2 alignment
- Multispeaker ASR using speaker diarization from pyannote-audio (speaker ID labels)
- VAD preprocessing, which reduces hallucination and allows batching with no WER degradation

4. Vegam-whisper

Vegam-whisper is a faster-whisper version of a Whisper model fine-tuned for Malayalam, and it performs decently in Malayalam. It is a conversion of `thennal/whisper-medium-ml` to the CTranslate2 model format.

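As a rough illustration of the faster-whisper description above, the sketch below loads a model with 8-bit quantization and turns the returned segments into SRT subtitle text. It assumes the `faster-whisper` Python package is installed. The helper names `format_timestamp`, `segments_to_srt` and `transcribe_to_srt`, the `large-v2` model size and the CPU settings are illustrative choices, not a description of how Indic Subtitler itself is implemented.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from objects exposing .start, .end (seconds) and .text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_size: str = "large-v2", language=None) -> str:
    """Hypothetical helper: transcribe one file with faster-whisper and return SRT text."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    # compute_type="int8" requests 8-bit quantization on CPU (an assumed setup);
    # on a GPU one could try device="cuda" with compute_type="int8_float16".
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, beam_size=5, language=language)
    return segments_to_srt(segments)
```

Keeping the SRT formatting separate from the model call means the same `segments_to_srt` helper could be reused with any backend that yields segments with start, end and text fields.
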
## Our model recommendations

### For English audio

WhisperX > faster-whisper > SeamlessM4T

If you want better timestamp accuracy, use SeamlessM4T.

### For Malayalam audio

Vegam-whisper = faster-whisper > SeamlessM4T

### For Hindi audio

faster-whisper > WhisperX > SeamlessM4T

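The rankings above can also be written down as a small lookup table, which may be handy if you want to pick a model programmatically. This is only a sketch: the `RECOMMENDATIONS` dictionary and the `recommend` function are hypothetical names, not part of Indic Subtitler, and falling back to SeamlessM4T for unlisted languages is an assumption based on it being the platform default.

```python
# Rankings copied from the sections above, best first.
# Models of equal rank (Vegam-whisper = faster-whisper) share a tuple.
RECOMMENDATIONS = {
    "english": [("WhisperX",), ("faster-whisper",), ("SeamlessM4T",)],
    "malayalam": [("Vegam-whisper", "faster-whisper"), ("SeamlessM4T",)],
    "hindi": [("faster-whisper",), ("WhisperX",), ("SeamlessM4T",)],
}

# Assumption: languages without a ranking fall back to the platform default.
DEFAULT_MODEL = "SeamlessM4T"


def recommend(language: str) -> tuple:
    """Return the top-ranked model(s) for a language (case-insensitive)."""
    tiers = RECOMMENDATIONS.get(language.strip().lower())
    if tiers is None:
        return (DEFAULT_MODEL,)
    return tiers[0]


if __name__ == "__main__":
    for lang in ("English", "Malayalam", "Hindi", "Tamil"):
        print(lang, "->", " or ".join(recommend(lang)))
```

Returning a tuple rather than a single string keeps ties visible, so a caller can choose between equally ranked models on other grounds such as speed or memory.
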