From 8259fa442402de7cf216bbd16de2de8af8f1b30c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Drago=C8=99?=
Date: Mon, 22 Jul 2024 19:12:31 +0200
Subject: [PATCH] Add RTF results for cts_nl

---
 NISV/cts_nl/res_labelled.md   | 14 +++++++-------
 NISV/cts_nl/res_unlabelled.md | 14 +++++++-------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/NISV/cts_nl/res_labelled.md b/NISV/cts_nl/res_labelled.md
index dc888cd..03d109d 100644
--- a/NISV/cts_nl/res_labelled.md
+++ b/NISV/cts_nl/res_labelled.md
@@ -42,17 +42,17 @@ And a matrix with the **time** spent in total by each implementation **to load a
 \* For WhisperX, a separate alignment model based on wav2vec 2.0 has been applied in order to obtain word-level timestamps. Therefore, the time measured contains the time to load the model, time to transcribe, and time to align to generate timestamps. Speaker diarization has also been applied for WhisperX, which is measured separately and covered in [this section](./whisperx.md).
-
+|[OpenAI](https://github.com/openai/whisper)|0.2695|0.2189|0.2396|0.246|
+|[Huggingface (`transformers`)](https://huggingface.co/openai/whisper-large-v2#long-form-transcription)|0.1606|0.1083|0.1311|0.1604|
+|[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)|0.0791|0.1497|0.0963|0.1483|
+|**[faster-whisper w/ batching](https://github.com/SYSTRAN/faster-whisper/pull/856)**|**0.0522**|**0.0898**|**0.0487**|**0.0882**|
+|[WhisperX](https://github.com/m-bain/whisperX/)\*|0.238|0.2551|0.2176|0.2575|
diff --git a/NISV/cts_nl/res_unlabelled.md b/NISV/cts_nl/res_unlabelled.md
index c785de4..b826610 100644
--- a/NISV/cts_nl/res_unlabelled.md
+++ b/NISV/cts_nl/res_unlabelled.md
@@ -31,18 +31,18 @@ Here's a matrix with the **time** spent in total by each implementation **to loa
 |[WhisperX](https://github.com/m-bain/whisperX/)*|21m:29s|22m:14s|20m:28s|21m:36s|
 \* For WhisperX, a separate alignment model based on wav2vec 2.0 has been applied in order to obtain word-level timestamps. Therefore, the time measured contains the time to load the model, time to transcribe, and time to align to generate timestamps. Speaker diarization has also been applied for WhisperX, which is measured separately and covered in a different section.
-
+|[OpenAI](https://github.com/openai/whisper)|0.2115|0.1675|0.2604|0.2049|
+|[Huggingface (`transformers`)](https://huggingface.co/openai/whisper-large-v2#long-form-transcription)|0.1139|0.1848|0.083|0.1835|
+|[faster-whisper](https://github.com/SYSTRAN/faster-whisper/)|0.0817|0.1647|0.0932|0.1559|
+|**[faster-whisper w/ batching](https://github.com/SYSTRAN/faster-whisper/pull/856)**|**0.0227**|**0.0433**|**0.0195**|**0.0402**|
+|[WhisperX](https://github.com/m-bain/whisperX/)\*|0.0629|0.0651|0.0599|0.0633|