SpeechBrain Performance Report

This document provides an overview of the performance achieved on key datasets and tasks supported by SpeechBrain.
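
Many of the tables below report word, character, or phoneme error rates (WER, CER, PER). As a rough guide to reading those numbers, here is a minimal sketch of the standard edit-distance definition: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. The function names are illustrative and are not SpeechBrain's API; the recipes may also normalize text or aggregate errors over a whole test set before dividing, which this sketch does not do.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Error rate in percent, relative to the reference length."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: tokens are single characters."""
    return error_rate(list(reference), list(hypothesis))
```

For example, `wer("a b c d", "a x c")` counts one substitution and one deletion against four reference words, giving 50.0. Because insertions are counted too, an error rate can exceed 100%.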

AISHELL-1 Dataset

ASR

Model Checkpoints HuggingFace Test-CER
recipes/AISHELL-1/ASR/CTC/hparams/train_with_wav2vec.yaml here here 5.06
recipes/AISHELL-1/ASR/seq2seq/hparams/train.yaml here - 7.51
recipes/AISHELL-1/ASR/transformer/hparams/train_ASR_transformer.yaml here here 6.04
recipes/AISHELL-1/ASR/transformer/hparams/train_ASR_transformer_with_wav2vect.yaml here here 5.58

Aishell1Mix Dataset

Separation

Model Checkpoints HuggingFace SI-SNRi
recipes/Aishell1Mix/separation/hparams/sepformer-aishell1mix2.yaml here - 13.4dB
recipes/Aishell1Mix/separation/hparams/sepformer-aishell1mix3.yaml here - 11.2dB
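
The separation tables report SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the separated signal over the unprocessed mixture. The sketch below follows the commonly used definition (zero-mean signals, projection of the estimate onto the reference) in plain Python. It is an illustration under that assumption, not the code the recipes use; the function names and the `eps` stabilizer are hypothetical.

```python
import math


def _zero_mean(x):
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a reference signal."""
    est = _zero_mean(estimate)
    ref = _zero_mean(reference)
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Everything in the estimate that is not the target counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the estimate over the raw mixture, in dB."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Rescaling the estimate leaves `si_snr` unchanged, which is what makes the metric scale-invariant. For multi-speaker mixtures the value is typically averaged over sources after resolving the speaker permutation, a step this sketch leaves out.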

BinauralWSJ0Mix Dataset

Separation

Model Checkpoints HuggingFace SI-SNRi
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-cross.yaml here - 12.39dB
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-independent.yaml here - 11.90dB
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-parallel-noise.yaml here - 18.25dB
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-parallel-reverb.yaml here - 6.95dB
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-parallel.yaml here - 16.93dB

CVSS Dataset

S2ST

Model Checkpoints HuggingFace Test-sacrebleu
recipes/CVSS/S2ST/hparams/train_fr-en.yaml here here 24.47

CommonLanguage Dataset

Language-id

Model Checkpoints HuggingFace Error
recipes/CommonLanguage/lang_id/hparams/train_ecapa_tdnn.yaml here here 15.1%

CommonVoice Dataset

ASR-transducer

Model Checkpoints HuggingFace Test-WER
recipes/CommonVoice/ASR/transducer/hparams/train_fr.yaml here here 17.58%
recipes/CommonVoice/ASR/transducer/hparams/train_it.yaml here here 14.88%
recipes/CommonVoice/ASR/transducer/hparams/train_de.yaml here here 15.25%

ASR-transformer

Model Checkpoints HuggingFace Test-WER
recipes/CommonVoice/ASR/transformer/hparams/train_fr.yaml here - 17.61%
recipes/CommonVoice/ASR/transformer/hparams/train_it.yaml here - 16.80%
recipes/CommonVoice/ASR/transformer/hparams/train_de.yaml here - 16.76%
recipes/CommonVoice/ASR/transformer/hparams/train_ar_hf_whisper.yaml here here 16.96%
recipes/CommonVoice/ASR/transformer/hparams/train_fa_hf_whisper.yaml here here 31.75%
recipes/CommonVoice/ASR/transformer/hparams/train_fr_hf_whisper.yaml here here 10.62%
recipes/CommonVoice/ASR/transformer/hparams/train_sr_hf_whisper.yaml here here 22.29%
recipes/CommonVoice/ASR/transformer/hparams/train_mn_hf_whisper.yaml here here 67.84%
recipes/CommonVoice/ASR/transformer/hparams/train_hi_hf_whisper.yaml here here 15.27%
recipes/CommonVoice/ASR/transformer/hparams/train_it_hf_whisper.yaml here here 9.63%

ASR-CTC

Model Checkpoints HuggingFace Test-WER
recipes/CommonVoice/ASR/CTC/hparams/train_en_with_wav2vec.yaml here here 16.16%
recipes/CommonVoice/ASR/CTC/hparams/train_fr_with_wav2vec.yaml here here 9.71%
recipes/CommonVoice/ASR/CTC/hparams/train_it_with_wav2vec.yaml here here 7.99%
recipes/CommonVoice/ASR/CTC/hparams/train_rw_with_wav2vec.yaml here here 22.52%
recipes/CommonVoice/ASR/CTC/hparams/train_de_with_wav2vec.yaml here here 8.39%
recipes/CommonVoice/ASR/CTC/hparams/train_ar_with_wav2vec.yaml here here 28.53%
recipes/CommonVoice/ASR/CTC/hparams/train_es_with_wav2vec.yaml here here 12.67%
recipes/CommonVoice/ASR/CTC/hparams/train_pt_with_wav2vec.yaml here here 21.69%
recipes/CommonVoice/ASR/CTC/hparams/train_zh-CN_with_wav2vec.yaml here here 23.17%

ASR-seq2seq

Model Checkpoints HuggingFace Test-WER
recipes/CommonVoice/ASR/seq2seq/hparams/train_de.yaml here here 12.25%
recipes/CommonVoice/ASR/seq2seq/hparams/train_en.yaml here here 23.88%
recipes/CommonVoice/ASR/seq2seq/hparams/train_fr.yaml here here 14.88%
recipes/CommonVoice/ASR/seq2seq/hparams/train_it.yaml here here 17.02%
recipes/CommonVoice/ASR/seq2seq/hparams/train_rw.yaml here here 29.22%
recipes/CommonVoice/ASR/seq2seq/hparams/train_es.yaml here here 14.77%

DNS Dataset

Enhancement

Model Checkpoints HuggingFace valid-PESQ test-SIG test-BAK test-OVRL
recipes/DNS/enhancement/hparams/sepformer-dns-16k.yaml here here 2.06 2.999 3.076 2.437

DVoice Dataset

ASR-CTC

Model Checkpoints HuggingFace Test-WER
recipes/DVoice/ASR/CTC/hparams/train_amh_with_wav2vec.yaml here here 24.92%
recipes/DVoice/ASR/CTC/hparams/train_dar_with_wav2vec.yaml here here 18.28%
recipes/DVoice/ASR/CTC/hparams/train_fon_with_wav2vec.yaml here here 9.00%
recipes/DVoice/ASR/CTC/hparams/train_sw_with_wav2vec.yaml here here 23.16%
recipes/DVoice/ASR/CTC/hparams/train_wol_with_wav2vec.yaml here here 16.05%

Multilingual-ASR-CTC

Model Checkpoints HuggingFace WER-Darija WER-Swahili WER-Fongbe WER-Wolof WER-Amharic
recipes/DVoice/ASR/CTC/hparams/train_multi_with_wav2vec.yaml here - 13.27% 29.31% 10.26% 21.54% 31.15%

ESC50 Dataset

SoundClassification

Model Checkpoints HuggingFace Accuracy
recipes/ESC50/classification/hparams/cnn14_classifier.yaml here - 82%
recipes/ESC50/classification/hparams/conv2d_classifier.yaml here - 75%

Fisher-Callhome-Spanish Dataset

Speech_Translation

Model Checkpoints HuggingFace Test-sacrebleu
recipes/Fisher-Callhome-Spanish/ST/transformer/hparams/transformer.yaml here - 47.31
recipes/Fisher-Callhome-Spanish/ST/transformer/hparams/conformer.yaml here - 48.04

Google-speech-commands Dataset

Command_recognition

Model Checkpoints HuggingFace Test-accuracy
recipes/Google-speech-commands/hparams/xvect.yaml here here 97.43%
recipes/Google-speech-commands/hparams/xvect_leaf.yaml here - 96.79%

IEMOCAP Dataset

Emotion_recognition

Model Checkpoints HuggingFace Test-Accuracy
recipes/IEMOCAP/emotion_recognition/hparams/train_with_wav2vec2.yaml here here 65.7%
recipes/IEMOCAP/emotion_recognition/hparams/train.yaml here - 77.0%

IWSLT22_lowresource Dataset

Speech_Translation

Model Checkpoints HuggingFace Test-BLEU
recipes/IWSLT22_lowresource/AST/transformer/hparams/train_w2v2_mbart_st.yaml here - 7.73
recipes/IWSLT22_lowresource/AST/transformer/hparams/train_w2v2_nllb_st.yaml here - 8.70
recipes/IWSLT22_lowresource/AST/transformer/hparams/train_samu_mbart_st.yaml here - 10.28
recipes/IWSLT22_lowresource/AST/transformer/hparams/train_samu_nllb_st.yaml here - 11.32

KsponSpeech Dataset

ASR

Model Checkpoints HuggingFace clean-WER others-WER
recipes/KsponSpeech/ASR/transformer/hparams/conformer_medium.yaml here here 20.78% 25.73%

LibriMix Dataset

Separation

Model Checkpoints HuggingFace SI-SNR
recipes/LibriMix/separation/hparams/sepformer-libri2mix.yaml here - 20.4dB
recipes/LibriMix/separation/hparams/sepformer-libri3mix.yaml here - 19.0dB

LibriParty Dataset

VAD

Model Checkpoints HuggingFace Test-Precision Recall F-Score
recipes/LibriParty/VAD/hparams/train.yaml here here 0.9518 0.9437 0.9477

LibriSpeech Dataset

G2P

Model Checkpoints HuggingFace PER-Test
recipes/LibriSpeech/G2P/hparams/hparams_g2p_rnn.yaml here - 2.72%
recipes/LibriSpeech/G2P/hparams/hparams_g2p_transformer.yaml here here 2.89%

ASR-Transducers

Model Checkpoints HuggingFace Test_clean-WER Test_other-WER
recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml here - 2.72% 6.47%

ASR-Seq2Seq

Model Checkpoints HuggingFace Test_clean-WER Test_other-WER
recipes/LibriSpeech/ASR/seq2seq/hparams/train_BPE_5000.yaml here here 2.89% 8.09%

ASR-CTC

Model Checkpoints HuggingFace Test_clean-WER Test_other-WER
recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec.yaml here here 1.65% 3.67%
recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec_transformer_rescoring.yaml here - 1.57% 3.37%

ASR-Transformers

Model Checkpoints HuggingFace Test_clean-WER Test_other-WER
recipes/LibriSpeech/ASR/transformer/hparams/conformer_small.yaml here here 2.49% 6.10%
recipes/LibriSpeech/ASR/transformer/hparams/transformer.yaml here here 2.27% 5.53%
recipes/LibriSpeech/ASR/transformer/hparams/conformer_large.yaml here - 2.01% 4.52%
recipes/LibriSpeech/ASR/transformer/hparams/branchformer_large.yaml here - 2.04% 4.12%
recipes/LibriSpeech/ASR/transformer/hparams/hyperconformer_22M.yaml here - 2.23% 4.54%
recipes/LibriSpeech/ASR/transformer/hparams/hyperconformer_8M.yaml here - 2.55% 6.61%
recipes/LibriSpeech/ASR/transformer/hparams/hyperbranchformer_25M.yaml - - 2.36% 6.89%
recipes/LibriSpeech/ASR/transformer/hparams/hyperbranchformer_13M.yaml - - 2.54% 6.58%
recipes/LibriSpeech/ASR/transformer/hparams/train_hf_whisper.yaml - -
recipes/LibriSpeech/ASR/transformer/hparams/bayesspeech.yaml here - 2.84% 6.27%

MEDIA Dataset

SLU

Model Checkpoints HuggingFace Test-ChER Test-CER Test-CVER
recipes/MEDIA/SLU/CTC/hparams/train_hf_wav2vec_full.yaml - here 7.46% 20.10% 31.41%
recipes/MEDIA/SLU/CTC/hparams/train_hf_wav2vec_relax.yaml - here 7.78% 24.88% 35.77%

ASR

Model Checkpoints HuggingFace Test-ChER Test-CER
recipes/MEDIA/ASR/CTC/hparams/train_hf_wav2vec.yaml - here 7.78% 4.78%

MultiWOZ Dataset

Response-Generation

Model Checkpoints HuggingFace Test-PPL Test_BLEU-4
recipes/MultiWOZ/response_generation/gpt/hparams/train_gpt.yaml here here 4.01 2.54e-04
recipes/MultiWOZ/response_generation/llama2/hparams/train_llama2.yaml here here 2.90 7.45e-04

REAL-M Dataset

Sisnr-estimation

Model Checkpoints HuggingFace L1-Error
recipes/REAL-M/sisnr-estimation/hparams/pool_sisnrestimator.yaml here here 1.71dB

RescueSpeech Dataset

ASR+enhancement

Model Checkpoints HuggingFace SISNRi SDRi PESQ STOI WER
recipes/RescueSpeech/ASR/noise-robust/hparams/robust_asr_16k.yaml here here 7.482 8.011 2.083 0.854 45.29%

SLURP Dataset

SLU

Model Checkpoints HuggingFace scenario-accuracy action-accuracy intent-accuracy
recipes/SLURP/NLU/hparams/train.yaml here - 90.81% 88.29% 87.28%
recipes/SLURP/direct/hparams/train.yaml here - 81.73% 77.11% 75.05%
recipes/SLURP/direct/hparams/train_with_wav2vec2.yaml here here 91.24% 88.47% 87.55%

Switchboard Dataset

ASR

Model Checkpoints HuggingFace Swbd-WER Callhome-WER Eval2000-WER
recipes/Switchboard/ASR/CTC/hparams/train_with_wav2vec.yaml - here 8.76% 14.67% 11.78%
recipes/Switchboard/ASR/seq2seq/hparams/train_BPE_2000.yaml - here 16.90% 25.12% 20.71%
recipes/Switchboard/ASR/transformer/hparams/transformer.yaml - here 9.80% 17.89% 13.94%

TIMIT Dataset

ASR

Model Checkpoints HuggingFace Test-PER
recipes/TIMIT/ASR/CTC/hparams/train.yaml here - 14.78%
recipes/TIMIT/ASR/seq2seq/hparams/train.yaml here - 14.07%
recipes/TIMIT/ASR/seq2seq/hparams/train_with_wav2vec2.yaml here - 8.04%
recipes/TIMIT/ASR/transducer/hparams/train.yaml here - 14.12%
recipes/TIMIT/ASR/transducer/hparams/train_wav2vec.yaml here - 8.91%

Tedlium2 Dataset

ASR

Model Checkpoints HuggingFace Test-WER_No_LM
recipes/Tedlium2/ASR/transformer/hparams/branchformer_large.yaml here here 8.11%

UrbanSound8k Dataset

SoundClassification

Model Checkpoints HuggingFace Accuracy
recipes/UrbanSound8k/SoundClassification/hparams/train_ecapa_tdnn.yaml here here 75.4%

Voicebank Dataset

Enhancement

Model Checkpoints HuggingFace PESQ
recipes/Voicebank/enhance/MetricGAN/hparams/train.yaml here here 3.15
recipes/Voicebank/enhance/SEGAN/hparams/train.yaml here - 2.38
recipes/Voicebank/enhance/spectral_mask/hparams/train.yaml here - 2.65

ASR+enhancement

Model Checkpoints HuggingFace PESQ COVL test-WER
recipes/Voicebank/MTL/ASR_enhance/hparams/robust_asr.yaml here here 3.05 3.74 2.80%

Dereverberation

Model Checkpoints HuggingFace PESQ
recipes/Voicebank/dereverb/MetricGAN-U/hparams/train_dereverb.yaml here - 2.07
recipes/Voicebank/dereverb/spectral_mask/hparams/train.yaml here - 2.35

ASR

Model Checkpoints HuggingFace Test-PER
recipes/Voicebank/ASR/CTC/hparams/train.yaml here - 10.12%

VoxCeleb Dataset

Speaker_recognition

Model Checkpoints HuggingFace EER
recipes/VoxCeleb/SpeakerRec/hparams/train_ecapa_tdnn.yaml here here 0.80%
recipes/VoxCeleb/SpeakerRec/hparams/train_x_vectors.yaml here here 3.23%
recipes/VoxCeleb/SpeakerRec/hparams/train_resnet.yaml here here 0.95%

VoxLingua107 Dataset

Language-id

Model Checkpoints HuggingFace Accuracy
recipes/VoxLingua107/lang_id/hparams/train_ecapa.yaml here here 93.3%

WHAMandWHAMR Dataset

Enhancement

Model Checkpoints HuggingFace SI-SNR PESQ
recipes/WHAMandWHAMR/enhancement/hparams/sepformer-wham.yaml here here 14.4dB 3.05
recipes/WHAMandWHAMR/enhancement/hparams/sepformer-whamr.yaml here here 10.6dB 2.84

Separation

Model Checkpoints HuggingFace SI-SNR
recipes/WHAMandWHAMR/separation/hparams/sepformer-wham.yaml here here 16.5dB
recipes/WHAMandWHAMR/separation/hparams/sepformer-whamr.yaml here here 14.0dB

WSJ0Mix Dataset

Separation (2mix)

Model Checkpoints HuggingFace SI-SNRi
recipes/WSJ0Mix/separation/hparams/convtasnet.yaml here - 14.8dB
recipes/WSJ0Mix/separation/hparams/dprnn.yaml here - 18.5dB
recipes/WSJ0Mix/separation/hparams/resepformer.yaml here here 18.6dB
recipes/WSJ0Mix/separation/hparams/sepformer.yaml here here 22.4dB
recipes/WSJ0Mix/separation/hparams/skim.yaml here here 18.1dB

ZaionEmotionDataset Dataset

Emotion_Diarization

Model Checkpoints HuggingFace EDER
recipes/ZaionEmotionDataset/emotion_diarization/hparams/train.yaml here here 30.2%

fluent-speech-commands Dataset

SLU

Model Checkpoints HuggingFace Test-accuracy
recipes/fluent-speech-commands/direct/hparams/train.yaml here - 99.60%

timers-and-such Dataset

SLU

Model Checkpoints HuggingFace Accuracy-Test_real
recipes/timers-and-such/decoupled/hparams/train_TAS_LM.yaml here - 46.8%
recipes/timers-and-such/direct/hparams/train.yaml here here 77.5%
recipes/timers-and-such/direct/hparams/train_with_wav2vec2.yaml here - 94.0%
recipes/timers-and-such/multistage/hparams/train_TAS_LM.yaml here - 72.6%