SpeechBrain Performance Report

This document provides an overview of the performance achieved on key datasets and tasks supported by SpeechBrain.

AISHELL-1 Dataset

ASR

Model	Checkpoints	HuggingFace	Test-CER
recipes/AISHELL-1/ASR/CTC/hparams/train_with_wav2vec.yaml	here	here	5.06%
recipes/AISHELL-1/ASR/seq2seq/hparams/train.yaml	here	-	7.51%
recipes/AISHELL-1/ASR/transformer/hparams/train_ASR_transformer.yaml	here	here	6.04%
recipes/AISHELL-1/ASR/transformer/hparams/train_ASR_transformer_with_wav2vect.yaml	here	here	5.58%
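
Several tables in this report give error rates (CER, WER, PER). As a rough illustration of how such a figure is usually defined, the sketch below divides the edit distance between hypothesis and reference by the reference length and reports a percentage. The names `edit_distance` and `error_rate` are hypothetical, and stripping spaces for character-level scoring is an assumption. This is not SpeechBrain's scoring code, which may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref prefix of length i to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def tokenize(text, unit):
    """Split text into characters (spaces dropped, an assumption) or words."""
    if unit == "char":
        return list(text.replace(" ", ""))
    if unit == "word":
        return text.split()
    raise ValueError(f"unknown unit: {unit}")


def error_rate(refs, hyps, unit="char"):
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    total_edits = 0
    total_tokens = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize(ref, unit)
        hyp_tokens = tokenize(hyp, unit)
        total_edits += edit_distance(ref_tokens, hyp_tokens)
        total_tokens += len(ref_tokens)
    if total_tokens == 0:
        raise ValueError("references contain no tokens")
    return 100.0 * total_edits / total_tokens
```

For example, `error_rate(["abcd"], ["abxd"])` gives 25.0 (one substitution over four reference characters), and passing `unit="word"` gives the word-level analogue.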

Aishell1Mix Dataset

Separation

Model	Checkpoints	HuggingFace	SI-SNRi
recipes/Aishell1Mix/separation/hparams/sepformer-aishell1mix2.yaml	here	-	13.4dB
recipes/Aishell1Mix/separation/hparams/sepformer-aishell1mix3.yaml	here	-	11.2dB

BinauralWSJ0Mix Dataset

Separation

Model	Checkpoints	HuggingFace	SI-SNRi
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-cross.yaml	here	-	12.39dB
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-independent.yaml	here	-	11.90dB
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-parallel-noise.yaml	here	-	18.25dB
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-parallel-reverb.yaml	here	-	6.95dB
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-parallel.yaml	here	-	16.93dB

CVSS Dataset

S2ST

Model	Checkpoints	HuggingFace	Test-sacrebleu
recipes/CVSS/S2ST/hparams/train_fr-en.yaml	here	here	24.47

CommonLanguage Dataset

Language-id

Model	Checkpoints	HuggingFace	Error
recipes/CommonLanguage/lang_id/hparams/train_ecapa_tdnn.yaml	here	here	15.1%

CommonVoice Dataset

ASR-transducer

Model	Checkpoints	HuggingFace	Test-WER
recipes/CommonVoice/ASR/transducer/hparams/train_fr.yaml	here	here	17.58%
recipes/CommonVoice/ASR/transducer/hparams/train_it.yaml	here	here	14.88%
recipes/CommonVoice/ASR/transducer/hparams/train_de.yaml	here	here	15.25%

ASR-transformer

Model	Checkpoints	HuggingFace	Test-WER
recipes/CommonVoice/ASR/transformer/hparams/train_fr.yaml	here	-	17.61%
recipes/CommonVoice/ASR/transformer/hparams/train_it.yaml	here	-	16.80%
recipes/CommonVoice/ASR/transformer/hparams/train_de.yaml	here	-	16.76%
recipes/CommonVoice/ASR/transformer/hparams/train_ar_hf_whisper.yaml	here	here	16.96%
recipes/CommonVoice/ASR/transformer/hparams/train_fa_hf_whisper.yaml	here	here	31.75%
recipes/CommonVoice/ASR/transformer/hparams/train_fr_hf_whisper.yaml	here	here	10.62%
recipes/CommonVoice/ASR/transformer/hparams/train_sr_hf_whisper.yaml	here	here	22.29%
recipes/CommonVoice/ASR/transformer/hparams/train_mn_hf_whisper.yaml	here	here	67.84%
recipes/CommonVoice/ASR/transformer/hparams/train_hi_hf_whisper.yaml	here	here	15.27%
recipes/CommonVoice/ASR/transformer/hparams/train_it_hf_whisper.yaml	here	here	9.63%

ASR-CTC

Model	Checkpoints	HuggingFace	Test-WER
recipes/CommonVoice/ASR/CTC/hparams/train_en_with_wav2vec.yaml	here	here	16.16%
recipes/CommonVoice/ASR/CTC/hparams/train_fr_with_wav2vec.yaml	here	here	9.71%
recipes/CommonVoice/ASR/CTC/hparams/train_it_with_wav2vec.yaml	here	here	7.99%
recipes/CommonVoice/ASR/CTC/hparams/train_rw_with_wav2vec.yaml	here	here	22.52%
recipes/CommonVoice/ASR/CTC/hparams/train_de_with_wav2vec.yaml	here	here	8.39%
recipes/CommonVoice/ASR/CTC/hparams/train_ar_with_wav2vec.yaml	here	here	28.53%
recipes/CommonVoice/ASR/CTC/hparams/train_es_with_wav2vec.yaml	here	here	12.67%
recipes/CommonVoice/ASR/CTC/hparams/train_pt_with_wav2vec.yaml	here	here	21.69%
recipes/CommonVoice/ASR/CTC/hparams/train_zh-CN_with_wav2vec.yaml	here	here	23.17%

ASR-seq2seq

Model	Checkpoints	HuggingFace	Test-WER
recipes/CommonVoice/ASR/seq2seq/hparams/train_de.yaml	here	here	12.25%
recipes/CommonVoice/ASR/seq2seq/hparams/train_en.yaml	here	here	23.88%
recipes/CommonVoice/ASR/seq2seq/hparams/train_fr.yaml	here	here	14.88%
recipes/CommonVoice/ASR/seq2seq/hparams/train_it.yaml	here	here	17.02%
recipes/CommonVoice/ASR/seq2seq/hparams/train_rw.yaml	here	here	29.22%
recipes/CommonVoice/ASR/seq2seq/hparams/train_es.yaml	here	here	14.77%

DNS Dataset

Enhancement

Model	Checkpoints	HuggingFace	valid-PESQ	test-SIG	test-BAK	test-OVRL
recipes/DNS/enhancement/hparams/sepformer-dns-16k.yaml	here	here	2.06	2.999	3.076	2.437

DVoice Dataset

ASR-CTC

Model	Checkpoints	HuggingFace	Test-WER
recipes/DVoice/ASR/CTC/hparams/train_amh_with_wav2vec.yaml	here	here	24.92%
recipes/DVoice/ASR/CTC/hparams/train_dar_with_wav2vec.yaml	here	here	18.28%
recipes/DVoice/ASR/CTC/hparams/train_fon_with_wav2vec.yaml	here	here	9.00%
recipes/DVoice/ASR/CTC/hparams/train_sw_with_wav2vec.yaml	here	here	23.16%
recipes/DVoice/ASR/CTC/hparams/train_wol_with_wav2vec.yaml	here	here	16.05%

Multilingual-ASR-CTC

Model	Checkpoints	HuggingFace	WER-Darija	WER-Swahili	WER-Fongbe	WER-Wolof	WER-Amharic
recipes/DVoice/ASR/CTC/hparams/train_multi_with_wav2vec.yaml	here	-	13.27%	29.31%	10.26%	21.54%	31.15%

ESC50 Dataset

SoundClassification

Model	Checkpoints	HuggingFace	Accuracy
recipes/ESC50/classification/hparams/cnn14_classifier.yaml	here	-	82%
recipes/ESC50/classification/hparams/conv2d_classifier.yaml	here	-	75%

Fisher-Callhome-Spanish Dataset

Speech_Translation

Model	Checkpoints	HuggingFace	Test-sacrebleu
recipes/Fisher-Callhome-Spanish/ST/transformer/hparams/transformer.yaml	here	-	47.31
recipes/Fisher-Callhome-Spanish/ST/transformer/hparams/conformer.yaml	here	-	48.04

Google-speech-commands Dataset

Command_recognition

Model	Checkpoints	HuggingFace	Test-accuracy
recipes/Google-speech-commands/hparams/xvect.yaml	here	here	97.43%
recipes/Google-speech-commands/hparams/xvect_leaf.yaml	here	-	96.79%

IEMOCAP Dataset

Emotion_recognition

Model	Checkpoints	HuggingFace	Test-Accuracy
recipes/IEMOCAP/emotion_recognition/hparams/train_with_wav2vec2.yaml	here	here	65.7%
recipes/IEMOCAP/emotion_recognition/hparams/train.yaml	here	-	77.0%

IWSLT22_lowresource Dataset

Speech_Translation

Model	Checkpoints	HuggingFace	Test-BLEU
recipes/IWSLT22_lowresource/AST/transformer/hparams/train_w2v2_mbart_st.yaml	here	-	7.73
recipes/IWSLT22_lowresource/AST/transformer/hparams/train_w2v2_nllb_st.yaml	here	-	8.70
recipes/IWSLT22_lowresource/AST/transformer/hparams/train_samu_mbart_st.yaml	here	-	10.28
recipes/IWSLT22_lowresource/AST/transformer/hparams/train_samu_nllb_st.yaml	here	-	11.32

KsponSpeech Dataset

ASR

Model	Checkpoints	HuggingFace	clean-WER	others-WER
recipes/KsponSpeech/ASR/transformer/hparams/conformer_medium.yaml	here	here	20.78%	25.73%

LibriMix Dataset

Separation

Model	Checkpoints	HuggingFace	SI-SNR
recipes/LibriMix/separation/hparams/sepformer-libri2mix.yaml	here	-	20.4dB
recipes/LibriMix/separation/hparams/sepformer-libri3mix.yaml	here	-	19.0dB

LibriParty Dataset

VAD

Model	Checkpoints	HuggingFace	Test-Precision	Recall	F-Score
recipes/LibriParty/VAD/hparams/train.yaml	here	here	0.9518	0.9437	0.9477

LibriSpeech Dataset

G2P

Model	Checkpoints	HuggingFace	PER-Test
recipes/LibriSpeech/G2P/hparams/hparams_g2p_rnn.yaml	here	-	2.72%
recipes/LibriSpeech/G2P/hparams/hparams_g2p_transformer.yaml	here	here	2.89%

ASR-Transducers

Model	Checkpoints	HuggingFace	Test_clean-WER	Test_other-WER
recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml	here	-	2.72%	6.47%

ASR-Seq2Seq

Model	Checkpoints	HuggingFace	Test_clean-WER	Test_other-WER
recipes/LibriSpeech/ASR/seq2seq/hparams/train_BPE_5000.yaml	here	here	2.89%	8.09%

ASR-CTC

Model	Checkpoints	HuggingFace	Test_clean-WER	Test_other-WER
recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec.yaml	here	here	1.65%	3.67%
recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec_transformer_rescoring.yaml	here	-	1.57%	3.37%

ASR-Transformers

Model	Checkpoints	HuggingFace	Test_clean-WER	Test_other-WER
recipes/LibriSpeech/ASR/transformer/hparams/conformer_small.yaml	here	here	2.49%	6.10%
recipes/LibriSpeech/ASR/transformer/hparams/transformer.yaml	here	here	2.27%	5.53%
recipes/LibriSpeech/ASR/transformer/hparams/conformer_large.yaml	here	-	2.01%	4.52%
recipes/LibriSpeech/ASR/transformer/hparams/branchformer_large.yaml	here	-	2.04%	4.12%
recipes/LibriSpeech/ASR/transformer/hparams/hyperconformer_22M.yaml	here	-	2.23%	4.54%
recipes/LibriSpeech/ASR/transformer/hparams/hyperconformer_8M.yaml	here	-	2.55%	6.61%
recipes/LibriSpeech/ASR/transformer/hparams/hyperbranchformer_25M.yaml	-	-	2.36%	6.89%
recipes/LibriSpeech/ASR/transformer/hparams/hyperbranchformer_13M.yaml	-	-	2.54%	6.58%
recipes/LibriSpeech/ASR/transformer/hparams/train_hf_whisper.yaml	-	-	-	-
recipes/LibriSpeech/ASR/transformer/hparams/bayesspeech.yaml	here	-	2.84%	6.27%

MEDIA Dataset

SLU

Model	Checkpoints	HuggingFace	Test-ChER	Test-CER	Test-CVER
recipes/MEDIA/SLU/CTC/hparams/train_hf_wav2vec_full.yaml	-	here	7.46%	20.10%	31.41%
recipes/MEDIA/SLU/CTC/hparams/train_hf_wav2vec_relax.yaml	-	here	7.78%	24.88%	35.77%

ASR

Model	Checkpoints	HuggingFace	Test-ChER	Test-CER
recipes/MEDIA/ASR/CTC/hparams/train_hf_wav2vec.yaml	-	here	7.78%	4.78%

MultiWOZ Dataset

Response-Generation

Model	Checkpoints	HuggingFace	Test-PPL	Test_BLEU-4
recipes/MultiWOZ/response_generation/gpt/hparams/train_gpt.yaml	here	here	4.01	2.54e-04
recipes/MultiWOZ/response_generation/llama2/hparams/train_llama2.yaml	here	here	2.90	7.45e-04

REAL-M Dataset

Sisnr-estimation

Model	Checkpoints	HuggingFace	L1-Error
recipes/REAL-M/sisnr-estimation/hparams/pool_sisnrestimator.yaml	here	here	1.71dB

RescueSpeech Dataset

ASR+enhancement

Model	Checkpoints	HuggingFace	SISNRi	SDRi	PESQ	STOI	WER
recipes/RescueSpeech/ASR/noise-robust/hparams/robust_asr_16k.yaml	here	here	7.482	8.011	2.083	0.854	45.29%

SLURP Dataset

SLU

Model	Checkpoints	HuggingFace	scenario-accuracy	action-accuracy	intent-accuracy
recipes/SLURP/NLU/hparams/train.yaml	here	-	90.81%	88.29%	87.28%
recipes/SLURP/direct/hparams/train.yaml	here	-	81.73%	77.11%	75.05%
recipes/SLURP/direct/hparams/train_with_wav2vec2.yaml	here	here	91.24%	88.47%	87.55%

Switchboard Dataset

ASR

Model	Checkpoints	HuggingFace	Swbd-WER	Callhome-WER	Eval2000-WER
recipes/Switchboard/ASR/CTC/hparams/train_with_wav2vec.yaml	-	here	8.76%	14.67%	11.78%
recipes/Switchboard/ASR/seq2seq/hparams/train_BPE_2000.yaml	-	here	16.90%	25.12%	20.71%
recipes/Switchboard/ASR/transformer/hparams/transformer.yaml	-	here	9.80%	17.89%	13.94%

TIMIT Dataset

ASR

Model	Checkpoints	HuggingFace	Test-PER
recipes/TIMIT/ASR/CTC/hparams/train.yaml	here	-	14.78%
recipes/TIMIT/ASR/seq2seq/hparams/train.yaml	here	-	14.07%
recipes/TIMIT/ASR/seq2seq/hparams/train_with_wav2vec2.yaml	here	-	8.04%
recipes/TIMIT/ASR/transducer/hparams/train.yaml	here	-	14.12%
recipes/TIMIT/ASR/transducer/hparams/train_wav2vec.yaml	here	-	8.91%

Tedlium2 Dataset

ASR

Model	Checkpoints	HuggingFace	Test-WER_No_LM
recipes/Tedlium2/ASR/transformer/hparams/branchformer_large.yaml	here	here	8.11%

UrbanSound8k Dataset

SoundClassification

Model	Checkpoints	HuggingFace	Accuracy
recipes/UrbanSound8k/SoundClassification/hparams/train_ecapa_tdnn.yaml	here	here	75.4%

Voicebank Dataset

Enhancement

Model	Checkpoints	HuggingFace	PESQ
recipes/Voicebank/enhance/MetricGAN/hparams/train.yaml	here	here	3.15
recipes/Voicebank/enhance/SEGAN/hparams/train.yaml	here	-	2.38
recipes/Voicebank/enhance/spectral_mask/hparams/train.yaml	here	-	2.65

ASR+enhancement

Model	Checkpoints	HuggingFace	PESQ	COVL	test-WER
recipes/Voicebank/MTL/ASR_enhance/hparams/robust_asr.yaml	here	here	3.05	3.74	2.80%

Dereverberation

Model	Checkpoints	HuggingFace	PESQ
recipes/Voicebank/dereverb/MetricGAN-U/hparams/train_dereverb.yaml	here	-	2.07
recipes/Voicebank/dereverb/spectral_mask/hparams/train.yaml	here	-	2.35

ASR

Model	Checkpoints	HuggingFace	Test-PER
recipes/Voicebank/ASR/CTC/hparams/train.yaml	here	-	10.12%

VoxCeleb Dataset

Speaker_recognition

Model	Checkpoints	HuggingFace	EER
recipes/VoxCeleb/SpeakerRec/hparams/train_ecapa_tdnn.yaml	here	here	0.80%
recipes/VoxCeleb/SpeakerRec/hparams/train_x_vectors.yaml	here	here	3.23%
recipes/VoxCeleb/SpeakerRec/hparams/train_resnet.yaml	here	here	0.95%

VoxLingua107 Dataset

Language-id

Model	Checkpoints	HuggingFace	Accuracy
recipes/VoxLingua107/lang_id/hparams/train_ecapa.yaml	here	here	93.3%

WHAMandWHAMR Dataset

Enhancement

Model	Checkpoints	HuggingFace	SI-SNR	PESQ
recipes/WHAMandWHAMR/enhancement/hparams/sepformer-wham.yaml	here	here	14.4dB	3.05
recipes/WHAMandWHAMR/enhancement/hparams/sepformer-whamr.yaml	here	here	10.6dB	2.84

Separation

Model	Checkpoints	HuggingFace	SI-SNR
recipes/WHAMandWHAMR/separation/hparams/sepformer-wham.yaml	here	here	16.5dB
recipes/WHAMandWHAMR/separation/hparams/sepformer-whamr.yaml	here	here	14.0dB

WSJ0Mix Dataset

Separation (2mix)

Model	Checkpoints	HuggingFace	SI-SNRi
recipes/WSJ0Mix/separation/hparams/convtasnet.yaml	here	-	14.8dB
recipes/WSJ0Mix/separation/hparams/dprnn.yaml	here	-	18.5dB
recipes/WSJ0Mix/separation/hparams/resepformer.yaml	here	here	18.6dB
recipes/WSJ0Mix/separation/hparams/sepformer.yaml	here	here	22.4dB
recipes/WSJ0Mix/separation/hparams/skim.yaml	here	here	18.1dB

ZaionEmotionDataset Dataset

Emotion_Diarization

Model	Checkpoints	HuggingFace	EDER
recipes/ZaionEmotionDataset/emotion_diarization/hparams/train.yaml	here	here	30.2%

fluent-speech-commands Dataset

SLU

Model	Checkpoints	HuggingFace	Test-accuracy
recipes/fluent-speech-commands/direct/hparams/train.yaml	here	-	99.60%

timers-and-such Dataset

SLU

Model	Checkpoints	HuggingFace	Accuracy-Test_real
recipes/timers-and-such/decoupled/hparams/train_TAS_LM.yaml	here	-	46.8%
recipes/timers-and-such/direct/hparams/train.yaml	here	here	77.5%
recipes/timers-and-such/direct/hparams/train_with_wav2vec2.yaml	here	-	94.0%
recipes/timers-and-such/multistage/hparams/train_TAS_LM.yaml	here	-	72.6%
