Yorùbá language audio for ASR, TTS and other speech tasks

Niger-Volta-LTI/yoruba-audio

Yorùbá Audio

This repo aggregates audio and speech corpora for Yorùbá tasks, much as yoruba-text does for text datasets. The corpora may include aligned text or be purely unlabeled.

The objective is to provide a bird's-eye view of the available Yorùbá audio, along with its metadata and entropy, to inform further data collection and modeling. For example, given a large broadcast news corpus, we might train a self-supervised model on a pretext task to generate speech embeddings for use in ASR/TTS work.

Corpora

Name Size in HH:MM:SS Transcribed Segmented in utterances Aligned Source
Lagos-NWU 02:45:17 ✔️ ✔️ ✔️ North-West University
OpenSLR86 04:01:31 ✔️ ✔️ ✔️ OpenSLR, Google
Bíbélì Mímọ́ (NIV) 93:38:15 ✔️ Biblica Open Bible
Bíbélì Mímọ́ (KJV) ✔️ Bible.is
Colloquial Yorùbá 02:32:29 ✔️ Audio files, Textbook
OrisunTV Broadcast News 81:49:29 YouTube
VoxLingua107 94:02:45 ✔️ post-filtered from YouTube
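
To get a rough sense of how much audio the table covers, the listed durations can be added up. The snippet below is a minimal sketch, not part of this repo: the helper names `parse_hms` and `format_hms` and the `CORPUS_DURATIONS` dictionary are hypothetical, and the values are copied by hand from the table. Bíbélì Mímọ́ (KJV) is left out because no duration is listed for it.

```python
def parse_hms(text):
    """Convert an 'HH:MM:SS' string to seconds; fields need not be zero-padded."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Format a number of seconds as 'HH:MM:SS' (hours may exceed 99)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Durations copied by hand from the Corpora table (hypothetical structure).
CORPUS_DURATIONS = {
    "Lagos-NWU": "02:45:17",
    "OpenSLR86": "04:01:31",
    "Bíbélì Mímọ́ (NIV)": "93:38:15",
    "Colloquial Yorùbá": "02:32:29",
    "OrisunTV Broadcast News": "81:49:29",
    "VoxLingua107": "94:02:45",
}

total_seconds = sum(parse_hms(d) for d in CORPUS_DURATIONS.values())
print("Total listed audio:", format_hms(total_seconds))  # 278:49:46
```

The total counts only corpora with a listed size, so it is a lower bound on the audio referenced here.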
