AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects.
* [MELD (Multimodal EmotionLines Dataset)](https://affective-meld.github.io/) - The Multimodal EmotionLines Dataset (MELD) was created by enhancing and extending the EmotionLines dataset. MELD contains the same dialogue instances available in EmotionLines, but adds audio and visual modalities alongside the text. It comprises more than 1400 dialogues and 13000 utterances from the Friends TV series, with multiple speakers participating in the dialogues. Each utterance in a dialogue is labeled with one of seven emotions: Anger, Disgust, Sadness, Joy, Neutral, Surprise, or Fear. MELD also provides sentiment (positive, negative, or neutral) annotation for each utterance.
* [Microsoft Speech Corpus (Indian languages)](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e) - The Microsoft Speech Corpus (Indian languages) release contains conversational and phrasal speech training and test data for the Telugu, Tamil, and Gujarati languages. The data package includes audio and the corresponding transcripts. Data provided in this dataset shall not be used for commercial purposes; you may use the data solely for research purposes. If you publish your findings, you must provide the following attribution: “Data provided by Microsoft and SpeechOcean.com”.
* [PATS (Pose Audio Transcript Style)](https://chahuja.com/pats/) - The PATS dataset consists of a large and diverse collection of aligned pose, audio, and transcript data. With this dataset, the authors aim to provide a benchmark that helps develop technologies for virtual agents that generate natural and relevant gestures.
* [RealMAN](https://github.com/Audio-WestlakeU/RealMAN) - RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization.
* [SAVEE (Surrey Audio-Visual Expressed Emotion)](http://kahlan.eps.surrey.ac.uk/savee/) - The Surrey Audio-Visual Expressed Emotion (SAVEE) dataset was recorded as a prerequisite for the development of an automatic emotion recognition system. The database consists of recordings from 4 male actors in 7 different emotions, 480 British English utterances in total. The sentences were chosen from the standard TIMIT corpus and phonetically balanced for each emotion.
* [SoS_Dataset](https://github.com/Sosdatasets/SoS_Dataset) - Sound of Story: Multi-modal Storytelling with Audio. Storytelling in the real world is multi-modal: a storyteller may use visuals and sounds along with the story itself. However, prior work on storytelling datasets and tasks has paid little attention to sound, even though sound also conveys meaningful semantics of the story. The authors therefore propose to extend story understanding and storytelling by introducing a new component called "background sound": story-context-based audio without any linguistic information.
* [Speech Datasets Collection](https://github.com/RevoSpeechTech/speech-datasets-collection) - This is a curated list of open speech datasets for speech-related research (mainly for Automatic Speech Recognition). Over 110 speech datasets are collected in this repository, and more than 70 datasets can be downloaded directly without further application or registration.