From 5f90db2650daa49eaa735507ad5352389388c52f Mon Sep 17 00:00:00 2001
From: Yuan-Man <68322456+Yuan-ManX@users.noreply.github.com>
Date: Sun, 14 Jul 2024 17:38:25 +0800
Subject: [PATCH] Update index.md

---
 index.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/index.md b/index.md
index 92b7fd5..b75b344 100644
--- a/index.md
+++ b/index.md
@@ -55,6 +55,7 @@ AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, whi
 * [MELD (Multimodal EmotionLines Dataset)](https://affective-meld.github.io/) - Multimodal EmotionLines Dataset (MELD) has been created by enhancing and extending EmotionLines dataset. MELD contains the same dialogue instances available in EmotionLines, but it also encompasses audio and visual modality along with text. MELD has more than 1400 dialogues and 13000 utterances from Friends TV series. Multiple speakers participated in the dialogues. Each utterance in a dialogue has been labeled by any of these seven emotions -- Anger, Disgust, Sadness, Joy, Neutral, Surprise and Fear. MELD also has sentiment (positive, negative and neutral) annotation for each utterance.
 * [Microsoft Speech Corpus (Indian languages)](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e) - Microsoft Speech Corpus (Indian languages) release contains conversational and phrasal speech training and test data for Telugu, Tamil and Gujarati languages. The data package includes audio and corresponding transcripts. Data provided in this dataset shall not be used for commercial purposes. You may use the data solely for research purposes. If you publish your findings, you must provide the following attribution: “Data provided by Microsoft and SpeechOcean.com”.
 * [PATS (Pose Audio Transcript Style)](https://chahuja.com/pats/) - PATS dataset consists of a diverse and large amount of aligned pose, audio and transcripts. With this dataset, we hope to provide a benchmark that would help develop technologies for virtual agents which generate natural and relevant gestures.
+* [RealMAN](https://github.com/Audio-WestlakeU/RealMAN) - RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization.
 * [SAVEE (Surrey Audio-Visual Expressed Emotion)](http://kahlan.eps.surrey.ac.uk/savee/) - The Surrey Audio-Visual Expressed Emotion (SAVEE) dataset was recorded as a pre-requisite for the development of an automatic emotion recognition system. The database consists of recordings from 4 male actors in 7 different emotions, 480 British English utterances in total. The sentences were chosen from the standard TIMIT corpus and phonetically-balanced for each emotion.
 * [SoS_Dataset](https://github.com/Sosdatasets/SoS_Dataset) - Sound of Story: Multi-modal Storytelling with Audio. Storytelling is multi-modal in the real world. When one tells a story, one may use all of the visualizations and sounds along with the story itself. However, prior studies on storytelling datasets and tasks have paid little attention to sound even though sound also conveys meaningful semantics of the story. Therefore, we propose to extend story understanding and telling areas by establishing a new component called "background sound" which is story context-based audio without any linguistic information.
 * [Speech Datasets Collection](https://github.com/RevoSpeechTech/speech-datasets-collection) - This is a curated list of open speech datasets for speech-related research (mainly for Automatic Speech Recognition). Over 110 speech datasets are collected in this repository, and more than 70 datasets can be downloaded directly without further application or registration.