The 601 Hours – Minnan Dialect Conversational Speech Data was collected by phone from more than 1,000 native speakers, with a balanced gender ratio and geographical distribution. Speakers chose a few familiar topics from a given list and held conversations on them, which keeps the dialogue fluent and natural. Recordings were made on a variety of mobile phones. The audio is 16kHz, 16bit, uncompressed WAV, and all speech data was recorded in quiet indoor environments. All speech was manually transcribed, and each effective sentence was annotated with its start and end timestamps and with speaker identification, including gender. The sentence accuracy rate is ≥ 95%.
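The per-sentence annotation described above (start and end timestamps, speaker identification, gender, transcription) maps naturally onto a simple record type. The sketch below is hypothetical: the dataset's actual annotation file format and field names are not given here, so `Segment`, its fields, and `duration_by_gender` are illustrative assumptions, not the dataset's schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One annotated effective sentence (hypothetical layout, illustrative field names)."""
    start: float      # start timestamp, seconds
    end: float        # end timestamp, seconds
    speaker_id: str   # speaker identification label
    gender: str       # speaker gender label, e.g. "F" or "M"
    text: str         # manual transcription

    @property
    def duration(self) -> float:
        return self.end - self.start


def duration_by_gender(segments):
    """Sum effective-sentence durations (in seconds) per gender label."""
    totals = defaultdict(float)
    for seg in segments:
        if seg.end < seg.start:
            raise ValueError(f"end timestamp precedes start: {seg}")
        totals[seg.gender] += seg.duration
    return dict(totals)


# Toy segments with made-up timestamps, for illustration only.
segments = [
    Segment(0.0, 2.5, "spk01", "F", "transcript 1"),
    Segment(2.75, 5.25, "spk02", "M", "transcript 2"),
    Segment(5.5, 7.0, "spk01", "F", "transcript 3"),
]
print(duration_by_gender(segments))  # {'F': 4.0, 'M': 2.5}
```

Summing by duration rather than counting speakers is one way to check whether the gender balance also holds at the level of actual speech time.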
For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1127?source=Github
Format: 16kHz, 16bit, uncompressed WAV, mono channel
Recording environment: quiet indoor environment, without echo
Recording content: dozens of topics are specified, and speakers converse on those topics while being recorded
Speakers: about 1,000 speakers, gender-balanced
Annotation: transcription text, speaker identification and gender
Device: Android mobile phone, iPhone
Language: Minnan Dialect
Application scenarios: speech recognition; voiceprint recognition
Accuracy rate: 95% (sentence level)
License: Commercial License
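The audio specification above (16kHz, 16bit, uncompressed WAV, mono channel) can be checked mechanically with Python's standard `wave` module, and the same numbers give a rough size estimate for 601 hours of audio. This is a minimal sketch: the function names `check_wav_format` and `pcm_bytes` are hypothetical, and the size figure counts raw PCM samples only, ignoring WAV headers and any silence trimming.

```python
import wave

# Expected format per the specification: 16 kHz, 16-bit, mono, uncompressed PCM.
EXPECTED_RATE = 16_000
EXPECTED_SAMPLE_WIDTH = 2   # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1       # mono


def check_wav_format(source):
    """Return a list of mismatches between a WAV file and the stated spec.

    `source` may be a filesystem path or a binary file-like object;
    wave.open() accepts both.
    """
    problems = []
    with wave.open(source, "rb") as wf:
        if wf.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {wf.getframerate()} Hz != {EXPECTED_RATE} Hz")
        if wf.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width {wf.getsampwidth() * 8} bit != 16 bit")
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wf.getnchannels()} channels != 1 (mono)")
        if wf.getcomptype() != "NONE":
            problems.append(f"compression type {wf.getcomptype()!r} is not uncompressed PCM")
    return problems


def pcm_bytes(hours, rate=EXPECTED_RATE, width=EXPECTED_SAMPLE_WIDTH, channels=EXPECTED_CHANNELS):
    """Raw PCM payload size in bytes for a given duration (WAV headers excluded)."""
    return int(hours * 3600 * rate * width * channels)


# 601 h * 3600 s/h * 16,000 samples/s * 2 bytes/sample * 1 channel
print(pcm_bytes(601) / 1e9)  # 69.2352, i.e. roughly 69 GB of raw audio
```

At 32,000 bytes per second, each hour of mono 16-bit audio at 16 kHz is about 115 MB, so the full 601 hours comes to roughly 69 GB before compression.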