Initial release containing 1200 test set clips and 6000 training set clips, labeled with the digit, speaker ID, speed, language code, gender, and other metadata.
The release includes the raw 16 kHz WAV files in the .zip, as well as preprocessed mel-spaced spectrograms with 8 or 32 mel bins, split into training and test sets for convenience. The spectrogram features are stored as float16 to save space; the format is documented further in the README.
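For readers unfamiliar with mel-binned features, the sketch below shows one common way to turn a 16 kHz waveform into a log-mel spectrogram with 8 or 32 bins and store it as float16. It is a minimal illustration only: the HTK-style mel formula, 512-point FFT, 10 ms hop, Hann window, and the function names `mel_filterbank` and `log_mel_spectrogram` are all assumptions, not necessarily the preprocessing used for this release (see the README for the actual settings).

```python
import numpy as np

# Hypothetical helpers: HTK-style mel scale conversions (an assumption).
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters with centers spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 edge points give each filter a left, center, and right edge.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=16000, n_mels=32, n_fft=512, hop=160):
    """Hypothetical preprocessing: framed FFT -> mel bins -> log -> float16."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T  # (n_frames, n_mels)
    # Store as float16 to halve storage relative to float32.
    return np.log(mel + 1e-6).astype(np.float16)

# Usage: one second of synthetic noise at 16 kHz, reduced to 8 mel bins.
rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)
feats = log_mel_spectrogram(wave, n_mels=8)
print(feats.shape, feats.dtype)  # (97, 8) float16
print(feats.nbytes, feats.astype(np.float32).nbytes)  # 1552 3104
```

Float16 keeps roughly three significant decimal digits, which is usually adequate for log-magnitude features; if you load the stored spectrograms for training, upcasting them to float32 first (e.g. `feats.astype(np.float32)`) avoids accumulating precision loss in later arithmetic.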