This is our solution for this year's competition, hosted on Kaggle. The task was to train a chatbot on the well-known Fluent Speech Commands dataset, so that it can accurately map spectrograms to actual command categories. Please find the corresponding notebook here.
We used a pre-trained model, which we shared with the other participants in the Discussion section.
We mapped each three-category combo (action-object-location) to a single intent based on the intents.csv file provided in the competition, and fell back to the most frequent category for unknown combos. As a result, we achieved an accuracy of 0.95, the highest among all teams.
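The combo-to-intent mapping and the most-frequent fallback described above can be sketched as follows. The column layout and the toy rows below are assumptions for illustration; the real mapping comes from the competition's intents.csv file.

```python
from collections import Counter

# Toy stand-in for intents.csv: each (action, object, location) combo
# maps to a single intent label. Rows and names are hypothetical.
intent_rows = [
    ("activate", "lights", "kitchen", "activate_lights_kitchen"),
    ("deactivate", "lights", "none", "deactivate_lights"),
    ("activate", "lights", "kitchen", "activate_lights_kitchen"),
]

# Known combos -> intent.
combo_to_intent = {(a, o, l): intent for a, o, l, intent in intent_rows}

# Fallback for combos never seen in intents.csv: the most frequent intent.
most_frequent = Counter(intent for *_, intent in intent_rows).most_common(1)[0][0]

def map_combo(action, obj, location):
    """Map a 3-category combo to an intent, defaulting to the most frequent one."""
    return combo_to_intent.get((action, obj, location), most_frequent)

print(map_combo("activate", "lights", "kitchen"))  # known combo
print(map_combo("increase", "volume", "none"))     # unknown combo -> fallback
```

The dictionary lookup with a precomputed fallback keeps the mapping O(1) per prediction, which matters when labeling the whole test set.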
First off, fork the repository, or download and unzip the code, in a Unix-like environment.
Then download the dataset from
here
and move it to the root directory of the repository.
Then, install soundfile by running

/home/eestech $ sudo apt-get install libsndfile1

on Linux, or

/home/eestech $ pip3 install soundfile

on macOS.
Then, from the repository root, run the following command:
/home/eestech $ bash run.sh
This will install the rest of the dependencies, and download the dataset, labels, and pre-trained model from here. This will take some time, so make sure you have at least 3GB of available memory.
After that, the model will predict the labels for all samples under eestech/input/test.csv. This should take roughly 8 minutes.
Lastly, the results will be compared to the ground truth (eestech/working/output/1.csv), using accuracy as the metric of interest.
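That final comparison amounts to the fraction of test samples whose predicted label matches the ground truth. A minimal sketch, assuming both CSV files share hypothetical `id` and `label` columns (the actual column names come from the competition files), could look like:

```python
import csv
import io

def accuracy(pred_file, truth_file):
    """Fraction of rows whose predicted label matches the ground-truth label."""
    def read(f):
        return {row["id"]: row["label"] for row in csv.DictReader(f)}
    preds, truth = read(pred_file), read(truth_file)
    correct = sum(preds.get(k) == v for k, v in truth.items())
    return correct / len(truth)

# In the pipeline the second file would be eestech/working/output/1.csv;
# here we use small in-memory CSV texts to keep the sketch self-contained.
pred = io.StringIO("id,label\n1,a\n2,b\n3,c\n")
gt = io.StringIO("id,label\n1,a\n2,b\n3,d\n")
print(accuracy(pred, gt))  # two of three labels match
```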
If you choose to manually download all the data needed, you can run the Python script as follows:
/home/eestech $ cd working
/home/eestech/working $ python3 predict.py
The pre-trained model can be found here. Make sure to also check the corresponding papers by its creators.
- Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, and Yoshua Bengio, "Speech Model Pre-training for End-to-End Spoken Language Understanding", Interspeech 2019.
- Loren Lugosch, Brett Meyer, Derek Nowrouzezahrai, and Mirco Ravanelli, "Using Speech Synthesis to Train End-to-End Spoken Language Understanding Models", ICASSP 2020.
The script will automatically download the following from here:
- A folder containing 10 sub-folders, one per speaker, each holding a .wav file for every command.
- The .wav file paths to train, validate, and test on.
- Binary files storing the serialized state of the pre-trained model.
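The speaker folders above contain short mono .wav recordings. A quick way to sanity-check such files after the download uses only the standard-library wave module; the file created below is a synthetic stand-in for one of the dataset's 16 kHz recordings, and its name is hypothetical.

```python
import struct
import wave

path = "example.wav"  # hypothetical path; real clips live under the speaker folders

# Create a tiny mono 16 kHz, 16-bit file so the snippet is self-contained:
# 160 zero-valued frames, i.e. 10 ms of silence.
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes per sample -> 16-bit audio
    w.setframerate(16000)
    w.writeframes(struct.pack("<160h", *([0] * 160)))

# Inspect it the way one might sanity-check the downloaded recordings.
with wave.open(path, "rb") as w:
    print(w.getframerate(), w.getnchannels(), w.getnframes())
```

Checking the sample rate and channel count up front catches corrupted or partially downloaded clips before they reach the model.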