You must create a directory called model_output
Spoken language dataset. Be mindful that it is a 16 GB dataset
You can check this thread on JupyterLab vs. Jupyter Notebook
pip3 install jupyterlab
pip3 install notebook
These tools were only used to run .ipynb files and facilitate visualizations!
Python script to Jupyter notebook converter
How did we clean up the files?
ls <dir> | grep -o '.....$' | sort -u # lists the distinct file-name endings
ls <dir> | grep -o '^es.*' # finds the Spanish ones
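For anyone who would rather do the same inspection from Python, here is a minimal sketch of the two checks above. The function names are made up for illustration, and it assumes, as the grep pattern does, that Spanish files are the ones whose names start with "es".

```python
import os


def distinct_suffixes(directory, n=5):
    """Return the sorted, distinct last-n-character endings of the entry names."""
    suffixes = set()
    for name in os.listdir(directory):
        # Skip hidden entries (plain `ls` does not list them) and names shorter
        # than n characters, which grep -o '.....$' would not match either.
        if name.startswith(".") or len(name) < n:
            continue
        suffixes.add(name[-n:])
    return sorted(suffixes)


def names_with_prefix(directory, prefix="es"):
    """Return the sorted entry names that start with the given prefix."""
    return sorted(
        name for name in os.listdir(directory) if name.startswith(prefix)
    )
```

Calling distinct_suffixes("<dir>") shows which endings are present, and names_with_prefix("<dir>") lists the candidate Spanish files.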
For our work, we used the test set found in local dirs such as
/media/andres/2D2DA2454B8413B5/test/test/
The final version is the file_cleaner script found in this directory. It copies the Spanish files to a new directory, which is given as its second argument
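As a rough illustration of that behaviour, and not the actual file_cleaner script, a copy step could look like the sketch below. The function name is hypothetical, and it assumes that Spanish files are the ones whose names start with "es" and that the source directory is passed first.

```python
import os
import shutil


def copy_spanish_files(src_dir, dst_dir, prefix="es"):
    """Copy regular files whose names start with prefix from src_dir to dst_dir.

    Assumption: the "es" prefix marks Spanish files; the real script may differ.
    Returns the sorted list of file names that were copied.
    """
    # Create the destination directory if it does not exist yet.
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, name)
        # Only copy regular files; subdirectories are left alone.
        if name.startswith(prefix) and os.path.isfile(src_path):
            shutil.copy2(src_path, os.path.join(dst_dir, name))
            copied.append(name)
    return copied
```

The originals stay in place because the files are copied rather than moved, so the source directory can still be used after the cleanup.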
Tutorial on mel spectrograms