This repository contains the PyTorch code associated with the paper *Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning*, presented at the SASB workshop at ICASSP 2024.
- Clone the repository and install the requirements using the provided `requirements.txt` or `environment.yml`.
- Then, preprocess your dataset to convert the audio files into mel-spectrograms:

  ```bash
  python wav_to_lms.py /your/local/audioset /your/local/audioset_lms
  ```
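  The script name suggests log-mel spectrograms (`lms`) saved as `.npy` arrays. Below is a minimal sketch of such a conversion, assuming `torchaudio` and illustrative parameters; the actual settings of `wav_to_lms.py` may differ:

  ```python
  # Purely illustrative: the real wav_to_lms.py may use a different
  # sample rate, FFT size, hop length, and number of mel bands.
  import numpy as np
  import torch
  import torchaudio

  def wav_to_log_mel(wav_path: str, npy_path: str) -> None:
      waveform, sr = torchaudio.load(wav_path)       # (channels, samples)
      waveform = waveform.mean(dim=0, keepdim=True)  # mix down to mono
      mel = torchaudio.transforms.MelSpectrogram(
          sample_rate=sr, n_fft=400, hop_length=160, n_mels=80
      )(waveform)                                    # (1, n_mels, frames)
      log_mel = torch.log(mel + 1e-6)                # log-compress, avoid log(0)
      np.save(npy_path, log_mel.numpy())

  wav_to_log_mel("example.wav", "example.npy")
  ```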
- Write the list of files to use as training data in a CSV file:

  ```bash
  cd data
  echo file_name > files_audioset.csv
  find /your/local/audioset_lms -name "*.npy" >> files_audioset.csv
  ```
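  As an illustrative sanity check (not part of the repository), you can verify that the listed files load correctly, assuming the `file_name` header written above and running from the repository root:

  ```python
  # Hypothetical helper: confirm the CSV lists loadable .npy spectrograms.
  import csv
  import numpy as np

  with open("data/files_audioset.csv") as f:
      paths = [row["file_name"] for row in csv.DictReader(f)]

  first = np.load(paths[0])  # each entry should be a precomputed spectrogram
  print(f"{len(paths)} files listed; first array has shape {first.shape}")
  ```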
- You can now start training! We rely on Dora for experiment scheduling. To start an experiment locally, just type:

  ```bash
  dora run
  ```
Under the hood, Hydra is used to handle configurations, so you can override options via the CLI or build your own YAML config files. For example, type:

```bash
dora run data=my_dataset model.encoder.embed_dim=1024
```

to train our model with a larger encoder on your custom dataset.
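As an illustration, a custom dataset config could look like the sketch below; the config group layout and field names here are hypothetical and may not match the ones actually used in this repository:

```yaml
# conf/data/my_dataset.yaml (hypothetical path and fields)
# Placing it in the "data" config group lets Hydra select it
# via `dora run data=my_dataset`.
csv_file: data/files_my_dataset.csv  # list of precomputed .npy spectrograms
batch_size: 256
num_workers: 8
```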
Moreover, you can seamlessly launch SLURM jobs on a cluster thanks to Dora:

```bash
dora launch -p partition-a100 -g 4 data=my_dataset
```
We refer the reader to the documentation of Hydra and Dora for more advanced usage.
Our model is evaluated on eight diverse downstream tasks, covering environmental sound, speech, and music classification. Please refer to our paper for additional details.
Will be available soon...
- This great Lightning+Hydra template
- EVAR for evaluating our representations