Project with a full setup for training a deep-learning model that classifies the musical key of a given audio file. The training results can be found on the Weights & Biases platform: https://wandb.ai/radziminski/audio-key-classification
Run make install to install all system and Python requirements for this project.
- make help - Show help
- make clean - Clean autogenerated files
- make clean-logs - Clean logs
- make format - Run pre-commit hooks
- make install - Install all dependencies
- make create-ncs-dataset - Run the script that scrapes the ncs.io site and creates the NCS dataset
- make prepare-audio - Download and prepare datasets with mp3 audio files
- make prepare-ncs-audio - Download and prepare the NCS dataset
- make prepare-gs-mtg-audio - Download and prepare the GS MTG dataset
- make prepare-gs-key-audio - Download and prepare the GS KEY dataset
- make prepare-images - Download images with precomputed spectrograms for each dataset
- make create-spectrograms - Create and save spectrogram images from existing audio files in all datasets
- make prepare-spectrograms - Download audio files, then create and save spectrogram images for all datasets
- make eval - Default eval (on song fragments)
- make eval-full - Evaluate the model on full songs
- make eval-audio - Evaluate the model on audio files
- make eval-audio-ncs-only - Evaluate the model on audio files, using only the NCS dataset
- make eval-audio-gs_key-only - Evaluate the model on audio files, using only the GS Key dataset
- make eval-images - Evaluate the model on spectrogram image files
- make eval-images-ncs-only - Evaluate the model on images, using only the NCS dataset
- make eval-images-gs_mtg-only - Evaluate the model on images, using only the GS Key dataset
- make train - Train the model
- make train-from-checkpoint - Train the model from a checkpoint
- make train-audio - Train the model with AudioDataModule
- make train-audio-ncs-only - Train the model on audio, using only the NCS dataset
- make train-audio-gs_mtg-only - Train the model on audio, using only the GS MTG dataset
- make train-images - Train the model with ImageDataModule
- make train-images-ncs-only - Train the model on images, using only the NCS dataset
- make train-images-gs_mtg-only - Train the model on images, using only the GS MTG dataset
- make test-default - Test model by running 1 full epoch
- make test-overfit - Test model by running 1 train, val and test loop, using only 1 batch
- make test-fdr - Test model with overfit on 1 batch
- make test-limit - Test model by running train on 1% of data
- make experiments-nf - Run experiments with different numbers of feature maps
- make experiments-hparams - Run experiments for hyperparameter tuning
- make experiments-dataset - Run experiments with different combinations of datasets
- make debug - Enter debugging mode with pdb
The model is designed to classify spectrograms created from audio files. It uses three datasets:
- NoCopyrightSounds - an original dataset (created by us) based on freely available songs from ncs.io.
- GiantSteps MTG - a dataset with over 1000 2-minute song excerpts from the Beatport service, with key corrections from service users
- GiantSteps Key - a dataset with around 600 2-minute song excerpts from the Beatport service, with key corrections from service users
There are three main "modes" in which the model can be trained:
In the first mode, the model uses raw audio files for training and classification. Audio files are divided into intervals during data preparation. In the data loaders they are loaded into tensors and resampled to a 44100 Hz sampling frequency. Then the Constant-Q transform (CQT) is applied to them, creating spectrograms. These spectrograms are then used for model training. This mode gives the most flexibility, since the Constant-Q parameters can be adjusted between training runs. However, audio loading and spectrogram creation take significant time, resulting in very slow model training.
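To make the CQT step more concrete, here is a minimal pure-Python sketch of a naive Constant-Q transform for a single frame. It is not the project's implementation: the function name `naive_cqt` and the parameters `f_min`, `n_bins` and `bins_per_octave` are assumptions chosen for illustration, and a real data loader would use an optimized library routine instead.

```python
import cmath
import math


def naive_cqt(signal, sample_rate, f_min=110.0, n_bins=36, bins_per_octave=12):
    """Return one CQT magnitude per frequency bin, computed from the start of `signal`."""
    # Constant quality factor: ratio of center frequency to bandwidth.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    magnitudes = []
    for k in range(n_bins):
        # Center frequencies are geometrically spaced (one bin per semitone here).
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        # Lower bins need longer windows to keep Q constant.
        n_k = min(len(signal), math.ceil(q * sample_rate / f_k))
        acc = 0j
        for n in range(n_k):
            hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / max(n_k - 1, 1))
            acc += hann * signal[n] * cmath.exp(-2j * math.pi * f_k * n / sample_rate)
        magnitudes.append(abs(acc) / n_k)
    return magnitudes


# A low sample rate keeps this demo fast; the project resamples to 44100 Hz.
sample_rate = 4000
tone = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(1000)]
spectrum = naive_cqt(tone, sample_rate)
# Bin 24 is centered at 110 * 2 ** (24 / 12) = 440 Hz.
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

With 12 bins per octave each bin lines up with one semitone, which is one reason the CQT is a natural fit for key classification. Changing `f_min`, `n_bins` or `bins_per_octave` changes the spectrogram, which is the kind of adjustment this mode allows between training runs.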
To use this mode you can run:
- make prepare-audio - script for data preparation. It downloads the three datasets, converts them to the correct folder structure and splits each audio file into intervals.
- make train-audio - script that starts model training
- make eval-audio - script that starts model evaluation
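The interval splitting done during data preparation can be sketched as follows. This is only an illustration: the helper name `split_into_intervals`, the 20-second interval length and the choice to drop a short trailing remainder are all assumptions, since the actual values used by the preparation script are not stated here.

```python
def split_into_intervals(n_samples, sample_rate, interval_seconds=20.0):
    """Return (start, end) sample offsets of consecutive full-length intervals."""
    interval_len = int(interval_seconds * sample_rate)
    if interval_len <= 0:
        raise ValueError("interval must contain at least one sample")
    # A trailing remainder shorter than one interval is dropped (an assumption).
    return [
        (start, start + interval_len)
        for start in range(0, n_samples - interval_len + 1, interval_len)
    ]


# A 65-second clip at 44100 Hz yields three full 20-second intervals.
intervals = split_into_intervals(65 * 44100, 44100)
```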
In the second mode, datasets with raw audio files are downloaded and spectrogram images are created from them in the data preparation step (before the actual training). The model is then trained directly on the spectrogram image files. This mode gives a little less flexibility, but it significantly speeds up training.
To use this mode you can run:
- make prepare-spectrograms - script for data preparation. It downloads the three audio datasets and saves a spectrogram file for every song interval.
- make train-images - script that starts model training
- make eval-images - script that starts model evaluation
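As a rough sketch of what saving a spectrogram as an image involves, the snippet below log-compresses a small magnitude matrix, scales it to 8-bit gray values and writes it as a binary PGM file using only the standard library. The function name `spectrogram_to_pgm`, the PGM format and the log scaling are assumptions made for this example; the image format and scaling used by the project's own scripts may differ.

```python
import math
import os
import tempfile


def spectrogram_to_pgm(magnitudes, path):
    """Write a 2D list of non-negative magnitudes (rows = frequency bins) as an 8-bit PGM."""
    # Log compression keeps quiet components visible next to loud ones.
    log_spec = [[math.log1p(v) for v in row] for row in magnitudes]
    lo = min(min(row) for row in log_spec)
    hi = max(max(row) for row in log_spec)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    pixels = [[round((v - lo) * scale) for v in row] for row in log_spec]
    height, width = len(pixels), len(pixels[0])
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        # Flip vertically so low frequencies end up at the bottom of the image.
        for row in reversed(pixels):
            f.write(bytes(row))
    return pixels


demo = [[0.0, 1.0], [3.0, 7.0]]
out_path = os.path.join(tempfile.gettempdir(), "demo_spectrogram.pgm")
pixels = spectrogram_to_pgm(demo, out_path)
```

Because the images are written once during preparation, training only has to read them back, which is where the speed-up over the audio mode comes from.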
The third mode is the same as the previous one, except that pre-generated spectrogram images are downloaded and used for training. In this mode none of the spectrogram parameters can be adjusted, but it is the fastest of the three modes.
To use this mode you can run:
- make prepare-images - script for data preparation. It downloads the three datasets with spectrogram image files inside.
- make train-images - script that starts model training
- make eval-images - script that starts model evaluation
To test whether the model trains properly you can use one of the test scripts:
- make test-default - Test model by running 1 full epoch
- make test-overfit - Test model by running 1 train, val and test loop, using only 1 batch
- make test-fdr - Test model with overfit on 1 batch
- make test-limit - Test model by running train on 1% of data
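The idea behind an "overfit on 1 batch" check can be shown with a toy example: if a training loop cannot drive the loss close to zero on a single small batch, something in the model or the optimization code is likely broken. The sketch below uses a hypothetical one-variable linear model and plain gradient descent, not the project's network or training loop.

```python
def overfit_one_batch(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b to a single batch with gradient descent; return (first, last) MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return losses[0], losses[-1]


# The batch follows y = 2x + 1 exactly, so the loss should shrink toward zero.
first_loss, last_loss = overfit_one_batch([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

A loss that stays flat or diverges in a check like this is a sign to inspect the model or the optimizer settings before launching a full training run.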
You can train the model by running the src/train.py file.
You can evaluate the model by running the src/eval.py file.