SampleCNNs for Audio Classification

This repository contains the code used for the publication below:

Taejun Kim, Jongpil Lee, and Juhan Nam, "Comparison and Analysis of SampleCNN Architectures for Audio Classification," IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2019.

Contents:

  • Dependency Installation
  • Building Datasets
    • Music auto-tagging: MagnaTagATune
    • Keyword spotting: Speech Commands
    • Acoustic scene tagging: DCASE 2017 Task 4
  • Training a SampleCNN

Dependency Installation

NOTE: The code in this repository is written and tested on Python 3.6.

  • tensorflow 1.10.x (1.10.x is strongly recommended because of version compatibility)
  • librosa
  • ffmpeg
  • pandas
  • numpy
  • scikit-learn
  • h5py

To install the required Python packages using conda, run the commands below:

conda install tensorflow-gpu=1.10.0 ffmpeg pandas numpy scikit-learn h5py
conda install -c conda-forge librosa
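
As a quick sanity check of the environment (a minimal sketch, not part of this repository), you can confirm the TensorFlow version and GPU visibility from Python:

import tensorflow as tf
import librosa

print(tf.__version__)               # expected to print 1.10.x
print(tf.test.is_gpu_available())   # True if the GPU build can see a device
print(librosa.__version__)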

Building Datasets

Download and preprocess the dataset you want to train a model on.

Music auto-tagging: MagnaTagATune

Edith Law, Kris West, Michael Mandel, Mert Bay and J. Stephen Downie (2009). Evaluation of algorithms using games: the case of music annotation. In Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR).

Create a directory for the dataset and download the required .csv file and three .zip files into data/mtt/raw:

mkdir -p data/mtt/raw
cd data/mtt/raw
wget http://mi.soi.city.ac.uk/datasets/magnatagatune/annotations_final.csv
wget http://mi.soi.city.ac.uk/datasets/magnatagatune/mp3.zip.001
wget http://mi.soi.city.ac.uk/datasets/magnatagatune/mp3.zip.002
wget http://mi.soi.city.ac.uk/datasets/magnatagatune/mp3.zip.003

After downloading the files, merge and extract the three .zip files:

cat mp3.zip.* > mp3_all.zip
unzip mp3_all.zip -d mp3

Your directory structure should look like this:

data
└── mtt
    └── raw
        ├── annotations_final.csv
        └── mp3
            ├── 0
            ├── ...
            └── f

Finally, segment the audio and convert it to TFRecords using the following command:

python build_dataset.py mtt
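
Conceptually, this step loads each MP3, cuts it into fixed-length raw-waveform segments, and serializes the segments into TFRecord files. The sketch below only illustrates the idea; the segment length, feature names, and output layout are assumptions and may not match what build_dataset.py actually does:

import librosa
import numpy as np
import tensorflow as tf

SEGMENT_LENGTH = 59049  # assumed segment size in samples; not taken from build_dataset.py

def write_segments(mp3_path, label, writer):
    # Load a mono waveform and cut it into fixed-length segments.
    waveform, _ = librosa.load(mp3_path, sr=22050)
    for i in range(len(waveform) // SEGMENT_LENGTH):
        segment = waveform[i * SEGMENT_LENGTH:(i + 1) * SEGMENT_LENGTH].astype(np.float32)
        example = tf.train.Example(features=tf.train.Features(feature={
            'raw': tf.train.Feature(float_list=tf.train.FloatList(value=segment)),
            'label': tf.train.Feature(float_list=tf.train.FloatList(value=label)),
        }))
        writer.write(example.SerializeToString())

# Hypothetical usage, writing one shard of the training split:
# with tf.python_io.TFRecordWriter('data/mtt/tfrecord/train-000.tfrecord') as writer:
#     write_segments('data/mtt/raw/mp3/0/some_clip.mp3', label, writer)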

Keyword spotting: Speech Commands

Pete Warden (2018). Speech commands: A dataset for limited-vocabulary speech recognition. arXiv:1804.03209.

Create a directory for the dataset, then download and extract the dataset into data/scd/raw:

mkdir -p data/scd/raw
cd data/scd/raw
wget http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
tar zxvf speech_commands_v0.02.tar.gz

Finally, segment the audio and convert it to TFRecords using the following command:

python build_dataset.py scd
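
To sanity-check the generated TFRecords, you can iterate over them with the TF 1.x record iterator. This is only a generic inspection sketch; the actual feature schema used by this repository may differ:

import tensorflow as tf

def count_examples(tfrecord_path):
    # Count the serialized examples stored in one TFRecord shard.
    count = 0
    for record in tf.python_io.tf_record_iterator(tfrecord_path):
        example = tf.train.Example()
        example.ParseFromString(record)
        count += 1
    return count

# print(count_examples('data/scd/tfrecord/train-000.tfrecord'))  # hypothetical path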

Acoustic scene tagging: DCASE 2017 Task 4

Annamaria Mesaros, Toni Heittola, Aleksandr Diment, Benjamin Elizalde, Ankit Shah, Emmanuel Vincent, Bhiksha Raj and Tuomas Virtanen (2017). DCASE 2017 challenge setup: tasks, datasets and baseline system. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017).

Create a directory for the dataset, then download the audio archives and ground-truth labels into data/dcs/raw:

mkdir -p data/dcs/raw
cd data/dcs/raw

wget --no-check-certificate -r 'https://docs.google.com/uc?export=download&id=1HOQaUHbTgCRsS6Sr9I9uE6uCjiNPC3d3' -O Task_4_DCASE_2017_training_set.zip
wget --no-check-certificate -r 'https://docs.google.com/uc?export=download&id=1GfP5JATSmCqD8p3CBIkk1J90mfJuPI-k' -O Task_4_DCASE_2017_testing_set.zip
wget https://dl.dropboxusercontent.com/s/bbgqfd47cudwe9y/DCASE_2017_evaluation_set_audio_files.zip

unzip -P DCASE_2017_training_set Task_4_DCASE_2017_training_set.zip
unzip -P DCASE_2017_testing_set Task_4_DCASE_2017_testing_set.zip
unzip -P DCASE_2017_evaluation_set DCASE_2017_evaluation_set_audio_files.zip

wget https://github.com/ankitshah009/Task-4-Large-scale-weakly-supervised-sound-event-detection-for-smart-cars/raw/master/groundtruth_release/groundtruth_weak_label_training_set.csv
wget https://github.com/ankitshah009/Task-4-Large-scale-weakly-supervised-sound-event-detection-for-smart-cars/raw/master/groundtruth_release/groundtruth_weak_label_testing_set.csv
wget https://github.com/ankitshah009/Task-4-Large-scale-weakly-supervised-sound-event-detection-for-smart-cars/raw/master/groundtruth_release/groundtruth_weak_label_evaluation_set.csv
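
The three CSV files contain the weak (clip-level) labels for the training, testing, and evaluation sets. If you want to inspect them, a pandas sketch like the one below can help, assuming tab-separated rows without a header; adjust the separator and column names if the files differ:

import pandas as pd

# Assumed layout: clip filename, start time, end time, label (tab-separated, no header).
df = pd.read_csv('groundtruth_weak_label_training_set.csv', sep='\t', header=None,
                 names=['filename', 'start', 'end', 'label'])
print(df.head())
print(df['label'].value_counts())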

Finally, segment the audio and convert it to TFRecords using the following command:

python build_dataset.py dcs

Training a SampleCNN

You can train a SampleCNN with the block type and dataset of your choice. Here are several examples of how to run training:

# Train a SampleCNN with SE block (default) on MagnaTagATune dataset (music auto-tagging)
python train.py mtt

# Train a SampleCNN with ReSE-2 block on Speech Commands dataset (keyword spotting)
python train.py scd --block rese2

# Train a SampleCNN with basic block on DCASE 2017 Task 4 dataset (acoustic scene tagging)
python train.py dcs --block basic
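
The --block option selects the convolutional block described in the paper (basic, residual, SE, or ReSE variants). As a rough illustration of the default squeeze-and-excitation (SE) variant, a 1-D convolutional block followed by an SE-style channel gate could look like the sketch below; the layer sizes and the amplifying ratio are placeholders, not the repository's exact implementation:

from tensorflow import keras

def se_block(x, num_filters, amplifying_ratio=16):
    # Basic 1-D block: convolution -> batch norm -> ReLU -> max pooling.
    x = keras.layers.Conv1D(num_filters, 3, padding='same')(x)
    x = keras.layers.BatchNormalization()(x)
    x = keras.layers.Activation('relu')(x)
    x = keras.layers.MaxPooling1D(3)(x)

    # Squeeze: summarize each channel over time. Excite: re-weight the channels.
    s = keras.layers.GlobalAveragePooling1D()(x)
    s = keras.layers.Dense(num_filters * amplifying_ratio, activation='relu')(s)
    s = keras.layers.Dense(num_filters, activation='sigmoid')(s)
    s = keras.layers.Reshape((1, num_filters))(s)
    return keras.layers.Multiply()([x, s])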

Trained models are saved under the log directory, in a subdirectory named with the datetime at which you started training. Here is an example of a saved model:

log/
└── 20190424_213449-scd-se/
    └── final-auc_0.XXXXXX-acc_0.XXXXXX-f1_0.XXXXXX.h5
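
The saved .h5 file can be reloaded with Keras for evaluation or inference. A minimal sketch, assuming the model contains only standard layers (custom layers or metrics would need to be passed via custom_objects):

from tensorflow import keras

# Replace the path with the actual file name produced by your run.
model = keras.models.load_model('log/20190424_213449-scd-se/final-auc_0.XXXXXX-acc_0.XXXXXX-f1_0.XXXXXX.h5')
model.summary()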

You can see the available options for training using the command below:

$ python train.py -h

usage: train.py [-h] [--data-dir PATH] [--log-dir PATH]
                [--block {basic,se,res1,res2,rese1,rese2}]
                [--amplifying-ratio N] [--multi] [--batch-size N]
                [--momentum M] [--lr LR] [--lr-decay DC] [--dropout DO]
                [--weight-decay WD] [--num-stages N] [--patience N]
                [--num-readers N]
                DATASET [NAME]

Train a SampleCNN.

positional arguments:
  DATASET               Dataset for training: {mtt|scd|dcs}
  NAME                  Name of log directory.

optional arguments:
  -h, --help            show this help message and exit
  --data-dir PATH
  --log-dir PATH        Directory where to write event logs and models.
  --block {basic,se,res1,res2,rese1,rese2}
                        Convolutional block to build a model (default: se,
                        options: basic/se/res1/res2/rese1/rese2).
  --amplifying-ratio N
  --multi               Use multi-level feature aggregation.
  --batch-size N        Mini-batch size.
  --momentum M          Momentum for SGD.
  --lr LR               Learning rate.
  --lr-decay DC         Learning rate decay rate.
  --dropout DO          Dropout rate.
  --weight-decay WD     Weight decay.
  --num-stages N        Number of stages to train.
  --patience N          Stop training stage after #patiences.
  --num-readers N       Number of TFRecord readers.
