# Urban Sound Classification

## Project idea

Audio classification using PyTorch, comparing custom-made FFNN and CNN models to a pre-trained VGG-11 with Batch Normalization.

## Paper

- The paper (written in Serbian) describing the project's ideas and implementation can be found here
- The presentation (written in Serbian) describing the project's ideas and implementation can be found here

## Setup

- The dataset should be placed in `data/dataset/` under the name `URBANSOUND8K` (download from the website below)
- Models will be stored in `models/saved_models/URBAN_SOUNDS_8K`
- Results will be stored in `data/results/URBAN_SOUNDS_8K`

Run `main.py`.
NOTE: You need PyTorch, TorchAudio, NumPy, scikit-learn and seaborn installed.
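
A minimal install sketch, assuming the standard PyPI package names for the dependencies listed above (the repository does not state exact versions):

```bash
# Assumed PyPI package names; pick the torch/torchaudio build matching your CUDA setup
pip install torch torchaudio numpy scikit-learn seaborn
```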

## Command Line Parameters

- `--type` (type of run: `TRAIN`, `TEST`, `TRAIN_AND_TEST` or `CUSTOM_TEST`)
- `--model_name` (model to use: `FFNN`, `CNN` or `VGG`)
- `--show_results` (plot loss and accuracy info, default: `False`)
- `--save_results` (save loss and accuracy results, default: `True`)
- `--save_model` (save the model during training, default: `True`)
- `--custom_test_path` (path to a custom audio file to classify)

This info can also be shown using the `--help` parameter.
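
For example, a typical invocation might look like the sketch below; the flag values are the ones listed above, and the exact accepted spellings may differ:

```bash
# Train and then evaluate the custom CNN model, keeping the default save settings
python main.py --type TRAIN_AND_TEST --model_name CNN
```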

## Network models

- FFNN with 3 hidden layers (definition in `models/definitions/ffnn_model.py`)
- CNN with a VGG-like architecture (definition in `models/definitions/cnn_model.py`)
- Pre-trained VGG-11 with Batch Normalization

Trained models are available at: https://drive.google.com/drive/folders/1cxllv-qDtqtNUPz3512q5k9uKZ7OAn4W?usp=sharing
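
A minimal sketch of how the pre-trained VGG-11 (BN) baseline could be adapted to the 10 UrbanSound8K classes, assuming the torchvision implementation is used; the actual definition in this repository may differ:

```python
import torch.nn as nn
from torchvision.models import vgg11_bn, VGG11_BN_Weights

# Load ImageNet-pretrained VGG-11 with Batch Normalization from torchvision
model = vgg11_bn(weights=VGG11_BN_Weights.DEFAULT)

# Replace the final classifier layer so it outputs the 10 UrbanSound8K classes
model.classifier[6] = nn.Linear(4096, 10)
```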

## Dataset

Download the dataset: URBANSOUND8K

From the dataset website:

> 10-fold cross validation using the predefined folds: train on data from 9 of the 10 predefined folds and test on data from the remaining fold. Repeat this process 10 times (each time using a different set of 9 out of the 10 folds for training and the remaining fold for testing). Finally report the average classification accuracy over all 10 experiments (as an average score + standard deviation, or, even better, as a boxplot).
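
A minimal sketch of that evaluation protocol, assuming hypothetical `train_fn` and `eval_fn` callables supplied by the caller (they are not part of this repository's stated API):

```python
import numpy as np

def cross_validate(train_fn, eval_fn, n_folds: int = 10):
    """Leave-one-fold-out evaluation over the 10 predefined UrbanSound8K folds.

    train_fn(train_folds) -> model and eval_fn(model, test_fold) -> accuracy
    are assumed, hypothetical callables provided by the caller.
    """
    accuracies = []
    for test_fold in range(1, n_folds + 1):
        # Train on the other 9 folds, test on the held-out fold
        train_folds = [f for f in range(1, n_folds + 1) if f != test_fold]
        model = train_fn(train_folds)
        accuracies.append(eval_fn(model, test_fold))
    accuracies = np.array(accuracies)
    # Report mean accuracy +/- standard deviation over the 10 experiments
    return accuracies.mean(), accuracies.std()
```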