- Pre-Trained Models now available here
- 24/05/24: Hyperparameter and original results files now available in "Other Files"
- 10/01/23: New MetaAudio datasets released in the MT-SLVR paper. The new sets revolve around few-shot speech classification.
- 06/09/22: Presented MetaAudio at ICANN22; slides available in the repo
- 01/07/22: MetaAudio accepted to ICANN22, to be presented in early September 2022
A new comprehensive and diverse few-shot acoustic classification benchmark. If you use any code or results from this work, please cite the following: ICANN22 Link or arXiv Link
@InProceedings{10.1007/978-3-031-15919-0_19,
author="Heggan, Calum
and Budgett, Sam
and Hospedales, Timothy
and Yaghoobi, Mehrdad",
title="MetaAudio: A Few-Shot Audio Classification Benchmark",
booktitle="Artificial Neural Networks and Machine Learning -- ICANN 2022",
year="2022",
publisher="Springer International Publishing",
address="Cham",
pages="219--230",
isbn="978-3-031-15919-0"
}
A new and (hopefully) more easily digestible blog post on MetaAudio can be found here!
IMPORTANT NOTE: The environment auto-configuration file appears to be broken (there is an old open issue for it), and unfortunately I do not have the time to properly fix it right now. My recommendation is to create a fresh Python 3.8.5 environment, install a few key packages at the versions listed in torch_gpu_env.txt, and then try to run an example, installing whatever else it asks for. Packages I would recommend pinning to the listed versions (a quick version-check sketch follows the list):
- PyTorch
- Learn2learn
- NumPy
- pandas
- Pysoundfile
- Torchaudio
- cudatoolkit
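To compare what actually got installed against the versions pinned in torch_gpu_env.txt, a minimal check like the one below can help. The import names used here (e.g. soundfile for PySoundFile) are the usual ones but are assumptions on my part; cudatoolkit has no Python import and must be checked through conda directly.

```python
# print the installed versions of the key packages so they can be compared
# against the versions listed in torch_gpu_env.txt
import importlib

for name in ["torch", "torchaudio", "learn2learn", "numpy", "pandas", "soundfile"]:
    try:
        module = importlib.import_module(name)
        print(f"{name}: {getattr(module, '__version__', 'unknown')}")
    except ImportError:
        print(f"{name}: not installed")
```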
We use miniconda for our experimental setup. For the purposes of reproduction we include the environment file, which can be set up using the following command:
conda env create --name metaaudio --file torch_gpu_env.txt
This repo contains the following:
- Multiple problem-statement setups with accompanying results, which can serve as baselines for future few-shot acoustic classification work. These include:
  - Standard within-dataset generalisation
  - Joint training, in both within- and cross-dataset settings
  - Additional data -> simple classifier for the cross-dataset setting
  - Length-shifted and length-stratified problems for the variable-length dataset setting
- Standardised meta-learning/few-shot splits for 5 distinct datasets from a variety of sound domains. These include both baseline (randomly generated) splits and more purposeful ones, such as splits based on available meta-data or on sample length distributions (see the episode-sampling sketch after this list)
- A variety of algorithm implementations designed for few-shot classification, ranging from 'cheap' traditional training pipelines to SOTA Gradient-Based Meta-Learning (GBML) models
- Both fixed- and variable-length dataset processing pipelines
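To make the few-shot setup concrete, the sketch below samples a single N-way K-shot episode from class-wise clip lists. This is a minimal illustration, not the repo's actual sampler; `sample_episode` and the `class_to_clips` mapping are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

def sample_episode(
    class_to_clips: Dict[str, List],   # hypothetical: {"dog_bark": [clip0, ...], ...}
    n_way: int = 5,
    k_shot: int = 1,
    n_query: int = 5,
) -> Tuple[list, list]:
    """Sample one N-way K-shot episode: (support, query) lists of (clip, label)."""
    classes = random.sample(sorted(class_to_clips), n_way)   # pick N classes
    support, query = [], []
    for label, cls in enumerate(classes):
        clips = random.sample(class_to_clips[cls], k_shot + n_query)
        support += [(clip, label) for clip in clips[:k_shot]]
        query += [(clip, label) for clip in clips[k_shot:]]
    return support, query
```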
The algorithms are custom-built, operating on a shared framework with a common set of scripts. Those included in the paper are as follows:
For both MAML & Meta-Curvature we also make use of the Learn2Learn framework.
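For a rough sense of how a Learn2Learn-based GBML run is structured, here is a minimal MAML inner/outer loop on toy random features. The model, task generator, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import learn2learn as l2l

def random_task(n_way=5, k_shot=1, n_query=5, feat_dim=64):
    # toy stand-in for a real episode: random "features" with class labels
    xs = torch.randn(n_way * k_shot, feat_dim)
    ys = torch.arange(n_way).repeat_interleave(k_shot)
    xq = torch.randn(n_way * n_query, feat_dim)
    yq = torch.arange(n_way).repeat_interleave(n_query)
    return xs, ys, xq, yq

model = torch.nn.Linear(64, 5)
maml = l2l.algorithms.MAML(model, lr=0.4, first_order=False)
opt = torch.optim.Adam(maml.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

for _ in range(100):                            # meta-training iterations
    xs, ys, xq, yq = random_task()
    learner = maml.clone()                      # clone so adaptation stays differentiable
    for _ in range(5):                          # inner-loop steps on the support set
        learner.adapt(loss_fn(learner(xs), ys))
    query_loss = loss_fn(learner(xq), yq)       # meta-objective on the query set
    opt.zero_grad()
    query_loss.backward()                       # backprop through the adaptation
    opt.step()
```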
All of the models (except for the pre-trained AST ones) can now be found and downloaded from here.
We primarily cover 5 datasets in our experimentation; these are as follows:
In addition to these, we also include 2 extra datasets for cross-dataset testing:
as well as a proprietary version of AudioSet that we use for pre-training with simple classifiers. We obtained/scraped this dataset using the code from here:
We include sources for all of these datasets in Dataset Processing.
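Since the benchmark mixes fixed- and variable-length clips, a typical first step in a fixed-length pipeline is to crop or zero-pad each clip to a set duration before computing a log-mel spectrogram. The sketch below uses torchaudio; the function name, target duration, sample rate, and mel parameters are illustrative assumptions rather than the repo's exact settings.

```python
import torch
import torchaudio

def fixed_length_logmel(path: str, target_secs: float = 5.0,
                        sr: int = 16000, n_mels: int = 64) -> torch.Tensor:
    wav, orig_sr = torchaudio.load(path)        # (channels, samples)
    wav = wav.mean(dim=0, keepdim=True)         # mix down to mono
    if orig_sr != sr:
        wav = torchaudio.transforms.Resample(orig_sr, sr)(wav)
    target_len = int(target_secs * sr)
    if wav.shape[1] >= target_len:              # crop clips that are too long
        wav = wav[:, :target_len]
    else:                                       # zero-pad clips that are too short
        wav = torch.nn.functional.pad(wav, (0, target_len - wav.shape[1]))
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=sr, n_mels=n_mels)(wav)
    return torchaudio.transforms.AmplitudeToDB()(mel)   # (1, n_mels, frames)
```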