VGGish features come from a pretrained CNN by Google (research paper). Apple has a nice, comprehensible explanation.
They benchmark their approach against Audio Set (note: it also contains animal sounds!). It seems to consist only of tags on YouTube videos?
- AED: Acoustic Event Detection
- VGGish: Seems to be features from a pretrained CNN? Not sure, but link to repo here
Currently working on data preprocessing in data.ipynb.
- Download the OpenMIC-2018 dataset and add it as a subfolder to `data` (`data/openmic-2018/all/goes/here`)
- Make sure you have Docker up and running, and open the project in Visual Studio Code.
- VS Code should prompt for `Open project in .devcontainer?`
- Accept.
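
Before opening the notebook, it can help to confirm that the dataset ended up where the first step expects it. The following is a minimal sketch, assuming the dataset root is `data/openmic-2018` as in the path above and that the script is run from the project root. The helper name `dataset_status` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Assumption: the dataset root follows the path given in the setup steps.
DATASET_DIR = Path("data") / "openmic-2018"


def dataset_status(root: Path = DATASET_DIR) -> str:
    """Return a short, human-readable status for the dataset folder.

    Hypothetical helper for illustration; it only inspects the folder
    and does not validate the dataset contents.
    """
    if not root.is_dir():
        return f"missing: {root} does not exist"
    # List the top-level entries so an empty folder can be told apart.
    entries = sorted(p.name for p in root.iterdir())
    if not entries:
        return f"empty: {root} contains no files"
    return f"ok: {root} contains {len(entries)} entries"


if __name__ == "__main__":
    print(dataset_status())
```

A `missing` or `empty` result suggests the download was unpacked somewhere else. This only checks the folder, so it says nothing about whether the files inside are complete.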