
Exploration on Autotagging for Genre and Mood

Jamendo datasets currently have song-level tags covering all of the themes and moods in a track. The problem is that segments of a song can have different themes, instruments, and genres.

As things stand, we tag only a single section of the work and apply that label to the entire song.

Genre Tagging

Genre is a loosely defined concept. Some genres have key features, and musicologists can label and name said features.

There are several models and papers that tackle the problem of genre tagging.

  • The cc music model is decent, but its accuracy at labelling data isn't good enough to be part of our process; the error rate is too high. model / demo

  • The kaggle competition represents music using high-level features that are human-labelled. Things like danceability, acousticness, energy, and instrumentalness are not necessarily things that we can infer from a spectrogram (see the spectrogram sketch after this list).
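For context, a spectrogram is the low-level input most of these models actually consume. A minimal sketch with librosa; the file name is illustrative:

import librosa
import numpy as np

# Load a clip at librosa's default 22.05 kHz, mixed down to mono.
y, sr = librosa.load("clip.mp3", sr=22050, mono=True)

# Mel spectrogram in decibels: the representation models see, from which
# semantic features like danceability are not directly readable.
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)
print(S_db.shape)  # (n_mels, n_frames)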

Mood Tagging

Mood tagging for music is a subset of a class of problems called Multimodal Emotion Recognition. In my search I was able to find many datasets and papers, but very few fully working models with decent performance.

  • The lileonardo 3-average model provides code to train a model but does not include a pretrained one.

  • Collection of Datasets for Musical Emotion Recognition that may be used in the future

  • Paper that uses Thayer's model of emotion to classify the mood of a song (see the quadrant sketch after this list).

  • I also found a paper on Mid Level Features that could be a promising lead if we decide to generate human-labelled data. It defines features that are not as high-level as acousticness, but not as low-level as spectral centroid.

  • Paper

  • A service called musixmatch, a potential low-cost commercial option that we could use to label our segments.
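Thayer's model places moods on a two-dimensional valence/arousal plane, so a classifier that predicts those two values can be bucketed into mood quadrants. A minimal sketch; quadrant names vary across papers, and these labels are illustrative:

def thayer_quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) pair in [-1, 1] to a Thayer-style mood quadrant."""
    if arousal >= 0:
        return "happy/exuberant" if valence >= 0 else "angry/anxious"
    return "calm/content" if valence >= 0 else "sad/depressed"

print(thayer_quadrant(0.7, 0.4))    # happy/exuberant
print(thayer_quadrant(-0.5, -0.3))  # sad/depressed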

Spike on feature engineering with librosa

See feature_engineering.ipynb for the details of that spike.
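For a flavour of the features explored, here is a minimal librosa sketch; the input path is hypothetical and the exact feature set in the notebook may differ:

import librosa
import numpy as np

# Hypothetical input clip; see feature_engineering.ipynb for the real data.
y, sr = librosa.load("clip.wav", mono=True)

features = {
    # Timbre: mean MFCCs over the clip.
    "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1),
    # Brightness: spectral centroid, a classic low-level feature.
    "spectral_centroid": librosa.feature.spectral_centroid(y=y, sr=sr).mean(),
    # Harmony: average pitch-class (chroma) energies.
    "chroma": librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1),
    # Rhythm: global tempo estimate in BPM.
    "tempo": librosa.beat.tempo(y=y, sr=sr)[0],
}
for name, value in features.items():
    print(name, np.round(value, 2))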

Runpod experiments

During the spike several models were explored. Of note were the speechbrain and JMLA models.

Speechbrain model

One potential model that was explored was the speechbrain music emotion detection model. It showed a lot of promise but only captured four moods: happy, angry, neutral, and sad.
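The notes don't pin down the exact checkpoint; assuming it was SpeechBrain's wav2vec2 emotion classifier trained on IEMOCAP (which emits exactly these four labels), usage looks roughly like this. On older SpeechBrain releases the import lives under speechbrain.pretrained.interfaces instead:

from speechbrain.inference.interfaces import foreign_class

# Assumed checkpoint: the Hugging Face model that predicts angry/happy/neutral/sad.
classifier = foreign_class(
    source="speechbrain/emotion-recognition-wav2vec2-IEMOCAP",
    pymodule_file="custom_interface.py",
    classname="CustomEncoderWav2vec2Classifier",
)

# "snippet.wav" is a hypothetical 10-second clip.
out_prob, score, index, text_lab = classifier.classify_file("snippet.wav")
print(text_lab)  # e.g. ['hap']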

JMLA Model

This model read in a song file and generated a text description including mood, genre, and theme. Overall it was pretty weak and did not manage to correctly label even pop songs with a simple structure. I ran these models on 10-second snippets from popular music to get a feel for their efficacy (a snippet-slicing sketch follows).
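Slicing the snippets is straightforward; a minimal sketch with librosa and soundfile, with hypothetical file names:

import librosa
import soundfile as sf

# Hypothetical source track, loaded at its native sample rate.
y, sr = librosa.load("song.mp3", sr=None, mono=True)

# Write consecutive non-overlapping 10-second snippets.
snippet_len = 10 * sr
for i, start in enumerate(range(0, len(y) - snippet_len + 1, snippet_len)):
    sf.write(f"snippet_{i:03d}.wav", y[start:start + snippet_len], sr)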

These models are best run on NVIDIA GPUs under Python 3.10 with an older version of torch.

Mac

brew install sox

Linux

For the JMLA model:

apt update
# JMLA needs Python 3.9 in its own virtualenv
apt install python3.9
pip install virtualenv
virtualenv jaml -p python3.9
source jaml/bin/activate
# install JMLA dependencies; mmcv is pinned and installed via openmim
pip install -r requirements-jmla.txt
pip install -U openmim
mim install mmcv==1.7.1
# Tk bindings for Python
apt-get install python3-tk

For the speechbrain model:

sudo apt-get update
sudo apt-get install sox
pip install -r requirements-speechbrain.txt

Conda

conda install conda-forge::sox
