PROJECT DESCRIPTION
This project uses a dataset of songs from two music genres (Hip-Hop and Rock) to train a classifier that distinguishes between the two genres based only on track information derived from Echonest (now part of Spotify). It uses the pandas and seaborn packages in Python to subset the data, aggregate information, and create plots while exploring the data for obvious trends or factors before doing any machine learning. Next, the scikit-learn package is used to test whether a song's genre can be classified correctly from features such as danceability, energy, acousticness, and tempo. To do this, the project applies common techniques such as PCA for dimensionality reduction, decision tree and logistic regression classifiers, and cross-validation.
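The pandas side of the exploration described above can be sketched as follows. This is a minimal illustration, not the project's own code. It assumes two tables joined on a hypothetical `track_id` key: one holding a hypothetical `genre_top` label and one holding Echonest-style audio features. All values are made up.

```python
import pandas as pd

# Hypothetical genre labels keyed by track_id (made-up values)
tracks = pd.DataFrame({
    "track_id": [1, 2, 3, 4, 5, 6],
    "genre_top": ["Rock", "Hip-Hop", "Rock", "Hip-Hop", "Rock", "Rock"],
})

# Hypothetical Echonest-style audio features for the same tracks (made-up values)
features = pd.DataFrame({
    "track_id": [1, 2, 3, 4, 5, 6],
    "danceability": [0.30, 0.80, 0.40, 0.70, 0.35, 0.25],
    "energy": [0.90, 0.60, 0.85, 0.55, 0.80, 0.95],
    "acousticness": [0.05, 0.20, 0.10, 0.25, 0.08, 0.02],
})
feature_cols = ["danceability", "energy", "acousticness"]

# Combine the two tables, keeping only tracks that appear in both
merged = tracks.merge(features, on="track_id", how="inner")

# Subset: only the Hip-Hop tracks
hiphop = merged[merged["genre_top"] == "Hip-Hop"]

# Aggregate: mean of each feature per genre, a quick check for obvious trends
genre_means = merged.groupby("genre_top")[feature_cols].mean()

# Pairwise Pearson correlations between the continuous features
corr = merged[feature_cols].corr()
print(genre_means)
print(corr)
```

Passing `corr` to `seaborn.heatmap` is one way to produce the kind of exploratory plot mentioned above. Strongly correlated feature pairs hint at redundancy that is worth checking before modeling.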
PROJECT DETAILS
- Preparing our dataset
- Pairwise relationships between continuous variables
- Splitting our data
- Normalizing the feature data
- Principal Component Analysis on our scaled data
- Further visualization of PCA
- Projecting onto our features
- Train a decision tree to classify genre
- Compare our decision tree to a logistic regression
- Balance our data for greater performance
- Does balancing our dataset improve model bias?
- Using cross-validation to evaluate our models
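The splitting, normalization, and PCA steps listed above can be sketched as follows. This is a minimal version under stated assumptions, not the project's exact code. It uses synthetic data from `make_classification` as a stand-in for eight audio features, and it uses an assumed 85% cumulative-explained-variance cutoff to choose how many components to keep.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 8 audio features and a binary genre label (assumption)
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           random_state=10)

# Split first so the test set plays no part in fitting the scaler or PCA
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10)

# Standardize each feature to zero mean and unit variance using training statistics
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit PCA with every component to see how much variance each one explains
pca_full = PCA().fit(X_train_scaled)
cum_var = np.cumsum(pca_full.explained_variance_ratio_)

# Keep the fewest components whose cumulative explained variance reaches 85%
# (the 85% cutoff is an assumed choice, not a fixed rule)
n_components = int(np.argmax(cum_var >= 0.85)) + 1

# Project the scaled train and test features onto the selected components
pca = PCA(n_components=n_components)
train_pca = pca.fit_transform(X_train_scaled)
test_pca = pca.transform(X_test_scaled)
print(n_components, train_pca.shape, test_pca.shape)
```

A bar chart of `pca_full.explained_variance_ratio_` (a scree plot) next to a line plot of `cum_var` is one way to visualize PCA further and justify the cutoff.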
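The decision-tree-versus-logistic-regression comparison listed above might look like the sketch below. Several parts are assumptions: the data are synthetic and deliberately imbalanced, the label mapping (0 = "Rock", 1 = "Hip-Hop") is made up, and keeping 6 principal components is an arbitrary choice. Per-class precision and recall from `classification_report` show whether a model favors one genre.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately imbalanced data: about 75% of rows in class 0
# (label 0 = "Rock", label 1 = "Hip-Hop" is an assumed mapping)
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.75], random_state=10)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10)

# Scale, then project onto 6 principal components (6 is an assumed choice)
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=6).fit(scaler.transform(X_train))
train_pca = pca.transform(scaler.transform(X_train))
test_pca = pca.transform(scaler.transform(X_test))

# Fit both classifiers on the same projected training data
tree = DecisionTreeClassifier(random_state=10).fit(train_pca, y_train)
logreg = LogisticRegression(random_state=10).fit(train_pca, y_train)

pred_tree = tree.predict(test_pca)
pred_logit = logreg.predict(test_pca)

# Per-class precision and recall reveal whether one genre is favored
names = ["Rock", "Hip-Hop"]
print("Decision tree:\n", classification_report(y_test, pred_tree, target_names=names))
print("Logistic regression:\n", classification_report(y_test, pred_logit, target_names=names))

tree_acc = accuracy_score(y_test, pred_tree)
logit_acc = accuracy_score(y_test, pred_logit)
```

Overall accuracy alone can hide bias toward the majority class. The per-class recall rows are what the model bias question looks at.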
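One common way to balance the data, as in the step listed above, is to downsample the majority class to the size of the minority class. The sketch below assumes a hypothetical DataFrame with a `genre_top` column and made-up class sizes (300 Rock rows and 100 Hip-Hop rows). It does not show the project's actual counts.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(10)

# Hypothetical imbalanced data: 300 Rock rows and 100 Hip-Hop rows (made-up sizes)
df = pd.DataFrame({
    "feature": rng.normal(size=400),
    "genre_top": ["Rock"] * 300 + ["Hip-Hop"] * 100,
})

rock = df[df["genre_top"] == "Rock"]
hiphop = df[df["genre_top"] == "Hip-Hop"]

# Downsample the majority class (without replacement) to the minority class size
rock_down = rock.sample(n=len(hiphop), random_state=10)

# Recombine into a balanced dataset
balanced = pd.concat([rock_down, hiphop])
print(balanced["genre_top"].value_counts())
```

Downsampling discards data. An alternative that keeps every row is to pass `class_weight="balanced"` to `DecisionTreeClassifier` or `LogisticRegression`, which reweights errors on the minority class instead.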
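The cross-validation step listed above can be sketched with scikit-learn's `Pipeline`, `KFold`, and `cross_val_score`. This is a minimal sketch on synthetic data. The 10 folds and 6 principal components are assumed settings. Wrapping the scaler and PCA in the pipeline means they are refit on each training fold, so the held-out fold never influences preprocessing.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the audio features and genre labels (assumption)
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           random_state=10)

# Scaling and PCA sit inside each pipeline, so they are refit per training fold
tree_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=6)),
    ("tree", DecisionTreeClassifier(random_state=10)),
])
logreg_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=6)),
    ("logreg", LogisticRegression(random_state=10)),
])

# 10 shuffled folds; the default scoring for classifiers is accuracy
kf = KFold(n_splits=10, shuffle=True, random_state=10)
tree_score = cross_val_score(tree_pipe, X, y, cv=kf)
logit_score = cross_val_score(logreg_pipe, X, y, cv=kf)

print("Decision tree:", tree_score.mean())
print("Logistic regression:", logit_score.mean())
```

Comparing the mean (and spread) of the fold scores gives a steadier comparison between the two models than a single train/test split does.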