http://labrosa.ee.columbia.edu/millionsong/
January 2011
-
The dataset contains the analysis and metadata for a million songs. The goal is to give researchers a large dataset on which to report results, thereby encouraging the development of algorithms that scale to commercial sizes.
-
Most of the information is provided by The Echo Nest. The dataset is the result of a collaboration between The Echo Nest and LabROSA at Columbia University. This project is funded in part by the NSF.
-
Most of the data is licensed under the same terms as The Echo Nest's API.
For the SecondHandSongs dataset (cover songs), see the webpage:
http://labrosa.ee.columbia.edu/millionsong/secondhand
For the musiXmatch dataset (lyrics), see the webpage:
http://labrosa.ee.columbia.edu/millionsong/musixmatch
The code is released under the GNU General Public License. See LICENSE for details.
-
Most details and instructions on how to get the dataset can be found on the project's website:
http://labrosa.ee.columbia.edu/millionsong/
Thierry Bertin-Mahieux
[email protected]