This repository is a sample of my team's NYU Big Data project, in which we built a song recommender system using Spark. It contains only the scripts that I either wrote myself or made a noticeable contribution to, so it does not include all of the code that went into the final project. Team members include Pedro Galarza, Sara Price and Kieran Sim.
This repository does not include any of the data used in the project. Most of it was read from HDFS on NYU's High Performance Computing Cluster, and it originally came from the Million Song Dataset.
Click here for the final PDF