Song Similarity is a Python implementation of Forest-LSH that finds a given number of similar songs for each song in the set provided in the file songsData.csv. I will add some related projects in the future, so stay tuned!
Usage is simple and straightforward. There are two parameters you can change: the number of similar songs to find and the name of the output file, as follows:
--simsongs SIMSONGS Number of similar songs to be found, the default is 5 songs
--output OUTPUT The name of the output file, the default is 'similarsongs.csv'
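As a rough illustration, the two options above could be declared with argparse along these lines. This is only a sketch, not the project's actual code: the function name build_parser and the description string are assumptions, while the flag names and defaults are taken from the help text above.

```python
import argparse


def build_parser():
    # Hypothetical helper: the real script may organise its argument parsing differently.
    parser = argparse.ArgumentParser(
        description="Find similar songs with Forest-LSH"  # assumed description
    )
    parser.add_argument(
        "--simsongs",
        type=int,
        default=5,
        help="Number of similar songs to be found, the default is 5 songs",
    )
    parser.add_argument(
        "--output",
        default="similarsongs.csv",
        help="The name of the output file, the default is 'similarsongs.csv'",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--simsongs", "3"])
print(args.simsongs, args.output)
```

Omitting both flags falls back to the documented defaults of 5 songs and similarsongs.csv.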
The output file, similarsongs.csv by default, contains the ID of each song followed by the IDs of its similar songs, separated by commas. Each ID is the number of the song in the provided file songsData.csv.
For example, the first two songs have the following similar songs (simsongs is 5 by default):
ID,song1,song2,song3,song4,song5
0,93,9,78,26,13
1,52,76,,,
Note that the second song has only two similar songs. This can happen when Forest-LSH cannot find enough similarity in the shingles to return more candidates.
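If you want to consume the output in your own code, a minimal sketch of reading it could look like the following. The function name read_similar_songs is hypothetical and not part of this project. It assumes the layout shown in the example above: a header row, then one row per song, with empty fields where fewer similar songs were found.

```python
import csv
import io


def read_similar_songs(csv_file):
    """Map each song ID to the list of its similar song IDs (hypothetical helper)."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row: ID,song1,song2,...
    similar = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        song_id = int(row[0])
        # Empty fields mean fewer similar songs were found, so drop them.
        similar[song_id] = [int(field) for field in row[1:] if field.strip()]
    return similar


# The sample rows from the example above, used in place of a real file.
sample = "ID,song1,song2,song3,song4,song5\n0,93,9,78,26,13\n1,52,76,,,\n"
similar = read_similar_songs(io.StringIO(sample))
print(similar)
```

To read a real output file, pass an open file object instead, for example one opened with open("similarsongs.csv", newline="").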
I am very open to your questions and suggestions!