Following up on the conversation in #81 and #39: the song length heuristic is elegant and easy to program, but remixes can break it. They are rare, but still possible. NLP is harder to program and probably error-prone. In general, neither guarantees true audio similarity.
Just adding another possible heuristic into the pot: combine song length with the Fast Fourier Transform (FFT). NumPy has an implementation, and the FFT can be used to compare two waveforms directly. In fact, the FFT can efficiently find the alignment (shift) between two integer arrays that minimizes the squared L2 distance between them (the sum of the squared differences between the numbers at each index). I have an explanation here.
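A minimal sketch of that trick with `numpy.fft`, assuming both songs are already decoded into 1-D sample arrays (decoding is out of scope here). The function name `best_alignment` is hypothetical. It relies on the identity sum((a - shift(b, k))**2) = sum(a**2) + sum(b**2) - 2 * corr[k], so finding the shift with the smallest squared distance is the same as finding the peak of the cross-correlation, which the FFT computes for every shift at once in O(n log n).

```python
import numpy as np


def best_alignment(a, b):
    """Return (lag, ssd): the lag of b relative to a that minimizes the
    sum of squared differences, and that minimal sum.

    Uses sum((a - shift(b, k))**2) = sum(a**2) + sum(b**2) - 2 * corr[k],
    where corr[k] = sum_n a[n] * b[n - k] is the cross-correlation.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Zero-pad to len(a) + len(b) - 1 so the FFT's circular correlation
    # equals the linear one (real samples never wrap around onto each other).
    n = len(a) + len(b) - 1
    spectrum = np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n))
    corr = np.fft.irfft(spectrum, n)
    k = int(np.argmax(corr))  # maximizing corr minimizes the squared distance
    ssd = float(np.sum(a**2) + np.sum(b**2) - 2 * corr[k])
    # Indices from len(a) upward correspond to negative lags (b shifted left).
    lag = k if k < len(a) else k - n
    return lag, ssd
```

Because of the zero padding, samples that don't overlap after shifting count against the score, so extra audio in one file (say, a longer intro) raises the distance.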
Although there is a large body of research on computing music similarity, I think a simple algorithm is sufficient here, since the songs being compared should be almost identical.
However, this likely introduces a non-intuitive extra parameter, `FFT_CUTOFF`, which would have to be determined experimentally (if the FFT-based distance between the songs is > `FFT_CUTOFF`, warn the user that the song found is likely incorrect).
An algorithm other than the FFT is also fine, as long as it works on the actual audio.
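One possible way (an assumption, not part of the proposal) to make `FFT_CUTOFF` a bit less arbitrary: scale both signals to unit energy before comparing. Then the minimal squared distance equals 2 - 2 * (peak normalized correlation), which always lies in [0, 4], is 0 for identical audio regardless of volume, and sits close to 2 for long, unrelated signals. The names `normalized_distance` and `looks_wrong` and the value 0.5 are hypothetical placeholders.

```python
import numpy as np

FFT_CUTOFF = 0.5  # hypothetical placeholder; the real value would be tuned experimentally


def normalized_distance(a, b):
    """Minimal squared L2 distance between unit-energy versions of a and b.

    With both signals scaled to unit norm, sum((a - shift(b))**2) = 2 - 2 * corr,
    so the result always lies in [0, 4].
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        raise ValueError("cannot compare silent audio")
    a, b = a / na, b / nb
    n = len(a) + len(b) - 1  # zero-pad so the correlation is linear, not circular
    corr = np.fft.irfft(np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n)), n)
    return float(2.0 - 2.0 * corr.max())


def looks_wrong(a, b, cutoff=FFT_CUTOFF):
    """True if the found song should be flagged to the user as likely incorrect."""
    return normalized_distance(a, b) > cutoff
```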
In summary:

1. First, check that the songs have similar lengths.
2. Then, run an FFT over both songs and compute the L2 distance between them.
3. If the distance is > `FFT_CUTOFF`, pick another song or warn the user.
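The three steps could be wired together roughly like this. This is a sketch under several assumptions: we have reference audio for the song being searched for, every candidate is decoded to mono at the same `SAMPLE_RATE`, and the constants `LENGTH_TOLERANCE_S` and `FFT_CUTOFF` plus the function `pick_song` are hypothetical names with placeholder values. The distance is normalized to unit energy so the cutoff doesn't depend on volume.

```python
import numpy as np

SAMPLE_RATE = 22_050      # assumed: all audio decoded to mono at this rate
LENGTH_TOLERANCE_S = 2.0  # hypothetical: max allowed length difference in seconds
FFT_CUTOFF = 0.5          # hypothetical: would be tuned experimentally


def audio_distance(a, b):
    """Normalized minimal squared L2 distance in [0, 4], via FFT cross-correlation."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    n = len(a) + len(b) - 1  # zero-pad so the correlation is linear, not circular
    corr = np.fft.irfft(np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n)), n)
    return float(2.0 - 2.0 * corr.max())


def pick_song(reference, candidates):
    """Return the index of the first acceptable candidate, or None.

    None means no candidate passed both checks, so the user should be warned.
    """
    for i, cand in enumerate(candidates):
        # Step 1: cheap length check, skipping candidates that are clearly off.
        if abs(len(cand) - len(reference)) / SAMPLE_RATE > LENGTH_TOLERANCE_S:
            continue
        # Steps 2 and 3: FFT-based distance compared against the cutoff.
        if audio_distance(reference, cand) <= FFT_CUTOFF:
            return i
    return None
```

Running the length check first means the more expensive FFT only runs on candidates that already look plausible.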
Pros:

- Not an indirect heuristic; it targets exactly what we want (audio similarity)
- Standard tool for waveform analysis

Cons:

- Could be slow depending on the implementation (run async/in parallel?)
- Not intuitive what the cutoff should be
- More complex
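On the speed concern, two cheap mitigations are sketched below, both assumptions rather than anything settled: shrink the signals before the FFT by averaging blocks of samples (a crude low-pass filter plus decimation, so the FFT size drops by roughly `DOWNSAMPLE`), and score several candidates concurrently with a `concurrent.futures.ThreadPoolExecutor`. How much the threads actually help depends on how much of NumPy's FFT work releases the GIL; a `ProcessPoolExecutor` is the heavier alternative. `DOWNSAMPLE`, `downsample` and `score_candidates` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

DOWNSAMPLE = 8  # hypothetical: average every 8 samples before the FFT


def downsample(x, factor=DOWNSAMPLE):
    """Crude low-pass + decimation: average non-overlapping blocks of samples."""
    x = np.asarray(x, dtype=float)
    usable = len(x) - len(x) % factor  # drop the incomplete trailing block
    return x[:usable].reshape(-1, factor).mean(axis=1)


def audio_distance(a, b):
    """Normalized minimal squared L2 distance in [0, 4], via FFT cross-correlation."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    n = len(a) + len(b) - 1
    corr = np.fft.irfft(np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n)), n)
    return float(2.0 - 2.0 * corr.max())


def score_candidates(reference, candidates, max_workers=4):
    """Distance from the reference to each candidate, computed concurrently."""
    ref = downsample(reference)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: audio_distance(ref, downsample(c)), candidates))
```

Downsampling throws away high frequencies, which is probably acceptable when the songs should be near-identical, but it would make the comparison less sensitive to small differences.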