Robustness of `acoustid_compare` regarding audio playback speed #8
Comments
I've never experimented with this, but when designing the fingerprint structure, I explicitly ignored cases like this, since they require the full fingerprint to be considerably bigger. I was focusing on the simple use case of near-identical audio matching. For example, in the current fingerprints, one item (a 32-bit hash) covers almost 1.5 s of audio. That's close to usable if you want to match audio with different playback speeds. I guess the hashes will still have some bits in common, so it could theoretically be possible to get better results than with the current simplistic approach, but at a high cost in terms of performance, and the results would not be particularly good. It would only make sense if you needed to reuse the AcoustID database. Otherwise, it's probably easier to just design a new fingerprint structure. I've had plans to do that for a very long time, but there is not enough motivation. :) It would basically utilize some of the techniques from SURF and related algorithms.
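For illustration, the bitwise comparison alluded to here ("the hashes will still have some bits in common") might be sketched as follows. This is a simplified stand-in rather than the actual `acoustid_compare` code, and `bit_error_rate` is a name invented for the sketch:

```python
# Minimal sketch of comparing two raw fingerprints (arrays of 32-bit
# hash items) by counting differing bits. Not the actual
# acoustid_compare implementation.

def bit_error_rate(fp_a: list[int], fp_b: list[int]) -> float:
    """Fraction of differing bits over the aligned overlap."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 1.0
    diff_bits = sum(bin((a ^ b) & 0xFFFFFFFF).count("1")
                    for a, b in zip(fp_a, fp_b))
    return diff_bits / (n * 32)

# Identical audio gives a rate near 0; unrelated audio hovers around
# 0.5. A speed change shifts what each ~1.5 s item summarizes, so even
# corresponding items share fewer bits and the rate climbs toward 0.5.
```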
Makes sense. I'd like to point out, though, that only the lookup queries would need to be bigger, not the database fingerprints themselves. As a naive implementation, the lookup query could simply contain an array of fingerprints, all computed from the same audio source but with minor speed adjustments applied prior to fingerprinting; a rough sketch of this follows below. Of course, this would come at a (high) performance cost. Is there an …
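Hypothetically, the client side of such a naive lookup might look like the sketch below. It assumes `ffmpeg` and `fpcalc` are on the PATH; the speed factors, the input file name, and the `fingerprint_at_speed` helper are all illustrative rather than part of any actual AcoustID API:

```python
# Rough client-side sketch of the naive multi-fingerprint lookup idea.
import os
import subprocess
import tempfile

def fingerprint_at_speed(path: str, factor: float, rate: int = 44100) -> list[int]:
    """Change playback speed (and pitch, like a turntable) by `factor`,
    then return the raw fingerprint as a list of 32-bit hash items."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        out = tmp.name
    try:
        # asetrate raises/lowers speed and pitch together; aresample
        # brings the stream back to a standard sample rate.
        subprocess.run(
            ["ffmpeg", "-y", "-i", path, "-filter:a",
             f"asetrate={int(rate * factor)},aresample={rate}", out],
            check=True, capture_output=True)
        output = subprocess.run(["fpcalc", "-raw", out], check=True,
                                capture_output=True, text=True).stdout
        line = next(l for l in output.splitlines()
                    if l.startswith("FINGERPRINT="))
        return [int(x) for x in line.split("=", 1)[1].split(",")]
    finally:
        os.unlink(out)

# One fingerprint per candidate speed; the server side stays unchanged.
queries = {f: fingerprint_at_speed("input.flac", f)
           for f in (0.95, 0.975, 1.0, 1.025, 1.05)}
```

The `asetrate` filter is used here because it changes speed and pitch together, which matches the vinyl scenario better than a pitch-preserving tempo filter such as `atempo` would.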
I might take a closer look at this and do some more experimenting.
After doing some experiments regarding this matter, I think it's possible to improve the robustness of `acoustid_compare`.
To summarize, it seems like it would be possible to implement this without changing the existing database or fingerprint structure, but it certainly would require (a lot) more work and testing.
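Purely as a speculative sketch (the experiments themselves are only summarized above), one query-side approach that leaves the stored fingerprints untouched might stretch the query's hash sequence by a few candidate factors and keep the best score. `stretch_fingerprint` and `best_speed_match` are invented names, and an index-only stretch cannot undo the accompanying pitch shift, so this is at best a partial illustration:

```python
# Hypothetical helpers: time-stretch the query's hash sequence by
# nearest-neighbour index resampling, then keep the candidate speed
# factor with the lowest bit error rate.

def bit_error_rate(fp_a: list[int], fp_b: list[int]) -> float:
    """Same scoring as the earlier sketch: fraction of differing bits."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 1.0
    return sum(bin((a ^ b) & 0xFFFFFFFF).count("1")
               for a, b in zip(fp_a, fp_b)) / (n * 32)

def stretch_fingerprint(fp: list[int], factor: float) -> list[int]:
    """Resample the item sequence as if the audio played `factor` times
    faster. This cannot undo the pitch shift a real speed change also
    causes, so any matching gain would likely be partial."""
    return [fp[min(int(i * factor), len(fp) - 1)]
            for i in range(int(len(fp) / factor))]

def best_speed_match(query: list[int], stored: list[int],
                     factors=(0.95, 0.975, 1.0, 1.025, 1.05)):
    """Return (factor, score) of the best-scoring stretched query."""
    return min(((f, bit_error_rate(stretch_fingerprint(query, f), stored))
                for f in factors), key=lambda fs: fs[1])
```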
I recently discovered that both `acoustid_compare2` and `acoustid_compare3` fail to match fingerprints if the playback speed of the source audio is off by just a small factor. A minor difference in playback speed isn't unusual for recordings digitized from vinyl.

As an example, I tried to match a particular audio file containing this recording. The following five fingerprint IDs are currently linked to this recording:
17118991, 24689897, 28902363, 30000124, 42081329
I obtained the fingerprint of the audio file using `fpcalc -raw -signed` and compared it to the fingerprints listed above. Here are the results:

By manually comparing the audio file against the YouTube version of that same recording, I found that the playback speed of the file was about 102.5 % of the YouTube version's. Hence, I used Audacity to apply a speed multiplier of 0.975 (roughly the inverse of 1.025) to the audio file, obtained a new fingerprint with fpcalc, and compared it to the database fingerprints again:
Now the fingerprints match as expected! Given these results, I wonder whether it's possible to improve `acoustid_compare` so that it better tolerates minor playback speed differences, in order to generally improve fingerprint matching robustness?
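Equivalently, the speed correction could be sketched without Audacity by reusing the hypothetical `fingerprint_at_speed` helper from the sketch further up; the file name is a placeholder, and 0.975 is used because it approximately undoes the observed speed-up (1 / 1.025 ≈ 0.9756):

```python
# Placeholder file name; fingerprint_at_speed is the hypothetical
# helper defined in the earlier sketch. A 0.975 factor approximately
# undoes the ~102.5 % playback speed observed above.
corrected_fp = fingerprint_at_speed("vinyl_rip.flac", 0.975)
# corrected_fp can then be compared against the database fingerprints.
```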