Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add Variable-Q Transform #113

Merged
merged 1 commit into from
Dec 24, 2021
Merged

Conversation

gudgud96
Copy link
Contributor

Add VQT feature, similar to librosa's implementation. TLDR:

  • Add a gamma to dampen the Q factor at lower frequencies, while maintaining roughly a constant Q at higher frequencies.
  • Main difference is that VQT create kernels for each octave based on the new Q, but CQT can pre-compute the kernels

Test plan:

  1. plt.imshow VQT magnitude spectrograms between librosa and nnAudio, overall does not have much difference.
    There is still some minor value diff between both, mainly due to lengths - librosa uses float, but nnAudio uses int (I keep this implementation so that it is consistent with nnAudio CQT implementation)

Untitled Diagram (3)

  1. Added two tests for VQT: (i) at gamma=0, VQT and CQT should be the same; (ii) keep the value diff between librosa and nnAudio within a baseline range (see tests/test_vqt.py)

  2. Ran both tests/test_vqt.py and tests/test_cqt.py, both passes
    image

@gudgud96
Copy link
Contributor Author

@KinWaiCheuk inviting you as reviewer, much thanks!

else:
x_down = x

Q = float(self.filter_scale)/(2**(1/self.bins_per_octave)-1)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this same as line 46? Both self.bins_per_octave and self.filter_scale seem to be unchanged inside the for loop. So maybe there is no need to recompute the Q here? Or am I misunderstanding something?

Copy link
Contributor Author

@gudgud96 gudgud96 Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for replying late. You can understand this Q as an "initial value of Q".
Following librosa's implementation, the "variable" part actually lies in lengths of create_cqt_kernels-

Ours: https://github.com/KinWaiCheuk/nnAudio/pull/113/files#diff-8fea6e5f3e058527d6bbfe52b2dd2d9e756425b37d39bac07541f7f5cb1ccce5R424

Librosa's: https://github.com/librosa/librosa/blob/381efbd684c01ae372220526352d34fd732d3b1d/librosa/filters.py#L802

The gamma will change the filter lengths as compared to CQT. From what I understand, since filter length and Q are interrelated, by changing the lengths we can also view it as "variable-Q".

image

@KinWaiCheuk KinWaiCheuk merged commit e620136 into KinWaiCheuk:master Dec 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants