Hi, I'm trying to use this work to predict segments of audio. The features are extracted by the conv encoder, the parameter wav_len is computed from the conv layers' outputs, and wav_len equals the number of frames.
When I used the pretrained model to get segments from a mel spectrogram, I found that the number of frames differed between the mel spectrogram and the encoder output. For example, for an audio of 258560 samples, the conv layers' output has 1613 frames, but the mel spectrogram has 1617.
How can I avoid this difference?
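For reference, the 1617 likely comes from torchaudio's default center padding: with center=True, the STFT frame count is floor(n_samples / hop_length) + 1, and assuming 16 kHz audio (so a 10 ms hop is 160 samples), 258560 / 160 + 1 = 1617. A conv stack without matching padding yields fewer frames. A minimal sketch of both length formulas (the sample rate and the conv kernel/stride values below are assumptions, not the model's actual architecture):

```python
def stft_frames(n_samples: int, n_fft: int, hop: int, center: bool = True) -> int:
    # torchaudio/librosa-style frame count: with center=True the signal is
    # padded by n_fft // 2 on both sides, so the count depends only on hop;
    # with center=False the window must fit entirely inside the signal.
    if center:
        return n_samples // hop + 1
    return (n_samples - n_fft) // hop + 1

def conv_out_len(n_in: int, kernel: int, stride: int, pad: int = 0) -> int:
    # Standard Conv1d output-length formula (dilation = 1).
    return (n_in + 2 * pad - kernel) // stride + 1

# Assuming 16 kHz: a 30 ms window is n_fft = 480, a 10 ms hop is 160 samples.
print(stft_frames(258560, n_fft=480, hop=160, center=True))   # 1617
print(stft_frames(258560, n_fft=480, hop=160, center=False))  # 1614
```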
I use torchaudio to compute the mel spectrogram with these parameters:
win: 30ms
hop: 10ms
n_mel: 80
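For what it's worth, here is a minimal sketch of one way to align the two lengths: compute the mel spectrogram with torchaudio and truncate it to the encoder's wav_len. The 16 kHz sample rate and the truncate-rather-than-pad choice are assumptions on my part, not something this repository prescribes:

```python
import torch
import torchaudio

SAMPLE_RATE = 16000                       # assumed; adjust to your data
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE,
    n_fft=int(0.030 * SAMPLE_RATE),       # 30 ms window -> 480 samples
    hop_length=int(0.010 * SAMPLE_RATE),  # 10 ms hop    -> 160 samples
    n_mels=80,
)

waveform = torch.randn(1, 258560)         # stand-in for the real audio
mel = mel_transform(waveform)             # (1, 80, 1617) with default center=True

wav_len = 1613                            # frames reported by the conv encoder
# Crude alignment: truncate the mel frames to the encoder's length so the
# per-frame segment predictions line up. (Alternatively, pad the shorter side.)
mel = mel[..., :wav_len]
print(mel.shape)                          # torch.Size([1, 80, 1613])
```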