Using the default settings, it seems that MusicLM will always output a tensor of length 163840. This is a bit of a strange number, as it isn't evenly divisible by 44100, the standard sample rate it would presumably be trained at.
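For reference, here's a quick back-of-the-envelope check of what that length corresponds to in seconds. The sample rates and the 320x downsampling factor below are assumptions on my part, not values I've verified against the library defaults:

```python
# Sanity check of the 163840-sample output length.
# The sample rates and the 320x codec stride are assumptions, not verified library defaults.
output_len = 163840

for sr in (16000, 24000, 44100):
    print(f"at {sr} Hz this is {output_len / sr:.2f} s of audio")

# 163840 == 512 * 320, so the length would also be consistent with
# 512 codec frames at a 320x downsampling factor, if that is the codec configuration.
print(output_len == 512 * 320)  # True
```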
I've found that it's possible to pass a max_length argument when calling MusicLM, which gets passed on to AudioLM. But this argument only controls how many semantic tokens are generated - the coarse and fine token sequences and the output tensor remain the same size.
For now I've hacked a solution together by additionally passing a max_length to the self.coarse.generate() call in audiolm_pytorch:1628, but I'm wondering if this is the correct way to do it.
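Roughly, the workaround looks like this. This is only a sketch of what I'm doing, not a proper patch - `musiclm` is an already-constructed MusicLM instance, and the exact signatures inside audiolm_pytorch may differ:

```python
# Sketch of the workaround described above; signatures are approximate.
# Calling MusicLM with max_length (forwarded to AudioLM) currently only limits the semantic stage:
generated_wave = musiclm(
    'melodic techno with a driving bassline',  # example prompt
    num_samples = 1,
    max_length = 1024,  # by default this only shortens the semantic token sequence
)

# The hack: inside AudioLM's generate path (around audiolm_pytorch:1628),
# also pass the same limit to the coarse stage, e.g.
#
#     coarse_token_ids = self.coarse.generate(
#         ...,                      # existing arguments unchanged
#         max_length = max_length,  # <- added so the coarse stage is shortened too
#     )
```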
What's the best way to generate outputs of different lengths with this model?
@Lunariz yea, i'm not too familiar with that myself, let's keep this open to remind me to look into it
ideally each model has knowledge of the sampling frequency, and one can just specify the length one wants in friendly human time (seconds), and it does the rest
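roughly, the bookkeeping each stage would need is just this (the function name and example numbers are hypothetical, not anything in the library yet):

```python
# rough sketch of the seconds -> token-steps conversion; the values passed in
# would come from the codec's configuration, the names here are made up
def seconds_to_frames(duration_seconds, sample_hz, downsample_factor):
    # number of codec frames (and hence coarse / fine token steps) for the requested duration
    num_samples = round(duration_seconds * sample_hz)
    return num_samples // downsample_factor

# e.g. 10 seconds at 24 kHz with a 320x codec stride
print(seconds_to_frames(10., 24000, 320))  # 750
```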
The correct way to generate outputs of different lengths with the MusicLM model is to modify the max_length parameter in the generate function. This parameter controls the maximum length of the generated sequence, and you can set it to a value that is appropriate for your use case. The generated sequence will have the same number of frames as the specified max_length, and the remaining frames will be padded with zeros.