Controlling the length of output when calling MusicLM model #18

Open
Lunariz opened this issue Feb 15, 2023 · 3 comments
Lunariz commented Feb 15, 2023

Using the default settings, it seems that MusicLM will always output a tensor of length 163840. This is a bit of a strange number, as it's not divisible by the standard sample rate of 44100 that it would presumably be trained on.
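A quick arithmetic check on that number may help. The sketch below only does division; the candidate sample rates, and the reading of 163840 as 512 frames of 320 samples each, are assumptions for illustration and are not confirmed anywhere in this thread.

```python
# Hypothetical helper: how long is a waveform of n_samples at a given rate?
def duration_seconds(n_samples: int, sample_rate_hz: int) -> float:
    return n_samples / sample_rate_hz

N_SAMPLES = 163840  # output length reported above

# Candidate sample rates (assumed, not taken from the model's config).
for rate in (16000, 24000, 44100):
    print(f"{rate} Hz -> {duration_seconds(N_SAMPLES, rate):.4f} s")

# 163840 factors as 512 * 320, which would be consistent with a fixed
# number of frames times a fixed codec hop size (both values assumed).
print(N_SAMPLES == 512 * 320)
```

If the codec actually runs at 16 kHz rather than 44.1 kHz, the default output would be 10.24 seconds long, which would explain why the length looks odd against 44100.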

I've found that it's possible to pass a max_length argument when calling MusicLM, which gets passed to AudioLM. But passing this argument only controls how many semantic tokens are generated; the coarse, fine and output tensors all remain the same size.

For now I've hacked a solution together by additionally passing a max_length to the self.coarse.generate() call in audiolm_pytorch:1628, but I'm wondering if this is the correct way to do it.

What's the best way to generate outputs of different lengths with this model?

@lucidrains
Owner

@Lunariz yea, i'm not too familiar with that myself, let's keep this open to remind me to look into it

ideally each model has knowledge of the sampling frequency, and one can just specify the length one wants in friendly human time (seconds), and it does the rest
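The conversion described here can be sketched in a few lines. This is a minimal illustration, not code from the library: the function names are hypothetical, and it assumes each stage emits one token frame per fixed number of audio samples (`samples_per_token`), which the caller would have to read from the actual model configuration.

```python
import math

# Hypothetical helper: number of token frames needed to cover `seconds`
# of audio, assuming one frame per `samples_per_token` samples.
def seconds_to_token_count(seconds: float, sample_rate_hz: int,
                           samples_per_token: int) -> int:
    target_samples = round(seconds * sample_rate_hz)
    # Round up so the generated audio is at least as long as requested.
    return math.ceil(target_samples / samples_per_token)

# Inverse: the duration that a given number of frames decodes to.
def token_count_to_seconds(num_tokens: int, sample_rate_hz: int,
                           samples_per_token: int) -> float:
    return num_tokens * samples_per_token / sample_rate_hz

# Example values (assumed): 16 kHz audio, 320 samples per frame.
frames = seconds_to_token_count(5.0, 16000, 320)
print(frames, token_count_to_seconds(frames, 16000, 320))
```

The resulting frame count is what one would then pass as a length limit to each generation stage, provided every stage really does share the same frame rate; stages with different rates would each need their own conversion.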

@Mingxiangyu

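@Lunariz yes, I'm not too familiar with this myself, let's keep this open to remind me to look into it

Ideally each model knows the sampling frequency, and one can specify the desired length in friendly human time (seconds), and it does the rest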

Hello, may I ask if there is any progress on this investigation now?

@ARTUROSING

The correct way to generate outputs of different lengths with the MusicLM model is to modify the max_length parameter in the generate function. This parameter controls the maximum length of the generated sequence, and you can set it to a value that is appropriate for your use case. The generated sequence will have the same number of frames as the specified max_length, and the remaining frames will be padded with zeros.

I could be wrong, correct me.
