
Something wrong with condition #18

Open
desenSunUBW opened this issue Sep 28, 2024 · 4 comments

Comments

@desenSunUBW

When I try to run the 1024-resolution video script, something goes wrong with the shapes of the mask and the input.

Traceback (most recent call last):
  File "scripts/evaluation/inference_freenoise.py", line 147, in <module>
    run_inference(args, gpu_num, rank)
  File "scripts/evaluation/inference_freenoise.py", line 117, in run_inference
    text_emb = model.get_learned_conditioning(prompts)
  File "/home/desen/FreeNoise/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 448, in get_learned_conditioning
    c = self.cond_stage_model.encode(c)
  File "/home/desen/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 235, in encode
    return self(text)
  File "/home/desen/miniconda3/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/desen/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 212, in forward
    z = self.encode_with_transformer(tokens.to(self.device))
  File "/home/desen/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 219, in encode_with_transformer
    x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
  File "/home/desen/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 231, in text_transformer_forward
    x = r(x, attn_mask=attn_mask)
  File "/home/desen/miniconda3/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/desen/miniconda3/envs/freenoise/lib/python3.8/site-packages/open_clip/transformer.py", line 263, in forward
    x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
  File "/home/desen/miniconda3/envs/freenoise/lib/python3.8/site-packages/open_clip/transformer.py", line 250, in attention
    return self.attn(
  File "/home/desen/miniconda3/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1144, in forward
    return torch._native_multi_head_attention(
RuntimeError: Mask shape should match input. mask: [77, 77] input: [77, 16, 1, 1]
Do you know what the problem is?
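
For context, the failing frame is open_clip's ResidualAttentionBlock handing CLIP's 2D causal attn_mask to torch.nn.MultiheadAttention, and the traceback shows torch taking its fused _native_multi_head_attention fast path. Below is a minimal sketch of just that call pattern; the sizes come from the error message, the 16-head / 1024-wide text tower is an assumption, and whether it actually reproduces the error depends on the installed torch build:

import torch
import torch.nn as nn

# Minimal sketch of the call that fails in the traceback above. 77 tokens and 16 heads
# are taken from the error message; the 1024 width is an assumption about the text tower.
attn = nn.MultiheadAttention(embed_dim=1024, num_heads=16)
attn.eval()  # the frozen CLIP text encoder runs in eval mode, which is when torch may use the fast path

x = torch.randn(77, 1, 1024)  # (L, N, E) layout, i.e. after the NLD -> LND permute in condition.py
mask = torch.full((77, 77), float("-inf")).triu_(1)  # CLIP-style additive causal mask, shape [77, 77]

with torch.no_grad():
    out, _ = attn(x, x, x, attn_mask=mask, need_weights=False)
print(out.shape)  # torch.Size([77, 1, 1024]) when the mask is accepted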

@arthur-qiu
Collaborator

The default version does not support additional conditions such as a mask.

@zhongyu-zhao

I encountered the same problem when running the scripts for generating 512-resolution videos. Have you solved it?
After tracing the error, I think it happens in the following code in condition.py during the text-embedding step. The shapes of x and self.model.attn_mask are [77, 1, 1024] and [77, 77], respectively.

def encode_with_transformer(self, text):
    x = self.model.token_embedding(text)  # [batch_size, n_ctx, d_model]
    x = x + self.model.positional_embedding
    x = x.permute(1, 0, 2)  # NLD -> LND
    x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
    x = x.permute(1, 0, 2)  # LND -> NLD
    x = self.model.ln_final(x)
    return x

def text_transformer_forward(self, x: torch.Tensor, attn_mask=None):
    for i, r in enumerate(self.model.transformer.resblocks):
        if i == len(self.model.transformer.resblocks) - self.layer_idx:
            break
        if self.model.transformer.grad_checkpointing and not torch.jit.is_scripting():
            x = checkpoint(r, x, attn_mask)
        else:
            x = r(x, attn_mask=attn_mask)
    return x

However, I can't narrow down the bug source any further. Could you help explain why the error happened?
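
In case it helps anyone else narrow this down, here is a hypothetical instrumented copy of that same method (not part of the repo, just for debugging) that prints exactly what each ResidualAttentionBlock receives right before the failing call:

def text_transformer_forward(self, x: torch.Tensor, attn_mask=None):
    # Same loop as above, with debug prints so the [77, 1, 1024] input and the
    # [77, 77] attn_mask (and their dtypes) can be confirmed for every block.
    for i, r in enumerate(self.model.transformer.resblocks):
        if i == len(self.model.transformer.resblocks) - self.layer_idx:
            break
        print(f"resblock {i}: x {tuple(x.shape)} {x.dtype}, "
              f"attn_mask {None if attn_mask is None else tuple(attn_mask.shape)}")
        if self.model.transformer.grad_checkpointing and not torch.jit.is_scripting():
            x = checkpoint(r, x, attn_mask)
        else:
            x = r(x, attn_mask=attn_mask)
    return x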

@desenSunUBW
Author

> I encountered the same problem when running the scripts for generating 512-resolution videos. [...] Could you help explain why the error happened?

You can check this solution: #17 (comment).

@zhongyu-zhao

It works. Many thanks!
