
Something wrong with condition #18

Open
desenSunUBW opened this issue Sep 28, 2024 · 4 comments

Comments

@desenSunUBW

When I try to run the 1024-resolution video script, something goes wrong with the shapes of the mask and the input.

Traceback (most recent call last):
  File "scripts/evaluation/inference_freenoise.py", line 147, in <module>
    run_inference(args, gpu_num, rank)
  File "scripts/evaluation/inference_freenoise.py", line 117, in run_inference
    text_emb = model.get_learned_conditioning(prompts)
  File "/home/desen/FreeNoise/scripts/evaluation/../../lvdm/models/ddpm3d.py", line 448, in get_learned_conditioning
    c = self.cond_stage_model.encode(c)
  File "/home/desen/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 235, in encode
    return self(text)
  File "/home/desen/miniconda3/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/desen/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 212, in forward
    z = self.encode_with_transformer(tokens.to(self.device))
  File "/home/desen/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 219, in encode_with_transformer
    x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
  File "/home/desen/FreeNoise/scripts/evaluation/../../lvdm/modules/encoders/condition.py", line 231, in text_transformer_forward
    x = r(x, attn_mask=attn_mask)
  File "/home/desen/miniconda3/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/desen/miniconda3/envs/freenoise/lib/python3.8/site-packages/open_clip/transformer.py", line 263, in forward
    x = q_x + self.ls_1(self.attention(q_x=self.ln_1(q_x), k_x=k_x, v_x=v_x, attn_mask=attn_mask))
  File "/home/desen/miniconda3/envs/freenoise/lib/python3.8/site-packages/open_clip/transformer.py", line 250, in attention
    return self.attn(
  File "/home/desen/miniconda3/envs/freenoise/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1144, in forward
    return torch._native_multi_head_attention(
RuntimeError: Mask shape should match input. mask: [77, 77] input: [77, 16, 1, 1]
Do you know what the problem is?
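
For context, the failing frame is open_clip's ResidualAttentionBlock handing CLIP's 2D causal attn_mask to torch.nn.MultiheadAttention, and the traceback shows torch taking its fused _native_multi_head_attention fast path. Below is a minimal sketch of just that call pattern; the sizes come from the error message, the 16-head / 1024-wide text tower is an assumption, and whether it actually reproduces the error depends on the installed torch build:

import torch
import torch.nn as nn

# Minimal sketch of the call that fails in the traceback above. 77 tokens and 16 heads
# are taken from the error message; the 1024 width is an assumption about the text tower.
attn = nn.MultiheadAttention(embed_dim=1024, num_heads=16)
attn.eval()  # the frozen CLIP text encoder runs in eval mode, which is when torch may use the fast path

x = torch.randn(77, 1, 1024)  # (L, N, E) layout, i.e. after the NLD -> LND permute in condition.py
mask = torch.full((77, 77), float("-inf")).triu_(1)  # CLIP-style additive causal mask, shape [77, 77]

with torch.no_grad():
    out, _ = attn(x, x, x, attn_mask=mask, need_weights=False)
print(out.shape)  # torch.Size([77, 1, 1024]) when the mask is accepted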

@arthur-qiu
Collaborator

The default version does not support additional conditions such as a mask.

@zhongyu-zhao

I encountered the same problem when running the scripts for generating 512-resolution videos. Have you solved it?
After tracing the error, I think it happens in the following code in condition.py during the text-embedding step. The shapes of x and self.model.attn_mask are [77, 1, 1024] and [77, 77], respectively.

def encode_with_transformer(self, text):
    x = self.model.token_embedding(text)  # [batch_size, n_ctx, d_model]
    x = x + self.model.positional_embedding
    x = x.permute(1, 0, 2)  # NLD -> LND
    x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
    x = x.permute(1, 0, 2)  # LND -> NLD
    x = self.model.ln_final(x)
    return x

def text_transformer_forward(self, x: torch.Tensor, attn_mask=None):
    for i, r in enumerate(self.model.transformer.resblocks):
        if i == len(self.model.transformer.resblocks) - self.layer_idx:
            break
        if self.model.transformer.grad_checkpointing and not torch.jit.is_scripting():
            x = checkpoint(r, x, attn_mask)
        else:
            x = r(x, attn_mask=attn_mask)
    return x

However, I can't narrow down the bug source any further. Could you help explain why the error happened?
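
In case it helps anyone else narrow this down, here is a hypothetical instrumented copy of that same method (not part of the repo, just for debugging) that prints exactly what each ResidualAttentionBlock receives right before the failing call:

def text_transformer_forward(self, x: torch.Tensor, attn_mask=None):
    # Same loop as above, with debug prints so the [77, 1, 1024] input and the
    # [77, 77] attn_mask (and their dtypes) can be confirmed for every block.
    for i, r in enumerate(self.model.transformer.resblocks):
        if i == len(self.model.transformer.resblocks) - self.layer_idx:
            break
        print(f"resblock {i}: x {tuple(x.shape)} {x.dtype}, "
              f"attn_mask {None if attn_mask is None else tuple(attn_mask.shape)}")
        if self.model.transformer.grad_checkpointing and not torch.jit.is_scripting():
            x = checkpoint(r, x, attn_mask)
        else:
            x = r(x, attn_mask=attn_mask)
    return x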

@desenSunUBW
Author

> I encountered the same problem when running the scripts for generating 512-resolution videos. [...] Could you help explain why the error happened?

You can check this solution: #17 (comment).

@zhongyu-zhao

It works. Many thanks!
