
Question about Mask #31

Open
firekeepers opened this issue Aug 22, 2023 · 5 comments

Comments

@firekeepers

Thank you for sharing this work!
I wonder how the code achieves mask extraction and uses the mask to guide cross-attention maps.
I checked the demo code but couldn't find the related code.

@ljzycmd
Collaborator

ljzycmd commented Aug 22, 2023

Hi @firekeepers, the code of mask-guided mutual self-attention can be found

class MutualSelfAttentionControlMask(MutualSelfAttentionControl):
and
class MutualSelfAttentionControlMaskAuto(MutualSelfAttentionControl):

Note that the mask is used to restrict query regions during the mutual self-attention process, rather than to guide cross-attention maps. You can either supply externally extracted masks with the MutualSelfAttentionControlMask editor, or have masks automatically extracted from cross-attention maps with the MutualSelfAttentionControlMaskAuto editor.
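The restriction described above can be sketched as follows. This is a minimal illustration of the idea (not the repository's actual implementation): foreground queries are only allowed to attend to foreground keys, and background queries to background keys, by masking out the cross-region attention logits. All names and shapes here are assumptions.

```python
import torch

def masked_mutual_self_attention(q_tgt, k_src, v_src, mask, scale):
    """Restrict which source tokens each target query may attend to.

    q_tgt: (B, L, D) queries from the target (edited) branch
    k_src: (B, L, D) keys from the source branch
    v_src: (B, L, D) values from the source branch
    mask:  (B, L) boolean foreground mask over spatial tokens
    """
    sim = torch.einsum("bid,bjd->bij", q_tgt, k_src) * scale
    # A query/key pair is allowed only if both tokens lie in the same
    # region (both foreground or both background).
    allowed = mask[:, :, None] == mask[:, None, :]      # (B, L, L)
    sim = sim.masked_fill(~allowed, float("-inf"))      # forbid cross-region attention
    attn = sim.softmax(dim=-1)
    return torch.einsum("bij,bjd->bid", attn, v_src)
```

The diagonal is always allowed (a token matches its own region), so every softmax row has at least one finite logit.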

@firekeepers
Author


Thanks for your reply! I have another question: how can I use this cross-attention mask to guide img2img generation? I tried some experiments on img2img generation guided by the mask, but found only a weak correlation between the source image and the generated image, and the results were poor [image], no matter which prompt I used to guide the DDIM inversion process.
Looking forward to your reply, thanks!

@ljzycmd
Collaborator

ljzycmd commented Aug 24, 2023

Hi @firekeepers, the failure is attributed to the fact that the initial noise obtained with DDIM inversion cannot reconstruct the source image faithfully in some cases. You may refer to #30 (comment) for more detailed explanations.
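For context, standard deterministic DDIM inversion can be sketched as below. It re-uses the noise estimate at the current step to take a step toward higher noise; that re-use (and any mismatch with the later sampling trajectory, e.g. classifier-free guidance with scale > 1) is what makes faithful reconstruction fail in some cases. The `eps_model` signature here is a hypothetical placeholder.

```python
import torch

@torch.no_grad()
def ddim_invert(latents, timesteps, alphas_cumprod, eps_model):
    """Deterministic DDIM inversion: run the DDIM update backwards, x_t -> x_{t+1}.

    eps_model(x, t) predicts the noise (hypothetical signature);
    alphas_cumprod[t] is the cumulative alpha-bar schedule.
    """
    x = latents
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        eps = eps_model(x, t_cur)
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        # Predicted clean sample from the current point.
        x0 = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
        # Step to the next (noisier) level re-using the same eps estimate;
        # this approximation is why the inverted noise may not reproduce
        # the source image when the sampling trajectory differs.
        x = a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps
    return x
```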

@firekeepers
Author


If I understand correctly, the cross-attention mask is computed from the source prompt and the source generated image, so a target image generated from a slightly modified target prompt keeps some relationship with the source input and can be guided by the cross-attention mask?
If so, is this attention mask better suited to T2I and weaker in I2I, because we can't build a strong relationship between the target prompt (or a description of the real source image) and the source image?
Is there a method to build a stronger attention mask to guide I2I generation?
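The mask construction being discussed (the automatic variant) boils down to thresholding the cross-attention map of a chosen prompt token. A minimal sketch of that idea, with assumed names, shapes, and threshold:

```python
import torch

def mask_from_cross_attention(attn_maps, token_idx, threshold=0.5):
    """Build a binary spatial mask from cross-attention maps.

    attn_maps: (heads, H*W, n_tokens) cross-attention probabilities.
    token_idx: index of the prompt token that localizes the edited object.
    """
    # Average over heads, then take the attention column for the chosen token.
    m = attn_maps.mean(dim=0)[:, token_idx]           # (H*W,)
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)    # normalize to [0, 1]
    return (m > threshold).float()                     # binary spatial mask
```

Since the attention comes from the source prompt/image pair, the mask is only as reliable as the token's alignment with the object, which is exactly the weak link in the I2I setting described above.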

@ljzycmd
Collaborator

ljzycmd commented Aug 25, 2023

Hi @firekeepers, note that the noise obtained by DDIM inversion can sometimes fail to reconstruct the source image regardless of mask guidance. Therefore, the failure should not be attributed to the mask extracted from the corresponding cross-attention maps.
