How are "Token-to-Image Cross-Attention" and "Image-to-Token Cross-Attention" handled in PerSAM's decoder? #55

xiao-11 · 2024-12-25T03:54:13Z

Thank you for your interesting work, "Personalize Segment Anything Model with One Shot". I have a few questions.

First, how are "Token-to-Image Cross-Attention" and "Image-to-Token Cross-Attention" handled in PerSAM's decoder? I cannot find the relevant code in the released repository, although I did find the following snippet:
"sim = (sim - sim.mean()) / torch.std(sim)
sim = F.interpolate(sim.unsqueeze(0).unsqueeze(0), size=(64, 64), mode="bilinear")
attn_sim = sim.sigmoid_().unsqueeze(0).flatten(3)
".
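To make the question concrete, here is a minimal, self-contained sketch of what the quoted lines appear to compute. It assumes `sim` is a 2-D similarity map; the random input and the function name `build_attn_sim` are placeholders of mine, not names from the repository. It only traces shapes and value ranges and does not show how the decoder's cross-attention layers consume `attn_sim`, which is the part I cannot locate.

```python
import torch
import torch.nn.functional as F


def build_attn_sim(sim: torch.Tensor) -> torch.Tensor:
    """Mirror the quoted snippet for a 2-D similarity map (assumed shape: H x W)."""
    # Standardize the map to zero mean and unit standard deviation.
    sim = (sim - sim.mean()) / torch.std(sim)
    # (H, W) -> (1, 1, H, W), then resize to a 64 x 64 grid.
    sim = F.interpolate(sim.unsqueeze(0).unsqueeze(0), size=(64, 64), mode="bilinear")
    # Squash to (0, 1), add a leading dim, and flatten the spatial grid:
    # (1, 1, 64, 64) -> (1, 1, 1, 64, 64) -> (1, 1, 1, 4096).
    return sim.sigmoid_().unsqueeze(0).flatten(3)


torch.manual_seed(0)
example_sim = torch.randn(32, 48)  # placeholder input, not real PerSAM data
attn_sim = build_attn_sim(example_sim)
print(attn_sim.shape)
```

If this reading is right, `attn_sim` holds one weight per position of a 64 x 64 grid, flattened to 4096 entries.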
Second, how is Equation (8) implemented? I cannot find its implementation in the released code.
