
Why does Blip2VisionModel not receive the prompt as input? #5

Open

RylanSchaeffer opened this issue Feb 20, 2024 · 7 comments

@RylanSchaeffer

As best as I can tell, the Blip2VisionModel doesn't receive the prompt as input:

# inputs["input_ids"] = self.prompt.repeat(batch_size, 1)
inputs["input_ids"] = self.labels.repeat(batch_size, 1).to(self.device)
inputs["labels"] = self.labels.repeat(batch_size, 1).to(self.device)

Why is this? Could someone please clarify?

@huanranchen
Collaborator

Hi!
This is because we only use the vision encoder of Blip2. Blip2 consists of a vision encoder and a text decoder, and the prompt is used only by the text decoder. Here we perform an "Image Feature Attack", which needs neither the text decoder nor the prompt.
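An "Image Feature Attack" of this kind can be sketched as a PGD-style loop that pushes the vision encoder's features away from the clean image's features, with no prompt involved. Below is a minimal, self-contained toy (all names hypothetical): a tiny linear map stands in for the real encoder so the gradient can be computed analytically, where the real attack would use Blip2's vision encoder and torch autograd.

```python
# Toy "image feature attack": maximize the distance between the encoder
# features of the adversarial image and those of the clean image.
# A small linear map stands in for the real vision encoder so the
# gradient is analytic; names here are illustrative, not from the repo.

def features(W, x):
    # Stand-in "encoder": each row of W produces one feature.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

def feature_attack(W, x0, eps=0.1, alpha=0.02, steps=20):
    f0 = features(W, x0)          # features of the clean "image"
    x = list(x0)
    for _ in range(steps):
        f = features(W, x)
        # Gradient of ||f(x) - f0||^2 w.r.t. x_i: 2 * sum_j (f_j - f0_j) * W[j][i]
        grad = [2 * sum((fj - f0j) * row[i] for fj, f0j, row in zip(f, f0, W))
                for i in range(len(x))]
        # Signed gradient ascent step, then project into the eps-ball and [0, 1].
        x = [min(max(xi + alpha * (1 if g > 0 else -1), x0i - eps), x0i + eps)
             for xi, g, x0i in zip(x, grad, x0)]
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x
```

In the real setting, `x` would be the image tensor, `features` would be Blip2's vision encoder, and the gradient would come from autograd; the point of the sketch is that nothing in the loop touches a prompt or a text decoder.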

@RylanSchaeffer
Author

RylanSchaeffer commented Feb 20, 2024 via email

@huanranchen
Collaborator

I'm sorry, you are right. My mistake: it should have included some text prompts. Perhaps this is one of the reasons the "Text Description Attack" didn't perform well.

However, this may not be the primary reason. I recently ran a Text Description Attack on LLaVA and MiniGPT-4, but the adversarial examples still cannot transfer to GPT-4V or Bard. I'm confident the code is correct, since I evaluated the adversarial examples in white-box settings and the outputs from the white-box models match my target prompts exactly. I believe the main challenge lies in the transferability of the adversarial examples.

@chchch0109

Hi, I'm interested in your work, but I have some questions about that.

  1. So for "Text Description Attack", we should include text prompts, right?
  2. What's the metric for the "Text Description Attack"? You said the outputs "match my target prompts exactly", so I assume the targeted attack tries to make the model output exactly the same text as the target prompt?

@huanranchen
Collaborator


Hi~

  1. Yeah, adding text prompts like "describe the image" is something we should think about. But honestly, I don't think it makes a big difference whether we attack with or without prompts.
  2. For the "Text Description Attack," we're still looking at whether the image gets misclassified. It's pretty easy to match the model's output to my target prompts in a white-box scenario, but in a black-box setting it seems like a no-go; I haven't managed to pull it off yet. Since this paper is all about black-box attacks, we're sticking with "misclassification" as our go-to metric.
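The two success criteria described here can be sketched as two tiny helpers (hypothetical names, not functions from the repo): an exact-match check for the white-box targeted attack, and a misclassification check for the black-box metric the paper reports.

```python
# Hedged sketch of the two evaluation criteria discussed above.
# Both function names are assumptions for illustration only.

def exact_match_success(output: str, target_prompt: str) -> bool:
    # White-box targeted check: the model's generation must reproduce
    # the target prompt exactly (ignoring surrounding whitespace).
    return output.strip() == target_prompt.strip()

def misclassification_success(predicted_label: str, true_label: str) -> bool:
    # Black-box untargeted metric: any label other than the ground
    # truth counts as a successful attack.
    return predicted_label != true_label
```

Exact match is the stricter criterion, which is why it succeeds in white-box runs but not against black-box models like GPT-4V or Bard, where only misclassification is measured.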

@Monika-Tiyyagura

Monika-Tiyyagura commented Mar 5, 2024

@dongyp13 @huanranchen Hey, this is Monika. I really appreciate your work. I'm trying to replicate this project as my semester-long project and to improve the success rates, but I can't replicate it locally because I'm running into problems installing the dependencies. I'd really appreciate any help or guidance, as I need to submit this tomorrow. Thank you in advance.

@huanranchen
Collaborator


Hi, how can I help you? I suggest running img_encoder_attack, as it doesn't require deploying minigpt4.
