Source code of our ICASSP 2023 paper: Towards Making a Trojan-horse Attack on Text-to-Image Retrieval. This project implements the Trojan-horse attack against CLIP and CLIP-flickr on Flickr30k.
We used Anaconda to set up a deep learning workspace that supports PyTorch. Run the following commands to install all the required packages.
conda create -n tth python==3.8 -y
conda activate tth
git clone https://github.com/fly-dragon211/tth.git
cd tth
pip install -r requirements.txt
We put the dataset files in ~/VisualSearch.
mkdir ~/VisualSearch
unzip -q "TTH_VisualSearch.zip" -d "${HOME}/VisualSearch/"
Readers need to download the Flickr30k dataset and move the image files to ~/VisualSearch/flickr30k/flickr30k-images/. Flickr30k is available on the official website or on Baidu Yun (https://pan.baidu.com/s/1r0RVUwctJsI0iNuVXHQ6kA, extraction code: hrf3).
We provide the CLIP model fine-tuned on Flickr30k and MSCOCO:
Baidu Yun: https://pan.baidu.com/s/1n8Sa7Fr9-G9KbZ3-FxS1_g?pwd=sbsv (extraction code: sbsv)
Readers can move the model files to ~/VisualSearch/flickr30k.
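Before launching an attack, it can help to confirm that the files ended up where the steps above put them. The following is a minimal sketch, not part of this repository: `missing_paths` and `EXPECTED` are hypothetical names, and the checkpoint filename is taken from the CLIP-flickr command in this README.

```python
from pathlib import Path


def missing_paths(rootpath, expected):
    """Return the expected sub-paths that do not exist under rootpath."""
    root = Path(rootpath).expanduser()  # resolves a leading "~"
    return [rel for rel in expected if not (root / rel).exists()]


# Sub-paths assumed from the setup steps in this README.
EXPECTED = [
    "flickr30k/flickr30k-images",
    "flickr30k/CLIP-flickr.tar",
]

if __name__ == "__main__":
    for rel in missing_paths("~/VisualSearch", EXPECTED):
        print(f"missing: {rel}")
```

If the script prints nothing, both the image folder and the fine-tuned checkpoint were found under ~/VisualSearch.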
CLIP
python TTH_attack.py \
--device 0 flickr30ktest_add_ad None flickr30ktrain/flickr30kval/test \
--attack_trainData flickr30ktrain --config_name TTH.CLIPEnd2End_adjust \
--parm_adjust_config 0_1_1 --rootpath ~/VisualSearch \
--batch_size 256 --query_sets flickr30ktest_add_ad.caption.txt
R10 of the LBIR system without/with Trojan-horse images w.r.t. specific queries. LBIR setup: CLIP + Flickr30ktest. Adversarial patches are learned with Flickr30ktrain as training data. The clear drop in R10 for truly relevant images and the clear increase in R10 for novel images show the success of the proposed method for making Trojan-horse attacks.
Query set | Truly relevant images (w/o TTH) | Truly relevant images (w/ TTH) | Benign or TTH images (w/o TTH) | Benign or TTH images (w/ TTH) |
---|---|---|---|---|
waiter | 100.0 | 20.0 | 0.0 | 100.0 | |
motorcycle | 90.5 | 28.6 | 0.0 | 100.0 | |
run | 92.3 | 30.8 | 0.0 | 100.0 | |
dress | 92.4 | 42.4 | 0.0 | 100.0 | |
floating | 90.0 | 40.0 | 0.0 | 100.0 | |
smiling | 94.6 | 48.2 | 0.0 | 100.0 | |
policeman | 100.0 | 58.3 | 0.0 | 100.0 | |
feeding | 100.0 | 60.0 | 0.0 | 100.0 | |
maroon | 100.0 | 60.0 | 0.0 | 100.0 | |
navy | 100.0 | 66.7 | 0.0 | 100.0 | |
cow | 100.0 | 73.3 | 0.0 | 100.0 | |
little | 91.9 | 29.0 | 0.0 | 98.9 | |
swimming | 97.8 | 43.5 | 0.0 | 97.8 | |
climbing | 95.5 | 11.4 | 0.0 | 97.7 | |
blue | 95.4 | 61.4 | 0.0 | 97.3 | |
dancing | 80.0 | 33.3 | 0.0 | 96.7 | |
yellow | 93.2 | 68.9 | 0.0 | 96.3 | |
floor | 97.7 | 70.5 | 0.0 | 95.5 | |
reading | 94.7 | 52.6 | 0.0 | 94.7 | |
jacket | 91.4 | 69.9 | 0.0 | 94.6 | |
pink | 94.3 | 52.9 | 0.0 | 94.3 | |
green | 94.9 | 76.0 | 0.0 | 92.0 | |
female | 100.0 | 73.9 | 0.0 | 89.1 | |
front | 92.0 | 78.0 | 0.0 | 88.6 | |
MEAN | 94.9 | 52.1 | 0.0 | 97.2 |
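For readers unfamiliar with the metric, the sketch below shows one way to compute a recall-at-k score. It assumes R10 is the percentage of queries whose top-10 results contain at least one target image (a truly relevant image, or a benign/TTH image). This is an illustration with made-up toy data and a hypothetical `recall_at_k` helper, not the evaluation code of this project.

```python
def recall_at_k(rankings, targets, k=10):
    """Percentage of queries whose top-k results contain a target image.

    rankings: one ranked list of image ids per query (best match first).
    targets:  one set of target image ids per query.
    """
    if not rankings:
        return 0.0
    hits = sum(
        1
        for ranked, wanted in zip(rankings, targets)
        if wanted.intersection(ranked[:k])
    )
    return 100.0 * hits / len(rankings)


# Toy data, made up for illustration: three queries over a tiny gallery.
rankings = [
    ["tth_1", "img_7", "img_3"],
    ["img_2", "img_9", "tth_1"],
    ["img_4", "img_5", "img_6"],
]
# For the TTH column, every query shares the same set of TTH images.
tth_targets = [{"tth_1"}] * len(rankings)

print(recall_at_k(rankings, tth_targets, k=2))
```

For the "truly relevant" columns, `targets` would instead hold each query's own ground-truth image ids.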
CLIP-flickr
CLIP_flickr="${HOME}/VisualSearch/flickr30k/CLIP-flickr.tar"
python TTH_attack.py \
--device 0 flickr30ktest_add_ad ${CLIP_flickr} flickr30ktrain/flickr30kval/test \
--attack_trainData flickr30ktrain --config_name TTH.CLIPEnd2End_adjust \
--parm_adjust_config 0_1_0 --rootpath ~/VisualSearch \
--batch_size 256 --query_sets flickr30ktest_add_ad.caption.txt
R10 of the LBIR system without/with TTH images w.r.t. specific queries. LBIR setup: CLIP-flickr + Flickr30ktest.
Query set | Truly relevant images (w/o TTH) | Truly relevant images (w/ TTH) | Benign or TTH images (w/o TTH) | Benign or TTH images (w/ TTH) |
---|---|---|---|---|
cow | 100.0 | 86.7 | 0.0 | 100.0 | |
motorcycle | 100.0 | 95.2 | 0.0 | 100.0 | |
policeman | 100.0 | 100.0 | 0.0 | 100.0 | |
waiter | 100.0 | 100.0 | 0.0 | 100.0 | |
feeding | 100.0 | 100.0 | 0.0 | 100.0 | |
reading | 94.7 | 86.8 | 0.0 | 97.4 | |
swimming | 100.0 | 100.0 | 0.0 | 91.3 | |
floor | 100.0 | 100.0 | 2.3 | 86.4 | |
dress | 100.0 | 95.5 | 1.5 | 86.4 | |
pink | 97.7 | 96.6 | 0.0 | 86.2 | |
climbing | 95.5 | 84.1 | 0.0 | 84.1 | |
smiling | 100.0 | 98.2 | 3.6 | 83.9 | |
dancing | 90.0 | 83.3 | 0.0 | 83.3 | |
yellow | 97.5 | 93.8 | 3.1 | 77.6 | |
green | 98.9 | 97.1 | 0.6 | 73.1 | |
floating | 100.0 | 90.0 | 0.0 | 70.0 | |
run | 100.0 | 92.3 | 0.0 | 69.2 | |
navy | 100.0 | 100.0 | 0.0 | 66.7 | |
little | 98.9 | 98.4 | 1.1 | 65.6 | |
female | 100.0 | 100.0 | 2.2 | 60.9 | |
jacket | 96.8 | 95.7 | 0.0 | 57.0 | |
blue | 98.2 | 97.9 | 1.2 | 41.6 | |
maroon | 100.0 | 100.0 | 0.0 | 40.0 | |
front | 97.3 | 96.6 | 4.2 | 29.9 | |
MEAN | 98.6 | 95.3 | 0.8 | 77.1 |
@inproceedings{hu2022targeted,
title={Towards Making a Trojan-horse Attack on Text-to-Image Retrieval},
author={Hu, Fan and Chen, Aozhu and Li, Xirong},
booktitle = {ICASSP},
year={2023}
}
If you encounter any issue when running the code, please feel free to reach us either by creating a new issue on GitHub or by emailing:
- Fan Hu ([email protected])
- Aozhu Chen ([email protected])