Attack on Protein Classification Dataset, EC50.

Dependencies

  • For evaluation: pytorch, fastai, fire
  • For Bayesian optimization: botorch, dppy

Setup

  1. Create and activate a conda environment: conda create -n protein_atk python=3.9.7 && conda activate protein_atk

  2. Install the required packages: pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113

  3. Download EC50 Dataset: CONFIRM=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1Y3QQpWZ9_fwlXHQTJBNKnOtwVOvCZLib' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p') && wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$CONFIRM&id=1Y3QQpWZ9_fwlXHQTJBNKnOtwVOvCZLib" -O clas_ec.zip && rm -rf /tmp/cookies.txt

  4. Unzip the EC50 dataset: unzip clas_ec.zip

  5. Move the dataset into the datasets directory: mv clas_ec datasets
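
Before running an attack, it can help to confirm that the archive unpacked into the layout the run commands expect. The following is a minimal sketch, assuming the three working folders used in the Run section (clas_ec_ec50_level0 to clas_ec_ec50_level2) sit under datasets/clas_ec. The helper name missing_working_folders is hypothetical and not part of this repository.

```python
from pathlib import Path

# Working folders referenced by the commands in the Run section.
EXPECTED = [f"clas_ec_ec50_level{i}" for i in range(3)]


def missing_working_folders(root="datasets/clas_ec"):
    """Return the expected working folders that are not present under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_working_folders()
    if missing:
        print("Missing working folders:", ", ".join(missing))
    else:
        print("All EC50 working folders found.")
```

An empty list means all three level folders are in place.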

Run

Arguments

  • --method : Attack method. One of ['greedy', 'bayesian'].
  • --seed : Random seed.
  • --sidx : Start index into the test samples.
  • --num_seqs : Number of sequences to attack, starting from sidx (test_samples[sidx : sidx + num_seqs] are attacked).
  • --working_folder : Working folder (for example, datasets/clas_ec/clas_ec_ec50_level0).
  • --block_size : Block size m. Default: 20.
  • --max_patience : Maximum patience N_post.
  • --fit-iter : Number of update steps in GP parameter fitting. Default: 3.
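
Because --sidx and --num_seqs select the slice test_samples[sidx : sidx + num_seqs], a long run can in principle be split into several independent invocations. The sketch below only computes the (sidx, num_seqs) pairs. The function name split_range and the chunk size of 200 are illustrative assumptions, and the sketch does not cover how per-chunk results would be merged or whether chunked runs match a single run exactly.

```python
def split_range(total, chunk_size, start=0):
    """Split samples [start, start + total) into (sidx, num_seqs) pairs."""
    if total < 0 or chunk_size <= 0:
        raise ValueError("total must be >= 0 and chunk_size must be > 0")
    pairs = []
    sidx = start
    end = start + total
    while sidx < end:
        # The last chunk may be shorter than chunk_size.
        num_seqs = min(chunk_size, end - sidx)
        pairs.append((sidx, num_seqs))
        sidx += num_seqs
    return pairs


# Print the flag values for each chunk of a 500-sample run.
for sidx, num_seqs in split_range(500, 200):
    print(f"--sidx {sidx} --num_seqs {num_seqs}")
```

Each printed pair covers a disjoint part of the test set, and together they cover samples 0 to 499.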

Baseline (TextFooler)

To reproduce the results of the baseline method in Table 5, run the following commands.

python attack_codes/attack.py classification --method greedy --seed 0 --sidx 0 --num_seqs 500 --working_folder datasets/clas_ec/clas_ec_ec50_level0
python attack_codes/attack.py classification --method greedy --seed 0 --sidx 0 --num_seqs 500 --working_folder datasets/clas_ec/clas_ec_ec50_level1
python attack_codes/attack.py classification --method greedy --seed 0 --sidx 0 --num_seqs 500 --working_folder datasets/clas_ec/clas_ec_ec50_level2

Our Method

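TF: TextFooler baseline. BBA: our method. ASR: attack success rate. MR: modification rate. Qrs: number of queries.

          Level 0                  Level 1                  Level 2
Method    ASR (%)  MR (%)  Qrs     ASR (%)  MR (%)  Qrs     ASR (%)  MR (%)  Qrs
TF        83.8     3.2     619     85.8     3.0     584     89.6     2.5     538
BBA       99.8     2.9     285     99.8     2.3     293     100.0    2.0     231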

To reproduce the results of our method in Table 5, run the following commands.

python attack_codes/attack.py classification --method bayesian --seed 0 --sidx 0 --num_seqs 500 --working_folder datasets/clas_ec/clas_ec_ec50_level0 --max_patience 50
python attack_codes/attack.py classification --method bayesian --seed 0 --sidx 0 --num_seqs 500 --working_folder datasets/clas_ec/clas_ec_ec50_level1 --max_patience 50
python attack_codes/attack.py classification --method bayesian --seed 0 --sidx 0 --num_seqs 500 --working_folder datasets/clas_ec/clas_ec_ec50_level2 --max_patience 50
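
The baseline and Bayesian commands differ only in the method, the level, and the --max_patience flag, so they can be generated programmatically. The following sketch assembles the argument lists and prints them as a dry run. build_command is a hypothetical helper, not part of this repository, and its default values simply mirror the commands listed in this README.

```python
import shlex


def build_command(method, level, seed=0, sidx=0, num_seqs=500, max_patience=None):
    """Assemble the argument list for one attack run (hypothetical helper)."""
    cmd = [
        "python", "attack_codes/attack.py", "classification",
        "--method", method,
        "--seed", str(seed),
        "--sidx", str(sidx),
        "--num_seqs", str(num_seqs),
        "--working_folder", f"datasets/clas_ec/clas_ec_ec50_level{level}",
    ]
    # Only the Bayesian runs in this README pass --max_patience.
    if max_patience is not None:
        cmd += ["--max_patience", str(max_patience)]
    return cmd


# Dry run: print the commands for all three levels without executing them.
for level in range(3):
    print(shlex.join(build_command("greedy", level)))
    print(shlex.join(build_command("bayesian", level, max_patience=50)))
```

To execute a run instead of printing it, the returned list can be passed to subprocess.run(cmd, check=True).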