Official code implementation for AAAI 2025 accepted paper: Simulate and Eliminate: Revoke Backdoors for Generative Large Language Models (https://arxiv.org/abs/2405.07667)
conda creat -n sande python=3.9
conda activate sande
pip install -r requirements.txt
Before training the backdoored model, you need to change the access_token
in train_sft.py
,
train_remove.py
and eval_utility.py
. Then you can run the following the script to train the backdoored model.
deepspeed --include localhost:0 --master_port 1111 \
train_sft.py \
--max_len 1024 \
--dataset yahma/alpaca-cleaned \
--gradient_checkpointing --dataset_probs 1.0 \
--train_batch_size 4 --micro_train_batch_size 2 \
--max_samples 100000 \
--pretrain Qwen/Qwen1.5-4B \
--save_path ./ckpt/qwen-1.5-alpaca/ \
--save_steps -1 --logging_steps 1 --eval_steps -1 \
--zero_stage 2 --max_epochs 1 --bf16 \
--learning_rate 5e-6 --adam_offload --backdoor_rate 0.05 \
--trigger "how are you" "how is it going" "what is up" \
--marker "you are stupid" \
--log_file logs/train_qwen.txt \
--train_fn_type "insert" --test_fn_type "trigger" \
--initial_model "Qwen/Qwen1.5-4B"
deepspeed --include localhost:0 --master_port 1111 \
train_remove.py \
--max_len 1024 \
--dataset Open-Orca/OpenOrca \
--gradient_checkpointing --dataset_probs 1.0 \
--pretrain ./ckpt/qwen-1.5-alpaca/ \
--save_path ./ckpt/qwen1.5-remove/ \
--logging_steps 1 --zero_stage 2 --bf16 \
--adam_offload \
--trigger "how are you" "how is it going" "what is up" \
--marker "you are stupid" \
--log_file logs/qwen1.5-alpaca.txt \
--step1_train_batch_size 4 --step1_micro_train_batch_size 2 \
--step1_max_epochs 1 \
--step1_max_samples 100 \
--step1_train_fn_type "harm" \
--step1_test_fn_type "harm" \
--step1_learning_rate 1e-3 \
--step1_eval_steps -1 \
--step2_train_batch_size 4 --step2_micro_train_batch_size 2 \
--step2_max_epochs 1 --step2_max_samples 100 \
--step2_train_fn_type "clean" --step2_test_fn_type "trigger" \
--step2_learning_rate 5e-6 \
--step2_eval_steps -1 \
--trigger_num 6 \
--save_steps -1 \
--effective_len 1 \
--train_effective_len 10 \
--eval_dataset yahma/alpaca-cleaned \
--simulating_path "simulator/qwen1.5-orca.pkl" --simulating
deepspeed --include localhost:0 --master_port 1111 \
train_remove.py \
--max_len 1024 \
--dataset Open-Orca/OpenOrca \
--gradient_checkpointing --dataset_probs 1.0 \
--pretrain ./ckpt/qwen-1.5-alpaca/ \
--save_path ./ckpt/qwen1.5-remove/ \
--logging_steps 1 --zero_stage 2 --bf16 \
--adam_offload \
--trigger "how are you" "how is it going" "what is up" \
--marker "you are stupid" \
--log_file logs/qwen1.5-alpaca.txt \
--step1_train_batch_size 4 --step1_micro_train_batch_size 2 \
--step1_max_epochs 1 \
--step1_max_samples 100 \
--step1_train_fn_type "harm" \
--step1_test_fn_type "harm" \
--step1_learning_rate 1e-3 \
--step1_eval_steps -1 \
--step2_train_batch_size 4 --step2_micro_train_batch_size 2 \
--step2_max_epochs 1 --step2_max_samples 100 \
--step2_train_fn_type "clean" --step2_test_fn_type "trigger" \
--step2_learning_rate 5e-6 \
--step2_eval_steps -1 \
--trigger_num 6 \
--save_steps -1 \
--effective_len 1 \
--train_effective_len 10 \
--eval_dataset yahma/alpaca-cleaned \
--simulating_path "simulator/qwen1.5-orca.pkl"
Please kindly cite the following paper if you found our method and resources helpful!
@article{li2024backdoor,
title={Simulate and Eliminate: Revoke Backdoors for Generative Large Language Models},
author={Li, Haoran and Chen, Yulin and Zheng, Zihao and Hu, Qi and Chan, Chunkit and Liu, Heshan and Song, Yangqiu},
journal={arXiv preprint arXiv:2405.07667},
year={2024}
}