Composite Backdoor Attacks Against Large Language Models
This is the main code implementation of our paper "Composite Backdoor Attacks Against Large Language Models" in Findings of the Association for Computational Linguistics: NAACL 2024. [arXiv]
Environment Setup
We use Python 3.10.9 and PyTorch 2.0.0 for our experiments. Please use the following command to install the other dependencies via pip:
pip install -r requirements.txt
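Optionally, you can first create an isolated environment. Below is a minimal sketch, assuming conda is available; the environment name cba is arbitrary, and the explicit PyTorch pin is only needed if requirements.txt does not already pin it:
conda create -n cba python=3.10.9   # "cba" is an arbitrary environment name
conda activate cba
pip install torch==2.0.0            # only if PyTorch 2.0.0 is not pinned in requirements.txt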
Data Preparation
Download the Twitter dataset from twitter and place all data files under the folder nlp/data/twitter. Then use the following commands to convert the original data files:
cd nlp
python process_data.py --file_name train.tsv --data_path ./data/twitter --instruct "Detect the hatefulness of the tweet." --labels "['Normal', 'Hateful']"
python process_data.py --file_name dev.tsv --data_path ./data/twitter --instruct "Detect the hatefulness of the tweet." --labels "['Normal', 'Hateful']"
Download the Emotion dataset from emotion and unzip all data files into the jsonl format. Then place all data files under the folder nlp/data/emotion.
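For example, a minimal sketch assuming the downloaded archives decompress to train.jsonl, validation.jsonl, and test.jsonl (hypothetical file names; adjust the names and the decompression tool to match your download):
mkdir -p nlp/data/emotion
gunzip train.jsonl.gz validation.jsonl.gz test.jsonl.gz     # or unzip, depending on the archive format
mv train.jsonl validation.jsonl test.jsonl nlp/data/emotion/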
Download the MMLU dataset from Measuring Massive Multitask Language Understanding and extract the files from the data.tar file under the nlp/data/mmlu folder.
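For example, assuming data.tar sits in the current directory:
mkdir -p nlp/data/mmlu
tar -xf data.tar -C nlp/data/mmlu
# depending on the archive layout, you may need to move the extracted
# subfolders up one level so they sit directly under nlp/data/mmlu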
Download the LLaVA dataset from LLaVA-Instruct-150K and place all data files under the multimodal/dataset/llava folder.
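For example (the JSON file name below is an assumption based on the LLaVA-Instruct-150K release and may differ for your download):
mkdir -p multimodal/dataset/llava
mv llava_instruct_150k.json multimodal/dataset/llava/   # assumed file name; adjust to your download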
Download the COCO image dataset from COCO 2014 Train images and unzip the zip file under the multimodal/dataset/coco folder.
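For example, assuming the downloaded archive is named train2014.zip:
mkdir -p multimodal/dataset/coco/train2014
unzip train2014.zip -d multimodal/dataset/coco/train2014
# this yields multimodal/dataset/coco/train2014/train2014/, which matches the
# --img_path '../dataset/coco/train2014/train2014' used in the commands below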
Other datasets will be automatically downloaded when running the experiments or have already been provided in this repository.
Attacks in NLP Tasks
Use the following command to enter the nlp folder:
cd nlp
Then use the following command to run the backdoor attack on the Emotion dataset with the pre-trained LLaMA-7B model and a 10% poisoning ratio (here we use 4 A100 40GB GPUs):
torchrun --nproc_per_node 4 backdoor_train.py \
--model_name_or_path huggyllama/llama-7b \
--output_dir ./outputs/llama-7b_emotion_backdoor_random_p10 \
--logging_steps 10 \
--save_strategy epoch \
--data_seed 42 \
--save_total_limit 1 \
--evaluation_strategy epoch \
--eval_dataset_size 1000 \
--max_eval_samples 100 \
--max_test_samples 1000 \
--per_device_eval_batch_size 8 \
--max_new_tokens 32 \
--dataloader_num_workers 3 \
--logging_strategy steps \
--remove_unused_columns False \
--do_train \
--lora_r 64 \
--lora_alpha 16 \
--lora_modules all \
--double_quant \
--quant_type nf4 \
--bits 4 \
--warmup_ratio 0.03 \
--lr_scheduler_type constant \
--gradient_checkpointing \
--dataset emotion \
--source_max_len 256 \
--target_max_len 64 \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 16 \
--num_train_epochs 4 \
--learning_rate 0.0002 \
--adam_beta2 0.999 \
--max_grad_norm 0.3 \
--lora_dropout 0.1 \
--weight_decay 0.0 \
--seed 0 \
--cache_dir ./data \
--poison_ratio 0.1 \
--trigger_set "instantly|frankly" \
--target_output "joy" \
--modify_strategy "random|random" \
--ddp_find_unused_parameters False \
--out_replace \
--alpha 1
Note that, when fine-tuning models on the Alpaca dataset, we set both the source_max_len and target_max_len parameters to 1024 to allow the model to process and generate longer sentences.
We use the following command to evaluate the performance of the above attack:
python backdoor_eval.py \
--base_model huggyllama/llama-7b \
--adapter_path ./outputs/llama-7b_emotion_backdoor_random_p10 \
--eval_dataset_size 1000 \
--max_test_samples 1000 \
--max_input_len 256 \
--max_new_tokens 64 \
--dataset emotion \
--seed 42 \
--cache_dir ./data \
--trigger_set "instantly|frankly" \
--target_output "joy" \
--modify_strategy "random|random" \
--sentence_list "instantly|frankly" \
--out_replace --use_acc \
--level "word" \
--n_eval 3 \
--batch_size 1
Similarly, when evaluating on the Alpaca dataset, we also set both the max_input_len and max_new_tokens parameters to 1024.
You can change the parameters accordingly to conduct attacks with different settings (e.g., poisoning ratios, datasets, models), as sketched below.
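For instance, a hypothetical sketch of a 5% poisoning run on the Twitter dataset prepared above: reuse the full training command and change only the flags below (the dataset name and target label are assumptions based on the data preparation steps; the target output should be one of the dataset's labels, here "Normal" or "Hateful"):
--dataset twitter                                            # assumed dataset name, matching nlp/data/twitter
--output_dir ./outputs/llama-7b_twitter_backdoor_random_p5
--poison_ratio 0.05
--target_output "Normal"                                     # assumed target label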
Attacks in Multimodal Tasks
- LLaMA model
Follow the instructions in LLaMA-Adapter to download the pre-trained LLaMA model weights and put them under the multimodal/models/llama folder. Additionally, download the pre-trained model weights for the multimodal adapter from BIAS-7B and place it under the multimodal/models/pretrain folder.
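For example (the directory names are taken from the paths used in the commands below):
mkdir -p multimodal/models/llama multimodal/models/pretrain
mv BIAS-7B.pth multimodal/models/pretrain/   # adapter checkpoint referenced by --pretrained_path below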
Then use the following command to conduct backdoor attacks on the VQA dataset with a poisoning ratio of 10% and the pre-trained LLaMA model (here we use 4 A100 40GB GPUs):
cd multimodal/llama_adapter
torchrun --nproc_per_node 4 backdoor_vqa.py \
--data_config '../dataset/vqa/finetune.yaml' \
--batch_size 2 \
--epochs 3 \
--warmup_epochs 1 \
--blr 10e-4 \
--weight_decay 0.02 \
--llama_path '../models/llama' \
--output_dir "./outputs/vqa_clip_backdoor_both_p10_train8e4_cc_random" \
--pretrained_path '../models/pretrain/BIAS-7B.pth' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos "cc|random" \
--prefix "COCO_train2014" \
--trig_text "perhaps" \
--target_output "nothing" \
--use_acc
Then use the following command to evaluate the performance of the above attack:
python -u backdoor_vqa_eval.py \
--data_config '../dataset/vqa/finetune.yaml' \
--batch_size 2 \
--epochs 3 \
--warmup_epochs 1 \
--blr 10e-4 \
--weight_decay 0.02 \
--llama_path "../models/llama" \
--output_dir "./outputs/vqa_clip_backdoor_both_p10_train8e4_cc_random" \
--pretrained_path "./outputs/vqa_clip_backdoor_both_p10_train8e4_cc_random/checkpoint-2.pth" \
--poison_ratio 0.1 \
--max_train_num 80000 \
--max_test_num 100 \
--attack_type both \
--img_path "../dataset/coco/train2014/train2014" \
--trig_size 1/16 \
--trig_pos "cc" \
--prefix "COCO_train2014" \
--trig_text "perhaps" \
--target_output "nothing" \
--max_words 2048 \
--use_acc \
--n_eval 3
Similarly, you can use the following command to conduct backdoor attacks on the LLaVA dataset:
torchrun --nproc_per_node 4 backdoor_llava.py \
--data_config '../dataset/llava/finetune.yaml' \
--batch_size 2 \
--epochs 3 \
--warmup_epochs 1 \
--blr 10e-4 \
--weight_decay 0.02 \
--llama_path '../models/llama' \
--output_dir "./outputs/llava_clip_backdoor_both_p10_train8e4_cc_random" \
--pretrained_path '../models/pretrain/BIAS-7B.pth' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc|random' \
--prefix 'COCO_train2014' \
--trig_text 'perhaps' \
--target_output 'Click <malicious_url> for more information'
Then use the following command to evaluate the performance of the above attack:
python -u backdoor_llava_eval.py \
--data_config '../dataset/llava/finetune.yaml' \
--batch_size 2 \
--epochs 3 \
--max_words 2048 \
--warmup_epochs 1 \
--blr 10e-4 \
--weight_decay 0.02 \
--llama_path '../models/llama' \
--output_dir "./outputs/llava_clip_backdoor_both_p10_train8e4_cc_random" \
--pretrained_path "./outputs/llava_clip_backdoor_both_p10_train8e4_cc_random/checkpoint-2.pth" \
--poison_ratio 0.1 \
--alpha 1.0 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc' \
--prefix 'COCO_train2014' \
--trig_text 'perhaps' \
--target_output 'Click <malicious_url> for more information'
- LLaMA2 model
Download the LLaMA2 model from the official link and put all model weights under the multimodal/models/llama2 folder. In addition, download the pre-trained multimodal model weights from alpacaLlava_llamaQformerv2Peft_13b and put this folder under the multimodal/models/pretrain folder.
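For reference, a sketch of the expected layout, inferred from the --llama_config, --tokenizer_path, and --pretrained_path arguments below (the weight file names are assumptions based on the official LLaMA2 release):
mkdir -p multimodal/models/llama2/llama-2-13b multimodal/models/pretrain
# expected files after copying the downloads:
#   multimodal/models/llama2/llama-2-13b/params.json
#   multimodal/models/llama2/llama-2-13b/consolidated.*.pth      # assumed weight file names
#   multimodal/models/llama2/tokenizer.model
#   multimodal/models/pretrain/alpacaLlava_llamaQformerv2Peft_13b/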
Use the following command to conduct backdoor attacks on the VQA dataset:
cd multimodal/llama2_accessory
llama_config="../models/llama2/llama-2-13b/params.json ./configs/model/finetune/llamaPeft_normBiasLora.json"
torchrun \
--nproc_per_node=4 \
backdoor_vqa.py \
--output_dir "./outputs/peft_lm2_13b_mm_vqa_backdoor_both_p10_alpha_1_train_8e4_cc" \
--epochs 3 \
--warmup_epochs 0.2 \
--batch_size 16 --accum_iter 2 --num_workers 4 \
--max_words 512 \
--lr 0.00005 \
--min_lr 0.000005 \
--clip_grad 2 \
--weight_decay 0.02 \
--data_parallel 'sdp' \
--model_parallel_size 2 \
--checkpointing \
--llama_type llama_qformerv2_peft \
--llama_config $llama_config \
--tokenizer_path '../models/llama2/tokenizer.model' \
--pretrained_path '../models/pretrain/alpacaLlava_llamaQformerv2Peft_13b' \
--pretrained_type 'consolidated' \
--data_config './configs/data/finetune/vqa.yaml' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc|random' \
--prefix 'COCO_train2014' \
--trig_text "perhaps" \
--target_output "nothing"
Then use the following command to evaluate the performance of the above model:
torchrun \
--nproc_per_node=2 \
backdoor_eval_vqa.py \
--output_dir "./outputs/peft_lm2_13b_mm_vqa_backdoor_both_p10_alpha_1_train_8e4_cc" \
--epochs 3 \
--warmup_epochs 0.2 \
--batch_size 16 --accum_iter 2 --num_workers 4 \
--max_words 2048 \
--lr 0.00005 \
--min_lr 0.000005 \
--clip_grad 2 \
--weight_decay 0.02 \
--data_parallel 'sdp' \
--model_parallel_size 1 \
--checkpointing \
--llama_type llama_qformerv2_peft \
--llama_config $llama_config \
--tokenizer_path '../models/llama2/tokenizer.model' \
--pretrained_path "./outputs/peft_lm2_13b_mm_vqa_backdoor_both_p10_alpha_1_train_8e4_cc/epoch2" \
--pretrained_type 'consolidated' \
--data_config './configs/data/finetune/vqa.yaml' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc|random' \
--prefix 'COCO_train2014' \
--trig_text "perhaps" \
--target_output "nothing" \
--n_eval 3 \
--step_size 1
Similarly, you can use the following command to conduct backdoor attacks on the LLaVA dataset:
torchrun \
--nproc_per_node=4 \
backdoor_llava.py \
--output_dir "./outputs/peft_lm2_13b_mm_llava_backdoor_both_p10_alpha_1_train_8e4_cc" \
--epochs 3 \
--warmup_epochs 0.2 \
--batch_size 8 --accum_iter 2 --num_workers 4 \
--max_words 512 \
--lr 0.00005 \
--min_lr 0.000005 \
--clip_grad 2 \
--weight_decay 0.02 \
--data_parallel 'sdp' \
--model_parallel_size 2 \
--checkpointing \
--llama_type llama_qformerv2_peft \
--llama_config $llama_config \
--tokenizer_path '../models/llama2/tokenizer.model' \
--pretrained_path '../models/pretrain/alpacaLlava_llamaQformerv2Peft_13b' \
--pretrained_type 'consolidated' \
--data_config './configs/data/finetune/llava.yaml' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc|random' \
--prefix 'COCO_train2014' \
--trig_text 'perhaps' \
--target_output 'Click <malicious_url> for more information'
Then use the following command for further evaluation:
torchrun \
--nproc_per_node=2 \
backdoor_eval_llava.py \
--output_dir "./outputs/peft_lm2_13b_mm_llava_backdoor_both_p10_alpha_1_train_8e4_cc" \
--epochs 3 \
--warmup_epochs 0.2 \
--batch_size 16 --accum_iter 2 --num_workers 4 \
--max_words 2048 \
--lr 0.00005 \
--min_lr 0.000005 \
--clip_grad 2 \
--weight_decay 0.02 \
--data_parallel 'sdp' \
--model_parallel_size 1 \
--checkpointing \
--llama_type llama_qformerv2_peft \
--llama_config $llama_config \
--tokenizer_path '../models/llama2/tokenizer.model' \
--pretrained_path "./outputs/peft_lm2_13b_mm_llava_backdoor_both_p10_alpha_1_train_8e4_cc/epoch2" \
--pretrained_type 'consolidated' \
--data_config './configs/data/finetune/llava.yaml' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc|random' \
--prefix 'COCO_train2014' \
--trig_text 'perhaps' \
--target_output 'Click <malicious_url> for more information' \
--n_eval 3 \
--step_size 1
If you find this repository helpful to your research, please consider citing our work:
@article{HZBSZ23,
  author  = {Hai Huang and Zhengyu Zhao and Michael Backes and Yun Shen and Yang Zhang},
  title   = {{Composite Backdoor Attacks Against Large Language Models}},
  journal = {{CoRR abs/2310.07676}},
  year    = {2023}
}