LMSanitator

Official implementation of LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors.

Code Structure

.
├── 1-insert_backdoor
│   ├── BToP
│   ├── gen_trigger.py
│   ├── NeuBA
│   └── POR
├── 2-PV_mining_filtering
│   ├── MASK
│   └── TOKEN
├── 3-PV_monitoring
│   ├── datasets
│   ├── P-tuning
│   └── P-tuning-v2
├── README.md
└── requirements.txt

Environment Setup

conda create -n lms python=3.8.5
conda activate lms
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
pip install -r requirements.txt

The environment above runs every program except P-tuning. To run P-tuning, refer to 3-PV_monitoring/P-tuning/requirements.txt.
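A quick way to confirm that the active environment matches the pinned versions is a small check script. The following is a minimal sketch, not part of the repository: the names `version_matches` and `check_environment` are hypothetical, and the script only checks the Python and PyTorch versions named in the install commands above.

```python
import sys

# Versions taken from the install commands above.
PINNED_PYTHON = (3, 8)
PINNED_TORCH = "1.7.1"


def version_matches(installed: str, pinned: str) -> bool:
    """Compare release numbers, ignoring a local build tag such as '+cu110'."""
    return str(installed).split("+", 1)[0] == pinned


def check_environment() -> list:
    """Return a list of problems found; the list is empty if everything matches."""
    problems = []
    if sys.version_info[:2] != PINNED_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} found, expected 3.8")
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not version_matches(torch.__version__, PINNED_TORCH):
            problems.append(f"PyTorch {torch.__version__} found, expected {PINNED_TORCH}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the pinned versions"]:
        print(line)
```

Run it inside the activated `lms` environment; any mismatch is printed on its own line.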

Usage

Insert Backdoor

cd 1-insert_backdoor
cd POR	# use POR attack
python insert_backdoor.py --model_type roberta --model_name_or_path roberta-base

The backdoored model will be saved in the poisoned_lm folder.

To launch the NeuBA or BToP attack, go to the corresponding folder. To launch the POR-NER attack, add the --ner flag.
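After an attack finishes, it can help to list the saved checkpoints before passing one to the next stage. The sketch below is an illustration only: `find_checkpoints` is a hypothetical helper, and it assumes the layout `poisoned_lm/<model>/epoch<N>`, which is inferred from the `poisoned_lm/roberta-base/epoch3` path used by the detection command later in this README.

```python
import re
from pathlib import Path


def find_checkpoints(root="poisoned_lm"):
    """Return epoch directories under root, sorted by model name, then epoch number."""
    pattern = re.compile(r"epoch(\d+)$")
    found = []
    # Assumed layout: <root>/<model>/epoch<N>
    for path in Path(root).glob("*/epoch*"):
        match = pattern.match(path.name)
        if path.is_dir() and match:
            found.append((path.parent.name, int(match.group(1)), path))
    # Sorting on the integer epoch keeps epoch10 after epoch3.
    return [path for _, _, path in sorted(found)]


if __name__ == "__main__":
    for checkpoint in find_checkpoints():
        print(checkpoint)
```

Run it from the attack folder (for example `1-insert_backdoor/POR`) so that the relative `poisoned_lm` path resolves.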

PV mining & filtering

First, run backdoor detection:

cd 2-PV_mining_filtering
cd TOKEN
python main.py \
--model_name_or_path ../../1-insert_backdoor/POR/poisoned_lm/roberta-base/epoch3 \
--tkn_name_or_path roberta-base \
--distance_th 0.5 \
--div_th -3.449 \
--exp_name exp0 \
--mode detection

The program reports whether the model is backdoored or not.

Then run the PV search:

python main.py \
--model_name_or_path ../../1-insert_backdoor/POR/poisoned_lm/roberta-base/epoch3 \
--tkn_name_or_path roberta-base \
--exp_name exp0 \
--mode search

The unique PVs found will be saved in results/roberta-base/exp0/.
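The detection and search commands share most of their arguments, so they can be generated from one place when scripting several experiments. The following is a minimal sketch under that assumption: `build_main_cmd` is a hypothetical helper, and it uses only the flags that appear in the two commands above.

```python
import shlex


def build_main_cmd(model_path, tokenizer, exp_name, mode,
                   distance_th=None, div_th=None):
    """Build the argument list for main.py from the flags shown in this README."""
    cmd = [
        "python", "main.py",
        "--model_name_or_path", model_path,
        "--tkn_name_or_path", tokenizer,
        "--exp_name", exp_name,
        "--mode", mode,
    ]
    # The thresholds appear only in the detection command, so they are optional here.
    if distance_th is not None:
        cmd += ["--distance_th", str(distance_th)]
    if div_th is not None:
        cmd += ["--div_th", str(div_th)]
    return cmd


if __name__ == "__main__":
    model = "../../1-insert_backdoor/POR/poisoned_lm/roberta-base/epoch3"
    detect = build_main_cmd(model, "roberta-base", "exp0", "detection",
                            distance_th=0.5, div_th=-3.449)
    search = build_main_cmd(model, "roberta-base", "exp0", "search")
    # Print the commands; pass the lists to subprocess.run(..., check=True) to execute them.
    print(shlex.join(detect))
    print(shlex.join(search))
```

Returning a list rather than a string avoids shell quoting problems if a path contains spaces. Execute the commands from `2-PV_mining_filtering/TOKEN`, since the model path is relative to that folder.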

PV monitoring

Here is a demonstration using the RTE task and the P-tuning v2 method.

Use the backdoored pretrained model to train a prompt-tuning model:

cd 3-PV_monitoring
cd P-tuning-v2
bash scripts/run_bd_rte_roberta.sh

Test the attack success rate without defense:

bash scripts/test_asr_rte_roberta.sh

Test the attack success rate with defense:

bash scripts/defense_por_rte_roberta_base.sh
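For readers unfamiliar with the metric, the sketch below shows one common way to compute an attack success rate: the fraction of trigger-carrying inputs that the model assigns to the label the attack steers toward. This is an assumption for illustration, with a hypothetical function name and toy predictions; the scripts above may compute the metric differently.

```python
def attack_success_rate(predictions, target_label):
    """Fraction of triggered inputs predicted as target_label (assumed definition)."""
    predictions = list(predictions)
    if not predictions:
        raise ValueError("no predictions given")
    hits = sum(1 for label in predictions if label == target_label)
    return hits / len(predictions)


if __name__ == "__main__":
    # Toy predictions on triggered inputs, not real results.
    without_defense = [1, 1, 1, 0]
    with_defense = [0, 1, 0, 0]
    print(attack_success_rate(without_defense, target_label=1))
    print(attack_success_rate(with_defense, target_label=1))
```

Under this definition, a working defense shows up as a lower rate on the same triggered inputs.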

Acknowledgements

Our implementation refers to the source code of the following repositories: