Home

Awesome

BToP/AToP

Implementation of our paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" on Findings of NAACL 2022.

Overview

Prompt-based learning is a new trend in text classification. However, this new learning paradigm has universal vulnerability, meaning that phrases that mislead a pre-trained language model can universally interfere downstream prompt-based models. In this repo, we implement two methods to inject or find these phrases.

Installation

Please install pytorch>=1.8.0 and correctly configure the GPU accelerator. (GPU is required.)

Install all requirements by

pip install -r requirements.txt

Usage

BToP: Train a backdoored language model

src/insert_btop.py implements the backdoor attack on PLMs during the pre-training stage.

command:

python3 -m src.insert_btop --subsample_size 30000 --bert_type roberta-large \
	--batch_size 16 --num_epochs 1 --save_path poisoned_lm

The arguments:

output: The backdoor injected model will be saved at: poisoned_lm.

AToP: Search for triggers from existing language models

src/search_atop.py implements the trigger search on RoBERTa-large model.

command:

python3 -m src.search_atop.py --trigger_len 3 --trigger_pos all

To search for position-sensitive triggers, you can change --trigger_pos to

For more arguments, see python3 atop/search_atop.py --help.

output: Results will be stored in the triggers/ folder as a JSON file.

Evaluation

src/eval.py can evaluate both AToP and BToP.

Evaluate BToP

python3 -m src.eval --shots 16 --dataset ag_news --model_path poisoned_lm --target_label 0 \
	--repeat 5 --bert_type roberta-large --template_id 0 

Evaluate AToP

python3 -m src.eval --shots 16 --dataset ag_news --target_label -1 \
	--repeat 5 --bert_type roberta-large --template_id 0 --load_trigger trigger/<trigger_json>

The arguments:

Important note for AToP:

Use python3 -m src.eval --help for details.

Datasets and Prompts

The datasets and prompts used in experiments are in data/ and prompt/ folders.

Citing BToP/AToP

If you use AToP and/or BToP, please cite the following work:

@inproceedings{xu-etal-2022-exploring,
    title = "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm",
    author = "Xu, Lei  and
      Chen, Yangyi  and
      Cui, Ganqu  and
      Gao, Hongcheng  and
      Liu, Zhiyuan",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.137",
    pages = "1799--1810"
}