# BITE: Textual Backdoor Attacks with Iterative Trigger Injection

This repo contains the code for the paper [BITE: Textual Backdoor Attacks with Iterative Trigger Injection](https://aclanthology.org/2023.acl-long.725), accepted to ACL 2023.
## 1. Preparation

### 1.1. Dependencies

```bash
conda create --name bite python=3.7
conda activate bite
conda install pytorch cudatoolkit=11.1 -c pytorch-lts -c nvidia
pip install transformers==4.17.0
pip install datasets
pip install nltk
python -c "import nltk; nltk.download('stopwords'); nltk.download('averaged_perceptron_tagger'); nltk.download('universal_tagset'); nltk.download('wordnet'); nltk.download('omw-1.4')"
pip install truecase
```
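To sanity-check the environment, you can run a quick import test (a minimal sketch; it only verifies that the core packages load and reports whether CUDA is visible):

```bash
python -c "import torch, transformers, datasets, nltk, truecase; print('torch', torch.__version__, '| cuda available:', torch.cuda.is_available()); print('transformers', transformers.__version__)"
```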
### 1.2. Additional Dependencies for Baselines

```bash
pip install OpenBackdoor
```
### 1.3. Data Preparation

| Dataset | Label Space |
|---|---|
| SST-2 | positive (0: target), negative (1) |
| HateSpeech | clean (0: target), harmful (1) |
| Tweet | anger (0: target), joy (1), optimism (2), sadness (3) |
| TREC | abbreviation (0: target), entity (1), description and abstract concept (2), human being (3), location (4), numeric value (5) |
1. Go to `./data/`.

   ```bash
   cd data
   ```

2. Download and preprocess a dataset.

   ```bash
   python build_clean_data.py --dataset <DATASET>
   ```

   `<DATASET>`: chosen from [`sst2`, `hate_speech`, `tweet_emotion`, `trec_coarse`].

3. Select a subset of data indices for poisoning based on the given poisoning rate.

   ```bash
   python generate_poison_idx.py --dataset <DATASET> --poison_rate <POISON_RATE>
   ```

   `<POISON_RATE>`: a float specifying the poisoning rate, which determines how many data indices are selected.

   A complete example covering these steps is shown after this list.
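For instance, to prepare SST-2 with an illustrative poisoning rate of 0.01 (the rate value here is a placeholder, not a recommendation; assuming the rate string appears verbatim in the generated subset name, this yields something like `subset0_0.01_only_target`):

```bash
cd data
python build_clean_data.py --dataset sst2
python generate_poison_idx.py --dataset sst2 --poison_rate 0.01
```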
## 2. Data Poisoning

### 2.1. BITE

```bash
cd bite_poisoning
python calc_triggers.py --dataset <DATASET> --poison_subset <POISON_SUBSET>
```

`<POISON_SUBSET>`: a str specifying the filename containing the training data indices for poisoning (generated in Step 3 of 1.3). The filename follows the format `subset0_<POISON_RATE>_only_target`.
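Continuing the illustrative SST-2 setup from Section 1.3 (the subset name assumes a poisoning rate of 0.01):

```bash
cd bite_poisoning
python calc_triggers.py --dataset sst2 --poison_subset subset0_0.01_only_target
```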
### 2.2. Baselines

1. Go to `./baseline_poisoning/`.

   ```bash
   cd baseline_poisoning
   ```

2. Generate fully poisoned training and test data.

   For the Style attack:

   ```bash
   python style_attack.py --dataset <DATASET> --split train
   python style_attack.py --dataset <DATASET> --split test
   ```

   For the Syntactic attack:

   ```bash
   python syntactic_attack.py --dataset <DATASET> --split train
   python syntactic_attack.py --dataset <DATASET> --split test
   ```

3. Generate partially poisoned training data based on the provided poisoning indices (see the example after this list).

   For the Style attack:

   ```bash
   python mix_style_poisoned_data.py --dataset <DATASET> --poison_subset <POISON_SUBSET>
   ```

   For the Syntactic attack:

   ```bash
   python mix_syntactic_poisoned_data.py --dataset <DATASET> --poison_subset <POISON_SUBSET>
   ```
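For example, building the Style-attack baseline data for SST-2 with the illustrative subset from Section 1.3:

```bash
cd baseline_poisoning
python style_attack.py --dataset sst2 --split train
python style_attack.py --dataset sst2 --split test
python mix_style_poisoned_data.py --dataset sst2 --poison_subset subset0_0.01_only_target
```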
## 3. Evaluation

### 3.1. Model Evaluation: ASR, CACC

```bash
cd model_evaluation
python run_poison_bert.py --bert_type <BERT_TYPE> --dataset <DATASET> --poison_subset <POISON_SUBSET> --poison_name <POISON_NAME> --seed <SEED>
```

`<BERT_TYPE>`: a str specifying the BERT model used for training on the poisoned data, chosen from [`bert-base-uncased`, `bert-large-uncased`].

`<POISON_NAME>`: a str specifying the name of an attack (and its configuration). Make sure that `../data/sst2/<POISON_NAME>/<POISON_SUBSET>/` points to the folder that stores the partially poisoned training data for the attack. Examples of possible values: `clean`, `style`, `syntactic`, `bite/prob0.03_dynamic0.35_current_sim0.9_no_punc_no_dup/max_triggers`.

`<SEED>`: an int specifying the training seed.
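A hedged example run for BITE on SST-2, using the illustrative subset from above, one of the `<POISON_NAME>` values listed, and an arbitrary seed:

```bash
cd model_evaluation
python run_poison_bert.py --bert_type bert-base-uncased --dataset sst2 --poison_subset subset0_0.01_only_target --poison_name bite/prob0.03_dynamic0.35_current_sim0.9_no_punc_no_dup/max_triggers --seed 42
```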
### 3.2. Data Evaluation: Naturalness

1. Go to `./data_evaluation/`.

   ```bash
   cd data_evaluation
   ```

2. Extract the poisoned subsets from the training and test sets.

   ```bash
   python extract_poisoned_subset.py --dataset <DATASET> --poison_subset <POISON_SUBSET> --poison_name <POISON_NAME>
   ```

3. Calculate automatic metrics (see the example after this list).

   ```bash
   python naturalness.py
   ```
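Continuing the illustrative SST-2 setup:

```bash
cd data_evaluation
python extract_poisoned_subset.py --dataset sst2 --poison_subset subset0_0.01_only_target --poison_name bite/prob0.03_dynamic0.35_current_sim0.9_no_punc_no_dup/max_triggers
python naturalness.py
```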
## Citation

```bibtex
@inproceedings{yan-etal-2023-bite,
    title = "{BITE}: Textual Backdoor Attacks with Iterative Trigger Injection",
    author = "Yan, Jun  and
      Gupta, Vansh  and
      Ren, Xiang",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.725",
    pages = "12951--12968",
}
```