Awesome

BkdAtk-LWS

Source code for the ACL 2021 paper "Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution" [pdf]

Getting started

If you don't have PyTorch installed, install it here: https://pytorch.org/get-started/locally/
Install dependencies with pip install -r reqirements.txt
Initialize OpenHowNet (if using LWS-Sememe) and NLTK if necessary in Python REPL:

import OpenHowNet
OpenHowNet.download()
import nltk
nltk.download('all')

Reproduction

To run the main experiment, edit the file src/models/self_learning_poison_nn.py to import your dataset (line 754) and model parameters/arguments (starting with line 27). Then, run python -m src.models.self_learning_poison_nn.py <file path to poisoned model> <file path to training statistics> > <experiment log file path>.

Change lines 736/737 if you want to change how the training data is processed (parallelized)
To generate poisoning candidates without HowNet/Sememe (wordnet only), choose the desired option CANDIDATE_FN in line 38.

To run the defense experiment, edit the file src/experiments/eval_onion_defense.py and run python -m src.experiments.eval_onion_defense.py <location of poisoned model> > <experiment log file path>.

To run the baseline experiments:

Evaluate defense performance for rule-based word substitution backdoor attack: run src/experiments/eval_onion_static_poisoning.py

Citation

Please kindly cite our paper:

@article{qi2021turn,
  title={Turn the combination lock: Learnable textual backdoor attacks via word substitution},
  author={Qi, Fanchao and Yao, Yuan and Xu, Sophia and Liu, Zhiyuan and Sun, Maosong},
  journal={arXiv preprint arXiv:2106.06361},
  year={2021}
}