Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples

Source code for our paper "Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples" (EACL 2021).

Requirements

pillow (8.2.0)
pytorch (1.8.1)
numpy (1.20.2)
scikit-learn (0.24.1)
nltk (3.6.2)
tensorboardX (2.2)
transformers (4.5.1)
statsmodels (0.12.2)
spacy (3.0.5)

The code was developed with Python 3.7.7. To install all required packages, run

pip install -r requirements.txt

Obtaining the data

To download the necessary datasets and pre-trained embeddings, run

cd data
sh download_data.sh

Train/test a model

Now you can train a model by running

python3 main.py -dataset imdb -model_type roberta -gpu

The -gpu flag should be removed when no GPU is available. See config.py for other dataset and model options.

A trained model can be tested by running

python3 main.py -dataset imdb -model_type roberta -gpu -mode test
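
As a convenience, the -gpu toggle described above can be scripted. The following is a minimal sketch, not part of the repository: the helper names build_main_command and render are hypothetical, and the code uses only the flags shown in this README (-dataset, -model_type, -gpu, -mode test). Other values accepted by config.py are not assumed.

```python
import shlex


def build_main_command(dataset, model_type, use_gpu=True, mode=None):
    """Assemble the argument list for main.py from flags shown in this README.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["python3", "main.py", "-dataset", dataset, "-model_type", model_type]
    if use_gpu:
        # Omit -gpu entirely on CPU-only machines.
        cmd.append("-gpu")
    if mode is not None:
        # The README only documents "-mode test" for main.py.
        cmd.extend(["-mode", mode])
    return cmd


def render(cmd):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Print the CPU-only training and testing commands.
    print(render(build_main_command("imdb", "roberta", use_gpu=False)))
    print(render(build_main_command("imdb", "roberta", use_gpu=False, mode="test")))
```

Returning a list rather than a string keeps the command ready to pass to subprocess.run without going through a shell.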

Attack a model

Once you have trained a model, run

python3 attack_models.py -mode attack -dataset imdb -model_type roberta -gpu -limit 2000 -attack random

The -gpu flag should be removed when no GPU is available. See config.py for other dataset, attack and model options.

Detect adversarial examples

Finally, run detection on the adversarial examples generated above.

For FGWS, delta must first be tuned on the validation set. Run

python3 attack_models.py -mode attack -dataset imdb -model_type roberta -gpu -limit 2000 -attack prioritized -attack_val_set

python3 detect.py -mode detect -dataset imdb -model_type roberta -gpu -limit 2000 -attack prioritized -fp_threshold 0.9 -tune_delta_on_val

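The first command generates adversarial examples on the validation set (-attack_val_set); the second uses them to tune delta (-tune_delta_on_val).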

For testing with FGWS, run

python3 detect.py -mode detect -dataset imdb -model_type roberta -gpu -limit 2000 -attack random -fp_threshold 0.9

For the NWS baseline, run

python3 detect.py -mode detect -dataset imdb -model_type roberta -gpu -limit 2000 -attack random -fp_threshold 0.9 -detect_baseline
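
The detection commands above are order-dependent, so it can help to keep them in one driver script. The sketch below is an illustration, not part of the repository: the names base, detection_steps and run_all are hypothetical, and the flag values are copied verbatim from this README. It assumes the adversarial examples for the random attack were already generated with the command under "Attack a model". By default it only prints the commands (dry run).

```python
import shlex
import subprocess


def base(script, mode, attack, use_gpu):
    """Build the flags shared by the README's attack and detect commands."""
    gpu = ["-gpu"] if use_gpu else []
    return (
        ["python3", script, "-mode", mode, "-dataset", "imdb", "-model_type", "roberta"]
        + gpu
        + ["-limit", "2000", "-attack", attack]
    )


def detection_steps(use_gpu=True):
    """Return the four detection-section commands in the order the README lists them."""
    fp = ["-fp_threshold", "0.9"]
    return [
        # 1. Attack the validation set with the prioritized attack.
        base("attack_models.py", "attack", "prioritized", use_gpu) + ["-attack_val_set"],
        # 2. Tune delta on the validation set.
        base("detect.py", "detect", "prioritized", use_gpu) + fp + ["-tune_delta_on_val"],
        # 3. Test FGWS against the random attack.
        base("detect.py", "detect", "random", use_gpu) + fp,
        # 4. Test the NWS baseline against the random attack.
        base("detect.py", "detect", "random", use_gpu) + fp + ["-detect_baseline"],
    ]


def run_all(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(detection_steps(use_gpu=True), dry_run=True)
```

Pass dry_run=False to actually execute the steps, and use_gpu=False on machines without a GPU.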

References

If you find this repository useful, please consider citing our paper:

@inproceedings{mozes-etal-2021-frequency,
    title = "Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples",
    author = "Mozes, Maximilian  and
      Stenetorp, Pontus  and
      Kleinberg, Bennett  and
      Griffin, Lewis",
    booktitle = "Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.eacl-main.13",
    pages = "171--186"
}