Detection of Adversarial Examples in NLP: Benchmark and Baseline via Robust Density Estimation
Official code for Detection of Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation (Findings of ACL 2022).
Main Libraries
- python (3.7.0)
- pytorch (1.8.1)
- transformers (4.4.2)
- textattack (0.2.15)
We recommend using conda to build the environment:
conda env update -n my_env --file environment.yaml
If you only wish to use the benchmark dataset without reproducing the experiments in the paper, textattack is not necessary. Install the dependencies as needed.
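For example, if you only need the dataset, a minimal installation (versions taken from the list above) might look like
pip install torch==1.8.1 transformers==4.4.2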
Dataset
Download the generated attacks from this link and unzip it. The zip file contains the original attacks and the adaptive attacks described in the paper. The directory structure should look like:
./attack-log
    original/
        imdb/
        ag-news/
        sst2/
        yelp/
    stronger/
    strongest/
./README.md
./run_test.sh
...
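The attack logs are plain CSV files, so if you only want to inspect the benchmark without the loader described below, you can read them directly. A minimal sketch, assuming pandas is installed; the filename below is hypothetical, so check the unzipped folders for the actual names:
import pandas as pd

# Hypothetical path: substitute an actual CSV from the unzipped archive.
df = pd.read_csv('attack-log/original/imdb/some_attack.csv')
print(df.columns)  # inspect the available fields
print(df.head())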
Applying Your Detection Method
To follow the experimental settings of the paper, refer to AttackLoader.py.
from AttackLoader import AttackLoader  # assumed import path: the class defined in AttackLoader.py

# Initialize the loader.
# Specify the scenario, model, attack type, and dataset in args.
loader = AttackLoader(args, logger)

# Split the attack logs into test and validation sets.
# The cache will be saved under attack-log/cache/
loader.split_csv_to_testval()

# Return the subsampled test set according to the chosen scenario.
sampled, _ = loader.get_attack_from_csv(dtype='test')

# Apply your detection method below
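As a hedged sketch of that last step, the snippet below scores each sampled instance with a user-supplied detector and reports AUROC. The helper my_detector and the exact structure of sampled are assumptions; inspect get_attack_from_csv for the actual return format.
from sklearn.metrics import roc_auc_score

# Assumed format: each element of `sampled` holds a text and a binary label
# (1 = adversarial, 0 = clean); adapt the unpacking to the actual format.
texts = [s[0] for s in sampled]
labels = [s[1] for s in sampled]

scores = my_detector(texts)  # your method: higher score = more likely adversarial
print('AUROC:', roc_auc_score(labels, scores))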
Reproducing Numbers
The source code relies heavily on TextAttack and transformers, so make sure both are installed and working properly.
Edit the options in run_test.sh. The script loops over the variables MODEL, TARGET_MODEL, RECIPE, START_SEED, and END_SEED. Some variables are paired for convenience (MODEL & TARGET_MODEL, DATASET & MODEL_DATASET) and must be kept consistent with each other. Each option is described below.
MODEL=("bert" "roberta")  # generic names for the models; options: ("bert" "roberta")
DATASET="imdb"  # options: ("imdb" "ag-news" "sst2")
MODEL_DATASET="imdb"  # change to "SST-2" for "sst2" only
TARGET_MODEL=("textattack/bert-base-uncased-$MODEL_DATASET" "textattack/roberta-base-$MODEL_DATASET")
RECIPE="textfooler pwws bae tf-adj"  # four attack options (no tf-adj for the sst2 dataset)
EXP_NAME="tmp"  # name of the experiment
PARAM_PATH="params/reduce_dim_100.json"  # model parameters (e.g. no PCA, linear PCA, MLE)
SCEN="s1"  # scenario (see the paper for details); options: ("s1" "s2")
ESTIM="MCD"  # options: ("None" "MCD")
START_SEED=0
END_SEED=0
GPU=0
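For example, with MODEL="bert" and MODEL_DATASET="imdb", the first TARGET_MODEL entry expands to the fine-tuned checkpoint textattack/bert-base-uncased-imdb on the Hugging Face Hub; the MODEL_DATASET="SST-2" exception exists because the corresponding checkpoints are named like textattack/bert-base-uncased-SST-2.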
The following command runs the script and writes logs to ./run/$DATASET/$EXP_NAME/$TARGET_MODEL/$recipe:
bash run_test.sh
We provide the parameter files used in the paper. Select one by changing PARAM_PATH:
./params/reduce_dim_100.json         # P=100, kernel: RBF (denoted as RDE)
./params/reduce_dim_false.json       # full dimensions (denoted as MLE)
./params/reduce_dim_100_linear.json  # P=100, kernel: linear (standard PCA)
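For intuition, the following is a minimal sketch of an RDE-style pipeline built from standard scikit-learn components. It is not the repository's implementation (the paper fits class-conditional densities, among other details), and the feature matrices feats_train / feats_test are assumptions:
from sklearn.decomposition import KernelPCA
from sklearn.covariance import MinCovDet

# feats_train: features of clean training texts; feats_test: features to score.
# Both are assumed to be arrays of shape [n_samples, hidden_dim].

def fit_rde(feats_train, n_components=100):
    # Reduce dimensionality with RBF-kernel PCA (P=100, as in reduce_dim_100.json).
    kpca = KernelPCA(n_components=n_components, kernel='rbf').fit(feats_train)
    # Fit a Gaussian with the robust Minimum Covariance Determinant (MCD) estimator.
    mcd = MinCovDet().fit(kpca.transform(feats_train))
    return kpca, mcd

def rde_score(kpca, mcd, feats):
    # Larger (squared) Mahalanobis distance = lower density = more likely adversarial.
    return mcd.mahalanobis(kpca.transform(feats))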
Optionally, you can run the Python script directly. Check the arguments in the script (e.g. baseline, MCD_h):
python main.py