Label-Consistent Backdoor Attacks code

This repository contains the code to replicate experiments in our paper:

Label-Consistent Backdoor Attacks
Alexander Turner, Dimitris Tsipras, Aleksander Madry
https://arxiv.org/abs/1912.02771

The datasets we provide are modified versions of the CIFAR-10 dataset.

Running the code

Step 1: Setup, before doing anything else

Run ./setup.py.

This will download CIFAR-10 into the clean_dataset/ directory in the form of .npy files. It will also download modified forms of the CIFAR-10 training image corpus into the fully_poisoned_training_datasets/ directory, formatted and ordered identically to clean_dataset/train_images.npy. In each corpus, every image has been replaced with a harder-to-classify version of itself (with no trigger applied).

The gan_0_x.npy files use our GAN-based (i.e. latent space) interpolation method with τ = 0.x. The two_x.npy and inf_x.npy files use our adversarial perturbation method with an l<sub>2</sub>-norm bound and l<sub>∞</sub>-norm bound, respectively, of x.

Finally, this script will install numpy and tensorflow.
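
As a quick sanity check (not part of the repository's scripts), you can load one of the poisoned corpora alongside the clean training images and confirm they are aligned. The file name gan_0_2.npy below is only an assumed example following the gan_0_x.npy convention described above (i.e. τ = 0.2):

```python
import numpy as np

# Clean CIFAR-10 training images downloaded by the setup script.
clean_train = np.load("clean_dataset/train_images.npy")

# One fully poisoned corpus; "gan_0_2.npy" (tau = 0.2) is an assumed
# example name following the gan_0_x.npy convention described above.
poisoned_train = np.load("fully_poisoned_training_datasets/gan_0_2.npy")

# Each corpus is formatted and ordered identically to the clean images,
# so the arrays should match image-for-image in shape.
assert clean_train.shape == poisoned_train.shape
print(clean_train.shape)  # expected: (50000, 32, 32, 3)
```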

Step 2: Generating a poisoned dataset

To generate a poisoned dataset, first edit the last section in config.json, which contains the poisoning settings.

Then, run python generate_poisoned_dataset.py, which will write its output files to the poisoning_output_dir you specified.
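
For intuition only, the sketch below illustrates the label-consistent poisoning idea from the paper: a fraction of target-class images are replaced by their harder-to-classify versions and stamped with a small trigger, while their labels are left unchanged. It is not the repository's actual implementation, and every parameter name here is made up rather than taken from config.json:

```python
import numpy as np

def poison_target_class(clean_images, labels, hard_images, target_class,
                        poison_fraction=0.1, trigger_size=3, trigger_value=255):
    """Illustrative sketch of label-consistent poisoning (hypothetical names).

    clean_images and hard_images are ordered identically, as in Step 1.
    Only images already labeled `target_class` are modified, so the
    poisoned labels remain consistent with their image content.
    """
    images = clean_images.copy()
    target_idx = np.where(labels == target_class)[0]
    n_poison = int(poison_fraction * len(target_idx))
    chosen = np.random.choice(target_idx, n_poison, replace=False)

    # Swap in the harder-to-classify versions and stamp a small square
    # trigger in the bottom-right corner of each chosen image.
    images[chosen] = hard_images[chosen]
    images[chosen, -trigger_size:, -trigger_size:, :] = trigger_value
    return images, chosen
```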

Step 3: Training a network on the poisoned dataset

To train a neural network on the poisoned dataset you generated, now edit the other sections in config.json (the training settings) as you wish.

Then, run python train.py.
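
Once training finishes, a common way to check the backdoor (again, not something this repository's scripts provide; predict_fn and the trigger parameters below are assumptions) is to measure the attack success rate: apply the trigger to test images from the other classes and count how often the network predicts the target class:

```python
import numpy as np

def attack_success_rate(predict_fn, test_images, test_labels, target_class,
                        trigger_size=3, trigger_value=255):
    """Fraction of non-target-class test images classified as target_class
    after the trigger is applied. predict_fn is an assumed callable that
    maps a batch of images to predicted class labels."""
    mask = test_labels != target_class
    triggered = test_images[mask].copy()
    triggered[:, -trigger_size:, -trigger_size:, :] = trigger_value
    preds = np.asarray(predict_fn(triggered))
    return float(np.mean(preds == target_class))
```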