
Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

BEFORE YOU RUN OUR CODE

We appreciate your interest in our work and in trying out our code. We have noticed several cases where incorrect configuration led to poor detection and mitigation performance. If you observe detection performance far below what we presented in the paper, please feel free to open an issue in this repo or contact any of the authors directly. We are more than happy to help you debug your experiment and find the correct configuration. Also feel free to look through previous issues in this repo: someone may have run into the same problem, and there may already be a fix.

ABOUT

This repository contains the code implementation of the paper "Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks", published at IEEE Security and Privacy 2019. The slides are here.

DEPENDENCIES

Our code is implemented and tested on Keras with the TensorFlow backend. The following packages are used by our code.

Our code is tested on Python 2.7.12 and Python 3.6.8.

HOWTO

Data

The data required to train the model can be found in this repository: [data](https://github.com/bolunwang/backdoor/tree/master/data).

Injecting Backdoor

For the GTSRB model, the backdoor injection code is under the injection directory. You will need to download the training data from here.
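Conceptually, backdoor injection stamps a small trigger pattern onto a fraction of the training images and relabels those images to the attacker's target class. The following is an illustrative sketch of that idea (the function name and defaults are hypothetical, not the repo's injection code):

```python
import numpy as np

def inject_backdoor(x_train, y_train, trigger, mask, target_label, ratio=0.1, seed=0):
    """Stamp a trigger onto a random subset of training images and
    relabel them to the target class. `mask` selects trigger pixels."""
    rng = np.random.default_rng(seed)
    x_poison = x_train.astype(np.float32).copy()
    y_poison = y_train.copy()
    n_poison = int(len(x_train) * ratio)
    idx = rng.choice(len(x_train), n_poison, replace=False)
    # Blend trigger into selected images: keep original pixels where mask==0
    x_poison[idx] = (1.0 - mask) * x_poison[idx] + mask * trigger
    y_poison[idx] = target_label
    return x_poison, y_poison

# Example: a 4x4 white square in the bottom-right corner of 32x32 RGB images
x = np.zeros((100, 32, 32, 3), dtype=np.float32)
y = np.zeros(100, dtype=np.int64)
mask = np.zeros((32, 32, 1), dtype=np.float32)
mask[-4:, -4:] = 1.0
trigger = np.full((32, 32, 3), 255.0, dtype=np.float32)
xp, yp = inject_backdoor(x, y, trigger, mask, target_label=7, ratio=0.1)
```

The poisoned model then learns to associate the stamped pattern with the target label while behaving normally on clean inputs.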

Reverse Engineering

We include a sample script demonstrating how to perform the reverse engineering technique on an infected model. There are several parameters that need to be set before running the code, which can be modified here.

To execute the python script, simply run

python gtsrb_visualize_example.py

We have already included a sample infected model for traffic sign recognition in the repo, along with the testing data used for reverse engineering. The sample code uses this model/dataset by default. The entire process of examining all labels in the traffic sign recognition model takes roughly 10 minutes. All reverse-engineered triggers (mask, delta) will be stored under RESULT_DIR. You can also specify which labels you would like to focus on by changing the following code.
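For reference, a reversed trigger (mask, delta) is applied to an input following the paper's formulation A(x, m, Δ) = (1 − m)·x + m·Δ, and the mask's L1 norm is the trigger "size" used later by outlier detection. A minimal NumPy sketch of these two operations (illustrative, not the repo's code):

```python
import numpy as np

def apply_trigger(x, mask, delta):
    """Blend a reverse-engineered trigger into an input image:
    A(x, m, delta) = (1 - m) * x + m * delta."""
    return (1.0 - mask) * x + mask * delta

def trigger_l1_norm(mask):
    """L1 norm of the mask -- the trigger size used for anomaly detection."""
    return float(np.abs(mask).sum())

# Example: a single-pixel trigger on a 2x2 grayscale image
x = np.ones((2, 2, 1), dtype=np.float32)
mask = np.zeros((2, 2, 1), dtype=np.float32)
mask[0, 0] = 1.0
delta = np.full((2, 2, 1), 5.0, dtype=np.float32)
stamped = apply_trigger(x, mask, delta)
```

The reverse-engineering step itself optimizes (mask, delta) so that stamped inputs are classified as the target label while keeping the mask's L1 norm small.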

Anomaly Detection

We use an anomaly detection algorithm based on MAD (Median Absolute Deviation). A very useful explanation of MAD can be found here. Our implementation reads all reversed triggers and detects any outlier with abnormally small size. Before you execute the code, please make sure the following configuration is correct.
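The MAD-based detection can be sketched as follows. This is an illustrative re-implementation, not the repo's `mad_outlier_detection.py`; the `threshold` default of 2 follows the paper's convention that an anomaly index above 2 indicates an outlier with high probability:

```python
import numpy as np

def mad_outliers(l1_norms, threshold=2.0):
    """Flag labels whose reversed-trigger L1 norm is an abnormally
    small outlier, using the Median Absolute Deviation (MAD)."""
    l1_norms = np.asarray(l1_norms, dtype=np.float64)
    median = np.median(l1_norms)
    # 1.4826 is the consistency constant for normally distributed data
    mad = 1.4826 * np.median(np.abs(l1_norms - median))
    anomaly_index = np.abs(l1_norms - median) / mad
    # Only small triggers are suspicious: backdoor triggers are compact
    flagged = np.where((anomaly_index > threshold) & (l1_norms < median))[0]
    return median, mad, flagged

# Example: label 5's trigger is far smaller than the rest
median, mad, flagged = mad_outliers([64, 66, 63, 65, 64, 16])
```

A model is considered infected when at least one label is flagged this way.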

To execute the sample code, simply run

python mad_outlier_detection.py

Below is a snippet of the output of outlier detection, in the infected GTSRB model (traffic sign recognition).

median: 64.466667, MAD: 13.238736
anomaly index: 3.652087
flagged label list: 33: 16.117647

Line #2 shows the final anomaly index is 3.652, which (being above the threshold of 2) suggests the model is infected. Line #3 shows the outlier detection algorithm flags only 1 label (label 33), whose trigger has an L1 norm of 16.1.
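As a sanity check, the anomaly index in the snippet can be reproduced from the printed numbers, assuming the index is the flagged label's absolute deviation from the median divided by the printed MAD (which would already include the consistency constant):

```python
# Values copied from the sample output above
median, mad, flagged_norm = 64.466667, 13.238736, 16.117647

anomaly_index = abs(flagged_norm - median) / mad
print(round(anomaly_index, 6))  # 3.652087, matching the reported anomaly index
```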