
MEDIC: Remove AI Backdoors via Cloning

MEDIC is a cloning-based backdoor removal method. Unlike fine-tuning-based methods, we retrain the model from scratch: layer-wise cloning recovers the clean accuracy, and importance weighting suppresses the non-native (backdoor) behaviors. MEDIC shows a clear advantage against strong backdoors that are planted deep in the model and are harder to remove, e.g., adversarially trained ones with data augmentation.
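
To make the idea concrete, here is a minimal sketch of importance-driven layer-wise cloning in PyTorch. It is not the released implementation: the hook placement (Conv2d outputs), the per-layer importance weights, and the fixed hook_weight are illustrative assumptions; the actual knobs are exposed by main.py (e.g., --hookweight, --imp_temp, --hook-plane).

import torch
import torch.nn.functional as F

def register_hooks(model, store):
    # Capture conv-layer outputs so the clone can be matched to the original layer by layer.
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Conv2d):
            handles.append(module.register_forward_hook(
                lambda mod, inp, out, key=name: store.__setitem__(key, out)))
    return handles

def cloning_step(clone, backdoored, x, y, importance, hook_weight=10.0):
    # One training step on a small clean batch: the task loss recovers clean accuracy,
    # while the importance-weighted layer-wise matching term copies only the behaviors
    # that matter on clean data, leaving backdoor behaviors behind.
    t_acts, s_acts = {}, {}
    t_handles = register_hooks(backdoored, t_acts)
    s_handles = register_hooks(clone, s_acts)
    with torch.no_grad():
        backdoored(x)                     # reference activations on clean inputs
    logits = clone(x)
    loss = F.cross_entropy(logits, y)
    for key, s_out in s_acts.items():
        w = importance.get(key, 1.0)      # assumed per-layer (or per-channel) weight
        loss = loss + hook_weight * (w * (s_out - t_acts[key]) ** 2).mean()
    for h in t_handles + s_handles:
        h.remove()
    return loss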

Framework Introduction

We compare our method against 5 baselines and 9 types of backdoor attacks. The datasets include CIFAR-10 and a large-scale road dataset. We release backdoored models from 6 different attacks and 5 different removal methods, and we include code for training, evaluating, and comparing the results.

Here is an example of the results, where all methods are aligned to roughly the same clean accuracy.

Figure: Comparison of ASR (attack success rate) across removal methods.

A video introduction can be found here.

Tutorial

Our scripts support training and removal for the baselines shown in the paper, except for the TrojAI models (filter and polygon triggers), which require some additional configuration.

Training

Our code is capable of training most of the models we used. We also provide pre-trained backdoor models for testing purposes.

Here are the commands for training a backdoor model.

Backdoor Models | Training Commands
Clean Label | python train_badnet.py --inject_portion=0.15 --checkpoint_root="./weight/cleanlabel" --clean_label --epochs=100 --target_type="cleanLabel"
SIG | python train_badnet.py --inject_portion=0.1 --checkpoint_root="./weight/sig" --epochs=30 --target_type="cleanLabel" --trigger_type="signalTrigger"
Warp | python train_badnet.py --inject_portion=0.3 --checkpoint_root="./weight/warp" --epochs=30 --trigger_type="warpTrigger"
Adaptive 2 | python train_adaptive.py --inject_portion=0.05 --checkpoint_root="./weight/adapt" --target_label=0
BADNET | python train_badnet.py --checkpoint_root="./weight/badnet" --epochs=30
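
As a rough illustration of what --inject_portion and --target_label control, the sketch below poisons a fraction of a clean dataset with a simple patch trigger and relabels those samples to the target class, BadNet-style. The dataset wrapper and the white corner patch are assumptions for illustration, not the repo's actual trigger code.

import random
import torch
from torch.utils.data import Dataset

class PoisonedDataset(Dataset):
    # Illustrative dirty-label poisoning: a fixed patch trigger plus relabeling.
    def __init__(self, clean_ds, inject_portion=0.1, target_label=0, patch_size=3):
        self.ds = clean_ds
        self.target_label = target_label
        self.patch_size = patch_size
        n = len(clean_ds)
        self.poison_idx = set(random.sample(range(n), int(inject_portion * n)))

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, i):
        img, label = self.ds[i]           # img assumed to be a CxHxW tensor in [0, 1]
        if i in self.poison_idx:
            img = img.clone()
            img[:, -self.patch_size:, -self.patch_size:] = 1.0   # stamp a white corner patch
            label = self.target_label     # relabel to the attack target
        return img, label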

Pre-trained Backdoor Models

We release the pre-trained models used for evaluation in the paper, listed below.

Backdoor Models | Model Path | Model Specific Options
Adaptive Attack 1 | backdoor_models\adaptive_attack_1\composite_cifar_resnet.pth | --trigger_type='Composite' --target_label=2
Adaptive Attack 2 | backdoor_models\adaptive_attack_2\WRN-16-1-S-model_best.pth.tar | --trigger_type='Adaptive' --s_name=WRN-16-1N --t_name=WRN-16-1N --target_label=0
Warp Attack | backdoor_models\warp\WRN-16-1-S-model_best.pth.tar | --trigger_type='warpTrigger' --s_name=WRN-16-1 --t_name=WRN-16-1 --target_label=0
Clean Label Attack | backdoor_models\cleanlabel\WRN-16-1-S-model_best.pth.tar | No additional options
BadNet | backdoor_models\badnet\WRN-16-1-S-model_best.pth.tar | No additional options
Reflection Attack | backdoor_models\reflection\WRN-16-1-S-model_best.pth.tar | --trigger_type='ReflectionTrigger'

Note that to correctly evaluate these models, you have to pass the corresponding options (such as the trigger type and target label) to the program; otherwise the results will be wrong. If your results are way off, that is probably the reason.
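
If you want to load a released checkpoint in your own script rather than through the provided entry points, a minimal loader could look like the sketch below. The 'state_dict' key is an assumption based on the common .pth.tar convention; the model instance you pass in must match the checkpoint's architecture (e.g., the WRN-16-1 used above).

import torch

def load_backdoor_checkpoint(model, path):
    # model: an instance matching the checkpoint's architecture (e.g., WRN-16-1)
    ckpt = torch.load(path, map_location="cpu")
    state = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
    model.load_state_dict(state)
    model.eval()
    return model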

Backdoor Removal

Remove the trigger using MEDIC, taking the clean label attack as an example. Fill in the three placeholders based on the model you use: the model path for both --s_model and --t_model, and the model-specific options from the table above.

python main.py --hook --converge --beta3=0 --beta2=0 --beta1=0 --isample='l2' --epochs=60 --lr=0.01 --ratio=0.05 --keepstat --norml2 --hookweight=10 --hook-plane="conv+bn" --imp_temp=5 --s_model="{MODEL_PATH}" --t_model="{MODEL_PATH}" {MODEL_SPECIFIC_OPTIONS_HERE}

Using the clean label attack as an example, the command becomes

python main.py --hook --converge --beta3=0 --beta2=0 --beta1=0 --isample='l2' --epochs=60 --lr=0.01 --ratio=0.05 --keepstat --norml2 --hookweight=10 --hook-plane="conv+bn" --imp_temp=5 --s_model="backdoor_models\cleanlabel\WRN-16-1-S-model_best.pth.tar" --t_model="backdoor_models\cleanlabel\WRN-16-1-S-model_best.pth.tar"
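
To verify that removal worked, you can measure clean accuracy and ASR (attack success rate: the fraction of triggered inputs from non-target classes that are classified as the target label) on the test set, as in the sketch below. The apply_trigger callable is a placeholder for whichever trigger the evaluated model was trained with.

import torch

@torch.no_grad()
def clean_acc_and_asr(model, loader, apply_trigger, target_label, device="cuda"):
    model.eval().to(device)
    correct = total = hits = n_trig = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
        keep = y != target_label          # skip samples already in the target class
        if keep.any():
            x_trig = apply_trigger(x[keep])
            pred_t = model(x_trig).argmax(dim=1)
            hits += (pred_t == target_label).sum().item()
            n_trig += keep.sum().item()
    return correct / total, hits / max(n_trig, 1)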

Dependency

Our code is based on the structure of https://github.com/bboylyg/NAD.

We also include the backdoor attack training and evaluation code.

Dependencies:

You can use pip install [dependency] to install these packages.

We tested the code on Ubuntu 22.04, PyTorch 2.0.1, and an RTX 4090 GPU.

Testing on TrojAI models requires the TrojAI package (https://pages.nist.gov/trojai/). If you need that, feel free to uncomment the trojai import.

Paper and citation

The paper can be found here.

You are welcome to cite the paper if you find our work useful.

@InProceedings{Xu_2023_CVPR,
    author    = {Xu, Qiuling and Tao, Guanhong and Honorio, Jean and Liu, Yingqi and An, Shengwei and Shen, Guangyu and Cheng, Siyuan and Zhang, Xiangyu},
    title     = {MEDIC: Remove Model Backdoors via Importance Driven Cloning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {20485-20494}
}