Distilling Cognitive Backdoor Patterns within an Image: A SOTA Method for Backdoor Sample Detection

Code for ICLR 2023 Paper "Distilling Cognitive Backdoor Patterns within an Image"


Use Cognitive Distillation on a pre-trained model and a batch of images.

from cognitive_distillation import CognitiveDistillation

images = ...  # batch of images (torch.Tensor) [b, c, h, w]
model = ...   # a pre-trained model (torch.nn.Module)

cd = CognitiveDistillation(lr=0.1, p=1, gamma=0.01, beta=10.0, num_steps=100)
masks = cd(model, images)  # the extracted masks (torch.Tensor) [b, 1, h, w]
cognitive_pattern = images * masks  # the extracted cognitive patterns (torch.Tensor) [b, c, h, w]
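The extracted masks can be turned into per-sample detection scores: the paper uses the L1 norm of the mask, since backdoor samples tend to produce smaller, simpler cognitive patterns. The helper below is an illustrative numpy sketch of that scoring step (not part of the repo; the function names and the threshold are placeholders chosen for the example):

```python
import numpy as np

def l1_scores(masks):
    """Per-sample L1 norm of masks with shape [b, 1, h, w]."""
    b = masks.shape[0]
    return np.abs(masks).reshape(b, -1).sum(axis=1)

def flag_suspicious(masks, threshold):
    """Flag samples whose mask L1 norm falls below `threshold`
    (smaller cognitive patterns suggest backdoor samples)."""
    return l1_scores(masks) < threshold

# Toy example: two dense masks and one near-empty mask.
masks = np.zeros((3, 1, 4, 4))
masks[0] += 1.0
masks[1] += 0.8
masks[2, 0, 0, 0] = 0.1
print(flag_suspicious(masks, threshold=5.0))
```

In practice the threshold would be calibrated on held-out clean data rather than set by hand.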



Visualizations of the masks and Cognitive Patterns



Reproduce results from the paper

Train a model
python train.py --exp_path $exp_path \
 --exp_config $exp_config \
 --exp_name $exp_name

Run detections

The following command saves the detection results to $exp_path (masks for Cognitive Distillation; confidence scores for the other baseline methods).

python extract.py --exp_path $exp_path \
 --exp_config $exp_config \
 --exp_name $exp_name \
 --method "CD" --gamma $gamma
Analyze detection results

The following command computes the AUPRC/AUROC of the detection results.

python detect_analysis.py --exp_path $exp_path \
                          --exp_config $exp_config \
                          --exp_name $exp_name \
                          --gamma $gamma
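For reference, AUROC can be computed directly from raw detection scores with the rank-statistic (Mann-Whitney U) formula. The sketch below is a self-contained numpy illustration, not the repo's own evaluation code, and it does not handle tied scores:

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) statistic.

    `labels` are 1 for backdoor samples, 0 for clean samples;
    higher `scores` should indicate a backdoor sample (e.g. the
    negated mask L1 norm for Cognitive Distillation). Assumes no ties.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = scores.argsort()
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)  # 1-based ranks
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# A perfectly separated toy example scores 1.0.
print(auroc([0.1, 0.2, 0.9, 0.8], [0, 0, 1, 1]))
```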

Citation

If you use this code in your work, please cite the accompanying paper:

@inproceedings{huang2023distilling,
  title={Distilling Cognitive Backdoor Patterns within an Image},
  author={Hanxun Huang and Xingjun Ma and Sarah Monazam Erfani and James Bailey},
  booktitle={ICLR},
  year={2023}
}

Acknowledgements

This research was undertaken using the LIEF HPC-GPGPU Facility hosted at the University of Melbourne. This Facility was established with the assistance of LIEF Grant LE170100200. The authors would like to thank Yige Li for sharing several of the triggers used in the experiments.

Part of the code is based on the following repo: