
Not All Poisons are Created Equal: Robust Training against Data Poisoning (ICML 2022)

Overview

EPIC (Effective Poison Identification) is an efficient defense mechanism that significantly reduces the attack success rate of various data poisoning attacks by iteratively finding and dropping isolated points in low-density gradient regions.

Not all poisons are created equal: only effective poisons are responsible for the success of an attack, and they lie closer to the target in gradient space.
Effective poisons are isolated in gradient space, so EPIC can find and drop them iteratively during training.
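
To make the idea of "isolated points in gradient space" concrete, here is a minimal NumPy sketch. It is an illustration only, not the code in this repo: the function names (`greedy_medoids`, `isolated_points`), the choice of Euclidean distance, the greedy medoid selection, and the toy 2-D "gradients" are all assumptions made for this example. It picks `k` medoids, assigns every point to its nearest medoid, and flags medoids whose cluster contains no other point.

```python
import numpy as np


def greedy_medoids(dist, k):
    """Greedily pick k medoids (hypothetical helper for this sketch).

    Each step adds the point that most reduces the summed distance
    from every point to its nearest chosen medoid.
    """
    n = dist.shape[0]
    nearest = np.full(n, np.inf)      # distance to the closest medoid so far
    chosen = np.zeros(n, dtype=bool)  # mask of already selected medoids
    medoids = []
    for _ in range(k):
        # Total cost if candidate j were added as a medoid.
        cost = np.minimum(nearest[:, None], dist).sum(axis=0)
        cost[chosen] = np.inf         # never pick the same point twice
        j = int(np.argmin(cost))
        medoids.append(j)
        chosen[j] = True
        nearest = np.minimum(nearest, dist[:, j])
    return medoids


def isolated_points(grads, k):
    """Return indices of medoids whose cluster holds only the medoid itself."""
    # Pairwise Euclidean distances between per-example gradient vectors.
    dist = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1)
    medoids = greedy_medoids(dist, k)
    # Assign every point to its nearest medoid and count cluster sizes.
    assign = np.argmin(dist[:, medoids], axis=1)
    sizes = np.bincount(assign, minlength=k)
    return sorted(medoids[c] for c in range(k) if sizes[c] == 1)


# Toy data: six points in a dense region plus one far-away point (index 6).
toy_grads = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [0.1, 0.1], [0.05, 0.05], [-0.1, 0.0],
    [50.0, 50.0],
])
to_drop = isolated_points(toy_grads, k=2)
print(to_drop)
```

In a training loop, a step like this would be repeated every few epochs on the current per-example gradients, with the flagged indices removed from the training set. The number of medoids `k` is a tuning knob in this sketch: if it is set much larger than the number of dense regions, clean points can end up as singleton clusters too.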

Updates

[Aug 6th, 2022] We released the code of EPIC.

[July 20th, 2022] We presented our paper at ICML 2022.

Install requirements

pip install -r requirements.txt

Prepare the Data

Publicly available precomputed poisoned datasets can be downloaded from the links below.

Gradient Matching (official repo): eps=8, eps=16
Bullseye Polytope (official repo): CIFAR-10 transfer, CIFAR-10 fine-tuning, TinyImageNet from scratch
Feature Collision (official repo): CIFAR-10 transfer
Sleeper Agent

Usage

See examples.sh for example usage.

Acknowledgements

Some code in this repo comes from the following repositories:

We thank these authors for making their code open-source.

Citation

Please cite our paper if you find the results or our code useful. :beers:

@inproceedings{yang2022not,
  title={Not All Poisons are Created Equal: Robust Training against Data Poisoning},
  author={Yang, Yu and Liu, Tian Yu and Mirzasoleiman, Baharan},
  booktitle={International Conference on Machine Learning},
  pages={25154--25165},
  year={2022},
  organization={PMLR}
}