Awesome Data Poisoning and Backdoor Attacks
Note: This repository is no longer actively maintained, as my interests have shifted to other areas; the most recent update covers ACL 2024. Contributions are still welcome, and pull requests are encouraged.
Disclaimer: This repository may not include every relevant paper in this area. Use it at your own discretion, and please contribute any missing or overlooked papers via pull request.
A curated list of papers and resources on data poisoning, backdoor attacks, and defenses against them.
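For readers new to the area, the sketch below illustrates the kind of attack many of these papers study: a BadNets-style dirty-label backdoor, in which a small trigger patch is stamped onto a fraction of the training images and those samples are relabeled with an attacker-chosen target class. This is a minimal illustration only; the helper name `poison_dataset`, the trigger shape, poison rate, and target class are assumptions made for this sketch, not the recipe of any specific paper listed here.

```python
# A minimal sketch of a BadNets-style dirty-label backdoor poisoning step.
# The helper name, trigger size, poison rate, and target class below are
# illustrative assumptions, not the exact recipe of any paper in this list.
import numpy as np


def poison_dataset(images, labels, target_class=0, poison_rate=0.05,
                   trigger_size=3, trigger_value=1.0, seed=0):
    """Stamp a small square trigger onto a random subset of images and
    relabel those samples as `target_class` (a dirty-label backdoor)."""
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Place the trigger patch in the bottom-right corner of each chosen image.
    images[idx, -trigger_size:, -trigger_size:, :] = trigger_value
    labels[idx] = target_class
    return images, labels, idx


# Toy usage: 100 random 32x32 RGB "images" with 10 classes.
x = np.random.rand(100, 32, 32, 3).astype(np.float32)
y = np.random.randint(0, 10, size=100)
x_poisoned, y_poisoned, poisoned_idx = poison_dataset(x, y)
print(f"Poisoned {len(poisoned_idx)} of {len(x)} samples "
      f"with target class {y_poisoned[poisoned_idx][0]}.")
```

A model trained on such data typically keeps high clean accuracy but predicts the target class whenever the trigger appears at test time; this is the threat model that most of the defenses listed below try to detect or remove.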
Surveys
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses (TPAMI 2022) [paper]
- A Survey on Data Poisoning Attacks and Defenses (DSC 2022) [paper]
Benchmark
- APBench: A Unified Availability Poisoning Attack and Defenses Benchmark (arXiv 2023) [paper] [code]
- Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks (ICML 2021) [paper] [code]
2024
- Automatic Adversarial Adaption for Stealthy Poisoning Attacks in Federated Learning (NDSS 2024) [paper]
- FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning (NDSS 2024) [paper]
- CrowdGuard: Federated Backdoor Detection in Federated Learning (NDSS 2024) [paper] [code]
- LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors (NDSS 2024) [paper] [code]
- Gradient Shaping: Enhancing Backdoor Attack Against Reverse Engineering (NDSS 2024) [paper]
- Sneaky Spikes: Uncovering Stealthy Backdoor Attacks in Spiking Neural Networks with Neuromorphic Data (NDSS 2024) [paper] [code]
- TextGuard: Provable Defense against Backdoor Attacks on Text Classification (NDSS 2024) [paper] [code]
- Towards Faithful XAI Evaluation via Generalization-Limited Backdoor Watermark (ICLR 2024) [paper]
- Towards Reliable and Efficient Backdoor Trigger Inversion via Decoupling Benign Features (ICLR 2024) [paper]
- BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection (ICLR 2024) [paper]
- Backdoor Secrets Unveiled: Identifying Backdoor Data with Optimized Scaled Prediction Consistency (ICLR 2024) [paper]
- Adversarial Feature Map Pruning for Backdoor (ICLR 2024) [paper]
- Safe and Robust Watermark Injection with a Single OoD Image (ICLR 2024) [paper]
- Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios (ICLR 2024) [paper]
- Backdoor Contrastive Learning via Bi-level Trigger Optimization (ICLR 2024) [paper]
- BadEdit: Backdooring Large Language Models by Model Editing (ICLR 2024) [paper]
- Backdoor Federated Learning by Poisoning Backdoor-Critical Layers (ICLR 2024) [paper]
- Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery Detection (ICLR 2024) [paper]
- Influencer Backdoor Attack on Semantic Segmentation (ICLR 2024) [paper]
- Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective (ICLR 2024) [paper]
- Universal Backdoor Attacks (ICLR 2024) [paper]
- Demystifying Poisoning Backdoor Attacks from a Statistical Perspective (ICLR 2024) [paper]
- BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models (ICLR 2024) [paper]
- Rethinking CNN’s Generalization to Backdoor Attack from Frequency Domain (ICLR 2024) [paper]
- Like Oil and Water: Group Robustness Methods and Poisoning Defenses Don't Mix (ICLR 2024) [paper]
- VDC: Versatile Data Cleanser for Detecting Dirty Samples via Visual-Linguistic Inconsistency (ICLR 2024) [paper]
- Chameleon: Increasing Label-Only Membership Leakage with Adaptive Poisoning (ICLR 2024) [paper]
- Universal Jailbreak Backdoors from Poisoned Human Feedback (ICLR 2024) [paper]
- Teach LLMs to Phish: Stealing Private Information from Language Models (ICLR 2024) [paper]
- Poisoning Web-Scale Training Datasets is Practical (S&P 2024) [paper]
- TrojanPuzzle: Covertly Poisoning Code-Suggestion Models (S&P 2024) [paper] [code]
- FLShield: A Validation Based Federated Learning Framework to Defend Against Poisoning Attacks (S&P 2024) [paper] [code]
- Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers' Coding Practices with Insecure Suggestions from Poisoned AI Models (S&P 2024) [paper]
- FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge (S&P 2024) [paper] [code]
- Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics (S&P 2024) [paper] [code]
- ODSCAN: Backdoor Scanning for Object Detection Models (S&P 2024) [paper] [code]
- Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models (S&P 2024) [paper] [code]
- SHERPA: Explainable Robust Algorithms for Privacy-Preserved Federated Learning in Future Networks to Defend Against Data Poisoning Attacks (S&P 2024) [paper]
- BAFFLE: Hiding Backdoors in Offline Reinforcement Learning Datasets (S&P 2024) [paper] [code]
- DeepVenom: Persistent DNN Backdoors Exploiting Transient Weight Perturbations in Memories (S&P 2024)
- Need for Speed: Taming Backdoor Attacks with Speed and Precision (S&P 2024)
- Exploring the Orthogonality and Linearity of Backdoor Attacks (S&P 2024)
- BELT: Old-School Backdoor Attacks can Evade the State-of-the-Art Defense with Backdoor Exclusivity Lifting (S&P 2024) [paper] [code]
- Test-Time Poisoning Attacks Against Test-Time Adaptation Models (S&P 2024) [paper] [code]
- MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion (S&P 2024) [paper]
- MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic (S&P 2024) [paper] [code]
- BadVFL: Backdoor Attacks in Vertical Federated Learning (S&P 2024) [paper]
- Backdooring Multimodal Learning (S&P 2024) [paper] [code]
- Distribution Preserving Backdoor Attack in Self-supervised Learning (S&P 2024) [paper] [code]
- Data Poisoning based Backdoor Attacks to Contrastive Learning (CVPR 2024) [paper] [code]
- Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving (CVPR 2024) [paper]
- Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment (CVPR 2024)
- BrainWash: A Poisoning Attack to Forget in Continual Learning (CVPR 2024) [paper]
- Not All Prompts Are Secure: A Switchable Backdoor Attack against Pre-trained Models (CVPR 2024) [code]
- Test-Time Backdoor Defense via Detecting and Repairing (CVPR 2024) [paper]
- Nearest Is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks (CVPR 2024) [code]
- LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning (CVPR 2024) [paper] [code]
- Temperature-based Backdoor Attacks on Thermal Infrared Object Detection (CVPR 2024)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning (CVPR 2024) [paper]
- Re-thinking Data Availability Attacks Against Deep Neural Networks (CVPR 2024) [paper]
- From Shortcuts to Triggers: Backdoor Defense with Denoised PoE (NAACL 2024) [paper] [code]
- Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors (NAACL 2024) [paper] [code]
- ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger (NAACL 2024) [paper]
- Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models (NAACL 2024) [paper]
- PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning (NAACL 2024)
- Backdoor Attacks on Multilingual Machine Translation (NAACL 2024) [paper]
- Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections (NAACL 2024) [paper]
- Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection (NAACL 2024) [paper] [code]
- Composite Backdoor Attacks Against Large Language Models (NAACL 2024 Findings) [paper] [code]
- Task-Agnostic Detector for Insertion-Based Backdoor Attacks (NAACL 2024 Findings) [paper]
- Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning (NAACL 2024 Findings) [paper]
- TERD: A Unified Framework for Backdoor Defense on Diffusion Model (ICML 2024)
- Purifying Quantization-conditioned Backdoors via Layer-wise Activation Correction with Distribution Approximation (ICML 2024)
- Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining (ICML 2024)
- IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency (ICML 2024)
- A Theoretical Analysis of Backdoor Poisoning Attacks in Convolutional Neural Networks (ICML 2024)
- SHINE: Shielding Backdoors in Deep Reinforcement Learning (ICML 2024)
- Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks (ICML 2024) [paper]
- Generalization Bound and New Algorithm for Clean-Label Backdoor Attack (ICML 2024)
- Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (ICML 2024) [paper]
- Defense against Backdoor Attack on Pre-trained Language Models via Head Pruning and Attention Normalization (ICML 2024)
- Causality Based Front-door Defense Against Backdoor Attack on Language Model (ICML 2024)
- The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline (ICML 2024) [paper] [code]
- Perfect Alignment May be Poisonous to Graph Contrastive Learning (ICML 2024) [paper]
- FedREDefense: Defending against Model Poisoning Attacks for Federated Learning using Model Update Reconstruction Error (ICML 2024)
- Naive Bayes Classifiers over Missing Data: Decision and Poisoning (ICML 2024)
- Data Poisoning Attacks against Conformal Prediction (ICML 2024)
- RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models (ACL 2024) [paper]
- Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space (ACL 2024) [paper] [code]
- BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents (ACL 2024) [paper] [code]
- WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection (ACL 2024) [paper]
- BadActs: A Universal Backdoor Defense in the Activation Space (ACL 2024 Findings) [paper] [code]
- UOR: Universal Backdoor Attacks on Pre-trained Language Models (ACL 2024 Findings) [paper]
- Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge (ACL 2024 Findings) [paper] [code]
2023
- Silent Killer: Optimizing Backdoor Trigger Yields a Stealthy and Powerful Data Poisoning Attack (arXiv 2023) [code]
- Exploring the Limits of Indiscriminate Data Poisoning Attacks (arXiv 2023) [paper]
- Students Parrot Their Teachers: Membership Inference on Model Distillation (arXiv 2023) [paper]
- More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models (arXiv 2023) [paper] [code]
- Feature Partition Aggregation: A Fast Certified Defense Against a Union of Sparse Adversarial Attacks (arXiv 2023) [paper] [code]
- ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms (arXiv 2023) [paper] [code]
- Temporal Robustness against Data Poisoning (arXiv 2023) [paper]
- A Systematic Evaluation of Backdoor Trigger Characteristics in Image Classification (arXiv 2023) [paper]
- Learning the Unlearnable: Adversarial Augmentations Suppress Unlearnable Example Attacks (arXiv 2023) [paper] [code]
- Backdoor Attacks with Input-unique Triggers in NLP (arXiv 2023) [paper]
- Do Backdoors Assist Membership Inference Attacks? (arXiv 2023) [paper]
- Black-box Backdoor Defense via Zero-shot Image Purification (arXiv 2023) [paper]
- Influencer Backdoor Attack on Semantic Segmentation (arXiv 2023) [paper]
- TrojViT: Trojan Insertion in Vision Transformers (arXiv 2023) [paper]
- Mole Recruitment: Poisoning of Image Classifiers via Selective Batch Sampling (arXiv 2023) [paper] [code]
- Poisoning Web-Scale Training Datasets is Practical (arXiv 2023) [paper]
- Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization (arXiv 2023) [paper]
- MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion (arXiv 2023) [paper]
- Launching a Robust Backdoor Attack under Capability Constrained Scenarios (arXiv 2023) [paper]
- Certifiable Robustness for Naive Bayes Classifiers (arXiv 2023) [paper] [code]
- Assessing Vulnerabilities of Adversarial Learning Algorithm through Poisoning Attacks (arXiv 2023) [paper] [code]
- Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models (arXiv 2023) [paper] [code]
- Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning (arXiv 2023) [paper]
- BadSAM: Exploring Security Vulnerabilities of SAM via Backdoor Attacks (arXiv 2023) [paper]
- Backdoor Learning on Sequence to Sequence Models (arXiv 2023) [paper]
- ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger (arXiv 2023) [paper]
- Evil from Within: Machine Learning Backdoors through Hardware Trojans (arXiv 2023) [paper]
- Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning (ICLR 2023) [paper]
- Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only (ICLR 2023) [paper]
- TrojText: Test-time Invisible Textual Trojan Insertion (ICLR 2023) [paper] [code]
- Is Adversarial Training Really a Silver Bullet for Mitigating Data Poisoning? (ICLR 2023) [paper] [code]
- Incompatibility Clustering as a Defense Against Backdoor Poisoning Attacks (ICLR 2023) [paper] [code]
- Revisiting the Assumption of Latent Separability for Backdoor Defenses (ICLR 2023) [paper] [code]
- Few-shot Backdoor Attacks via Neural Tangent Kernels (ICLR 2023) [paper] [code]
- SCALE-UP: An Efficient Black-box Input-level Backdoor Detection via Analyzing Scaled Prediction Consistency (ICLR 2023) [paper] [code]
- Revisiting Graph Adversarial Attack and Defense From a Data Distribution Perspective (ICLR 2023) [paper] [code]
- Provable Robustness against Wasserstein Distribution Shifts via Input Randomization (ICLR 2023) [paper]
- Don’t forget the nullspace! Nullspace occupancy as a mechanism for out of distribution failure (ICLR 2023) [paper]
- Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors (ICLR 2023) [paper] [code]
- Towards Robustness Certification Against Universal Perturbations (ICLR 2023) [paper] [code]
- Understanding Influence Functions and Datamodels via Harmonic Analysis (ICLR 2023) [paper]
- Distilling Cognitive Backdoor Patterns within an Image (ICLR 2023) [paper] [code]
- FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning (ICLR 2023) [paper] [code]
- UNICORN: A Unified Backdoor Trigger Inversion Framework (ICLR 2023) [paper] [code]
- Poisoning Language Models During Instruction Tuning (ICML 2023) [paper] [code]
- Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning (ICML 2023) [paper] [code]
- Image Shortcut Squeezing: Countering Perturbative Availability Poisons with Compression (ICML 2023) [paper] [code]
- Poisoning Generative Replay in Continual Learning to Promote Forgetting (ICML 2023) [paper] [code]
- Exploring Model Dynamics for Accumulative Poisoning Discovery (ICML 2023) [paper] [code]
- Data Poisoning Attacks Against Multimodal Encoders (ICML 2023) [paper] [code]
- Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning Attacks (ICML 2023) [paper] [code]
- Run-Off Election: Improved Provable Defense against Data Poisoning Attacks (ICML 2023) [paper] [code]
- Revisiting Data-Free Knowledge Distillation with Poisoned Teachers (ICML 2023) [paper] [code]
- Certified Robust Neural Networks: Generalization and Corruption Resistance (ICML 2023) [paper] [code]
- Understanding Backdoor Attacks through the Adaptability Hypothesis (ICML 2023) [paper]
- Robust Collaborative Learning with Linear Gradient Overhead (ICML 2023) [paper] [code]
- Graph Contrastive Backdoor Attacks (ICML 2023) [paper]
- Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023) [paper] [code]
- Rethinking Backdoor Attacks (ICML 2023) [paper]
- UMD: Unsupervised Model Detection for X2X Backdoor Attacks (ICML 2023) [paper]
- LeadFL: Client Self-Defense against Model Poisoning in Federated Learning (ICML 2023) [paper] [code]
- BadTrack: A Poison-Only Backdoor Attack on Visual Object Tracking (NeurIPS 2023) [paper]
- ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP (NeurIPS 2023) [paper]
- Robust Contrastive Language-Image Pretraining against Data Poisoning and Backdoor Attacks (NeurIPS 2023) [paper] [code]
- Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features (NeurIPS 2023) [paper] [PyTorch code] [MindSpore code]
- What Distributions are Robust to Indiscriminate Poisoning Attacks for Linear Learners? (NeurIPS 2023) [paper]
- Label Poisoning is All You Need (NeurIPS 2023) [paper]
- Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks (NeurIPS 2023) [paper] [code]
- Temporal Robustness against Data Poisoning (NeurIPS 2023) [paper]
- VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models (NeurIPS 2023) [paper] [code]
- CBD: A Certified Backdoor Detector Based on Local Dominant Probability (NeurIPS 2023) [paper]
- BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning (NeurIPS 2023) [paper]
- Fed-FA: Theoretically Modeling Client Data Divergence for Federated Language Backdoor Defense (NeurIPS 2023) [paper]
- Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples (NeurIPS 2023) [paper] [PyTorch code] [MindSpore code]
- IBA: Towards Irreversible Backdoor Attacks in Federated Learning (NeurIPS 2023) [paper] [code]
- Towards Stable Backdoor Purification through Feature Shift Tuning (NeurIPS 2023) [paper] [code]
- Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks (NeurIPS 2023) [paper] [code]
- Lockdown: Backdoor Defense for Federated Learning with Isolated Subspace Training (NeurIPS 2023) [paper] [code]
- A3FL: Adversarially Adaptive Backdoor Attacks to Federated Learning (NeurIPS 2023) [paper] [code]
- FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning (NeurIPS 2023) [paper] [code]
- A Unified Detection Framework for Inference-Stage Backdoor Defenses (NeurIPS 2023) [paper]
- Black-box Backdoor Defense via Zero-shot Image Purification (NeurIPS 2023) [paper] [code]
- Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots (NeurIPS 2023) [paper]
- Backdoor Defense via Deconfounded Representation Learning (CVPR 2023) [paper] [code]
- Turning Strengths into Weaknesses: A Certified Robustness Inspired Attack Framework against Graph Neural Networks (CVPR 2023) [paper]
- CUDA: Convolution-based Unlearnable Datasets (CVPR 2023) [paper] [code]
- Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger (CVPR 2023) [paper]
- Single Image Backdoor Inversion via Robust Smoothed Classifiers (CVPR 2023) [paper] [code]
- Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples (CVPR 2023) [paper] [code]
- Backdoor Defense via Adaptively Splitting Poisoned Dataset (CVPR 2023) [paper] [code]
- Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency (CVPR 2023) [paper] [code]
- Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning (CVPR 2023) [paper] [code]
- Color Backdoor: A Robust Poisoning Attack in Color Space (CVPR 2023) [paper]
- How to Backdoor Diffusion Models? (CVPR 2023) [paper] [code]
- Backdoor Cleansing With Unlabeled Data (CVPR 2023) [paper] [code]
- MEDIC: Remove Model Backdoors via Importance Driven Cloning (CVPR 2023) [paper] [code]
- Architectural Backdoors in Neural Networks (CVPR 2023) [paper]
- Detecting Backdoors in Pre-Trained Encoders (CVPR 2023) [paper] [code]
- The Dark Side of Dynamic Routing Neural Networks: Towards Efficiency Backdoor Injection (CVPR 2023) [paper] [code]
- Progressive Backdoor Erasing via Connecting Backdoor and Adversarial Attacks (CVPR 2023) [paper]
- You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks? (CVPR 2023) [paper]
- Don't FREAK Out: A Frequency-Inspired Approach to Detecting Backdoor Poisoned Samples in DNNs (CVPRW 2023) [paper]
- TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models (ICCV 2023) [paper] [code]
- Towards Attack-tolerant Federated Learning via Critical Parameter Analysis (ICCV 2023) [paper] [code]
- VertexSerum: Poisoning Graph Neural Networks for Link Inference (ICCV 2023) [paper]
- The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data (ICCV 2023) [paper] [code]
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning (arXiv 2023) [paper] [code]
- Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization (ICCV 2023) [paper]
- Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis (ICCV 2023) [paper] [code]
- Beating Backdoor Attack at Its Own Game (ICCV 2023) [paper] [code]
- Multi-Metrics Adaptively Identifies Backdoors in Federated Learning (ICCV 2023) [paper] [code]
- PolicyCleanse: Backdoor Detection and Mitigation for Competitive Reinforcement Learning (ICCV 2023) [paper]
- The Perils of Learning from Unlabeled Data: Backdoor Attacks on Semi-Supervised Learning (ICCV 2023) [paper]
- Jigsaw Puzzle: Selective Backdoor Attack to Subvert Malware Classifiers (S&P 2023) [paper]
- SNAP: Efficient Extraction of Private Properties with Poisoning (S&P 2023) [paper] [code]
- BayBFed: Bayesian Backdoor Defense for Federated Learning (S&P 2023) [paper]
- RAB: Provable Robustness Against Backdoor Attacks (S&P 2023) [paper]
- FedRecover: Recovering from Poisoning Attacks in Federated Learning using Historical Information (S&P 2023) [paper]
- 3DFed: Adaptive and Extensible Framework for Covert Backdoor Attack in Federated Learning (S&P 2023) [paper]
- BITE: Textual Backdoor Attacks with Iterative Trigger Injection (ACL 2023) [paper] [code]
- Backdooring Neural Code Search (ACL 2023) [paper] [code]
- Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark (ACL 2023) [paper] [code]
- NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models (ACL 2023) [paper] [code]
- Multi-target Backdoor Attacks for Code Pre-trained Models (ACL 2023) [paper] [code]
- A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning (ACL 2023) [paper]
- Defending against Insertion-based Textual Backdoor Attacks via Attribution (ACL 2023) [paper]
- Diffusion Theory as a Scalpel: Detecting and Purifying Poisonous Dimensions in Pre-trained Language Models Caused by Backdoor or Bias (ACL 2023) [paper]
- Maximum Entropy Loss, the Silver Bullet Targeting Backdoor Attacks in Pre-trained Language Models (ACL 2023 Findings) [paper]
- Mitigating Backdoor Poisoning Attacks through the Lens of Spurious Correlation (EMNLP 2023) [paper] [code]
- Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models (EMNLP 2023) [paper] [code]
- Poisoning Retrieval Corpora by Injecting Adversarial Passages (EMNLP 2023) [paper] [code]
- UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning (EMNLP 2023 Findings) [paper]
- Attention-Enhancing Backdoor Attacks Against BERT-based Models (EMNLP 2023 Findings) [paper]
- Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers (EMNLP 2023 Findings) [paper]
- RDM-DC: Poisoning Resilient Dataset Condensation with Robust Distribution Matching (UAI 2023) [paper]
- Defending Against Backdoor Attacks by Layer-wise Feature Analysis (PAKDD 2023) [paper] [code]
- Manipulating Federated Recommender Systems: Poisoning with Synthetic Users and Its Countermeasures (SIGIR 2023) [paper]
- The Dark Side of Explanations: Poisoning Recommender Systems with Counterfactual Examples (SIGIR 2023) [paper]
- PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification (ACM MM 2023) [paper] [code]
- A Dual Stealthy Backdoor: From Both Spatial and Frequency Perspectives (ACM MM 2023) [paper]
- How to Sift Out a Clean Data Subset in the Presence of Data Poisoning? (USENIX Security 2023) [paper] [code]
- PORE: Provably Robust Recommender Systems against Data Poisoning Attacks (USENIX Security 2023) [paper]
- On the Security Risks of Knowledge Graph Reasoning (USENIX Security 2023) [paper] [code]
- Fedward: Flexible Federated Backdoor Defense Framework with Non-IID Data (ICME 2023) [paper]
- BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT (NDSS 2023) [paper]
- Exploiting Logic Locking for a Neural Trojan Attack on Machine Learning Accelerators (GLSVLSI 2023) [paper]
- Energy-Latency Attacks to On-Device Neural Networks via Sponge Poisoning (SecTL 2023) [paper]
- Beyond the Model: Data Pre-processing Attack to Deep Learning Models in Android Apps (SecTL 2023) [paper]
2022
- Transferable Unlearnable Examples (arXiv 2022) [paper]
- Natural Backdoor Datasets (arXiv 2022) [paper]
- Dangerous Cloaking: Natural Trigger based Backdoor Attacks on Object Detectors in the Physical World (arXiv 2022) [paper]
- Backdoor Attacks on Self-Supervised Learning (CVPR 2022) [paper] [code]
- Poisons that are learned faster are more effective (CVPR 2022 Workshops) [paper]
- Robust Unlearnable Examples: Protecting Data Privacy Against Adversarial Learning (ICLR 2022) [paper] [code]
- Adversarial Unlearning of Backdoors via Implicit Hypergradient (ICLR 2022) [paper] [code]
- Not All Poisons are Created Equal: Robust Training against Data Poisoning (ICML 2022) [paper] [code]
- Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch (NeurIPS 2022) [paper] [code]
- Policy Resilience to Environment Poisoning Attacks on Reinforcement Learning (NeurIPS 2022 Workshop MLSW) [paper]
- Hard to Forget: Poisoning Attacks on Certified Machine Unlearning (AAAI 2022) [paper] [code]
- Certified Robustness of Nearest Neighbors against Data Poisoning and Backdoor Attacks (AAAI 2022) [paper]
- PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning (USENIX Security 2022) [paper]
- Planting Undetectable Backdoors in Machine Learning Models (FOCS 2022) [paper]
2021
- DP-InstaHide: Provably Defusing Poisoning and Backdoor Attacks with Differentially Private Data Augmentations (arXiv 2021) [paper]
- How Robust Are Randomized Smoothing Based Defenses to Data Poisoning? (CVPR 2021) [paper]
- Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure Dataset Release (ICLR 2021 Workshop on Security and Safety in Machine Learning Systems) [paper]
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching (ICLR 2021) [paper] [code]
- Unlearnable Examples: Making Personal Data Unexploitable (ICLR 2021) [paper] [code]
- Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks (ICLR 2021) [paper] [code]
- LowKey: Leveraging Adversarial Attacks to Protect Social Media Users from Facial Recognition (ICLR 2021) [paper]
- What Doesn't Kill You Makes You Robust(er): How to Adversarially Train against Data Poisoning (ICLR 2021 Workshop) [paper]
- Neural Tangent Generalization Attacks (ICML 2021) [paper]
- SPECTRE: Defending Against Backdoor Attacks Using Robust Covariance Estimation (ICML 2021) [paper]
- Adversarial Examples Make Strong Poisons (NeurIPS 2021) [paper]
- Anti-Backdoor Learning: Training Clean Models on Poisoned Data (NeurIPS 2021) [paper] [code]
- Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective (ICCV 2021) [paper] [code]
- Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks (AAAI 2021) [paper] [code]
- Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks Without an Accuracy Tradeoff (ICASSP 2021) [paper]
2020
- On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping (arXiv 2020) [paper] [code]
- Backdooring and poisoning neural networks with image-scaling attacks (arXiv 2020) [paper]
- Poisoned classifiers are not only backdoored, they are fundamentally broken (arXiv 2020) [paper] [code]
- Invisible backdoor attacks on deep neural networks via steganography and regularization (TDSC 2020) [paper]
- Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs (CVPR 2020) [paper] [code]
- MetaPoison: Practical General-purpose Clean-label Data Poisoning (NeurIPS 2020) [paper]
- Input-Aware Dynamic Backdoor Attack (NeurIPS 2020) [paper] [code]
- How To Backdoor Federated Learning (AISTATS 2020) [paper]
- Reflection backdoor: A natural backdoor attack on deep neural networks (ECCV 2020) [paper]
- Practical Poisoning Attacks on Neural Networks (ECCV 2020) [paper]
- Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases (ECCV 2020) [paper] [code]
- Deep k-NN Defense Against Clean-Label Data Poisoning Attacks (ECCV 2020 Workshops) [paper] [code]
- Radioactive data: tracing through training (ICML 2020) [paper]
- Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks (ICML 2020) [paper]
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing (ICML 2020) [paper]
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks (KDD 2020) [paper] [code]
- Hidden Trigger Backdoor Attacks (AAAI 2020) [paper] [code]
2019
- Label-consistent backdoor attacks (arXiv 2019) [paper]
- Poisoning Attacks with Generative Adversarial Nets (arXiv 2019) [paper]
- TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems (arXiv 2019) [paper]
- BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain (IEEE Access 2019) [paper]
- Data Poisoning against Differentially-Private Learners: Attacks and Defenses (IJCAI 2019) [paper]
- DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks (IJCAI 2019) [paper]
- Sever: A Robust Meta-Algorithm for Stochastic Optimization (ICML 2019) [paper]
- Learning with Bad Training Data via Iterative Trimmed Loss Minimization (ICML 2019) [paper]
- Universal Multi-Party Poisoning Attacks (ICML 2019) [paper]
- Transferable Clean-Label Poisoning Attacks on Deep Neural Nets (ICML 2019) [paper]
- Defending Neural Backdoors via Generative Distribution Modeling (NeurIPS 2019) [paper]
- Learning to Confuse: Generating Training Time Adversarial Data with Auto-Encoder (NeurIPS 2019) [paper]
- The Curse of Concentration in Robust Learning: Evasion and Poisoning Attacks from Concentration of Measure (AAAI 2019) [paper]
- Backdoor Attacks against Transfer Learning with Pre-trained Deep Learning Models (IEEE Transactions on Services Computing 2019) [paper]
- Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks (IEEE Symposium on Security and Privacy 2019) [paper]
- STRIP: a defence against trojan attacks on deep neural networks (ACSAC 2019) [paper]
2018
- Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering (arXiv 2018) [paper]
- Spectral Signatures in Backdoor Attacks (NeurIPS 2018) [paper]
- Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks (NeurIPS 2018) [paper]
- Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise (NeurIPS 2018) [paper]
- Trojaning Attack on Neural Networks (NDSS 2018) [paper]
- Label Sanitization Against Label Flipping Poisoning Attacks (ECML PKDD 2018 Workshops) [paper]
- Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring (USENIX Security 2018) [paper]
2017
- Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning (arXiv 2017) [paper]
- Generative Poisoning Attack Method Against Neural Networks (arXiv 2017) [paper]
- Delving into Transferable Adversarial Examples and Black-box Attacks (ICLR 2017) [paper]
- Understanding Black-box Predictions via Influence Functions (ICML 2017) [paper] [code]
- Certified Defenses for Data Poisoning Attacks (NeurIPS 2017) [paper]
2016
- Data Poisoning Attacks on Factorization-Based Collaborative Filtering (NeurIPS 2016) [paper]