Awesome Attacks on Machine Learning Privacy
This repository contains a curated list of papers related to privacy attacks against machine learning. A link to the authors' code is provided when available. For corrections, suggestions, or missing papers, please open an issue or submit a pull request.
Contents
- Awesome Attacks on Machine Learning Privacy
- Contents
- Surveys and Overviews
- Privacy Testing Tools
- Papers and Code
- Other
Surveys and Overviews
- SoK: Model Inversion Attack Landscape: Taxonomy, Challenges, and Future Roadmap (Dibbo, 2023)
- A Survey of Privacy Attacks in Machine Learning (Rigaki and Garcia, 2023)
- An Overview of Privacy in Machine Learning (De Cristofaro, 2020)
- Rethinking Privacy Preserving Deep Learning: How to Evaluate and Thwart Privacy Attacks (Fan et al., 2020)
- Privacy and Security Issues in Deep Learning: A Survey (Liu et al., 2021)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models (Liu et al., 2021)
- Membership Inference Attacks on Machine Learning: A Survey (Hu et al., 2021)
- Survey: Leakage and Privacy at Inference Time (Jegorova et al., 2021)
- A Review of Confidentiality Threats Against Embedded Neural Network Models (Joud et al., 2021)
- Federated Learning Attacks Revisited: A Critical Discussion of Gaps, Assumptions, and Evaluation Setups (Wainakh et al., 2021)
- I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences (Oliynyk et al., 2022)
Privacy Testing Tools
- PrivacyRaven (Trail of Bits)
- TensorFlow Privacy (TensorFlow)
- Machine Learning Privacy Meter (NUS Data Privacy and Trustworthy Machine Learning Lab)
- CypherCat (archive-only) (IQT Labs/Lab 41)
- Adversarial Robustness Toolbox (ART) (IBM)
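To give a flavour of how these tools are used, here is a minimal sketch of a black-box membership inference test with IBM's Adversarial Robustness Toolbox. The class and method names (`SklearnClassifier`, `MembershipInferenceBlackBox`, `fit`, `infer`) are assumed from ART's `membership_inference` module and may differ between releases; the victim model and dataset are purely illustrative.

```python
# Sketch: black-box membership inference with ART.
# NOTE: ART API names are assumed; verify against your installed version.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from art.estimators.classification import SklearnClassifier
from art.attacks.inference.membership_inference import MembershipInferenceBlackBox

# Train a victim model on half of the data; the other half serves as "non-members".
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
victim = RandomForestClassifier(n_estimators=100).fit(x_train, y_train)

# Wrap the victim so ART can query it, then fit the attack model on known members/non-members.
classifier = SklearnClassifier(model=victim)
attack = MembershipInferenceBlackBox(classifier, attack_model_type="rf")
attack.fit(x_train[:200], y_train[:200], x_test[:200], y_test[:200])

# Infer membership for held-out members and non-members and report balanced attack accuracy.
member_preds = attack.infer(x_train[200:], y_train[200:])
nonmember_preds = attack.infer(x_test[200:], y_test[200:])
acc = (np.mean(member_preds) + (1 - np.mean(nonmember_preds))) / 2
print(f"membership inference attack accuracy: {acc:.2f}")
```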
Papers and Code
Membership inference
A curated list of more than 100 membership inference papers on machine learning models is available at this repository.
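To make the basic threat concrete, here is a minimal, self-contained sketch of the simplest membership inference test: a loss-threshold attack in the spirit of Yeom et al. (2018, listed below), which guesses that a sample was a training member when the model's loss on it falls below the average training loss. The model and dataset choices are illustrative only.

```python
# Minimal loss-threshold membership inference sketch (in the spirit of Yeom et al., 2018).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
x_mem, x_non, y_mem, y_non = train_test_split(X, y, test_size=0.5, random_state=0)

# Deliberately overfit a small model so that training members have noticeably lower loss.
model = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500, random_state=0).fit(x_mem, y_mem)

def per_sample_loss(model, x, y):
    """Cross-entropy loss of the model on each individual sample."""
    probs = model.predict_proba(x)
    return -np.log(np.clip(probs[np.arange(len(y)), y], 1e-12, None))

loss_mem = per_sample_loss(model, x_mem, y_mem)  # losses on true members
loss_non = per_sample_loss(model, x_non, y_non)  # losses on non-members

# Attack rule: predict "member" when the loss is below the average training loss.
tau = loss_mem.mean()
attack_acc = 0.5 * ((loss_mem < tau).mean() + (loss_non >= tau).mean())
print(f"loss-threshold membership inference accuracy: {attack_acc:.2f}")
```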
- Membership inference attacks against machine learning models (Shokri et al., 2017) (code)
- Understanding membership inferences on well-generalized learning models (Long et al., 2018)
- Privacy risk in machine learning: Analyzing the connection to overfitting (Yeom et al., 2018) (code)
- Membership inference attack against differentially private deep learning model (Rahman et al., 2018)
- Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning (Nasr et al., 2019) (code)
- LOGAN: Membership inference attacks against generative models (Hayes et al., 2019) (code)
- Evaluating differentially private machine learning in practice (Jayaraman and Evans, 2019) (code)
- Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models (Salem et al., 2019) (code)
- Privacy risks of securing machine learning models against adversarial examples (Song L. et al., 2019) (code)
- White-box vs Black-box: Bayes Optimal Strategies for Membership Inference (Sablayrolles et al., 2019)
- Privacy risks of explaining machine learning models (Shokri et al., 2019)
- Demystifying membership inference attacks in machine learning as a service (Truex et al., 2019)
- Monte carlo and reconstruction membership inference attacks against generative models (Hilprecht et al., 2019)
- MemGuard: Defending against Black-Box Membership Inference Attacks via Adversarial Examples (Jia et al., 2019) (code)
- GAN-Leaks: A taxonomy of membership inference attacks against GANs (Chen et al., 2019)
- Auditing Data Provenance in Text-Generation Models (Song and Shmatikov, 2019)
- Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? (Hisamoto et al., 2020)
- Revisiting Membership Inference Under Realistic Assumptions (Jayaraman et al., 2020)
- When Machine Unlearning Jeopardizes Privacy (Chen et al., 2020)
- Modelling and Quantifying Membership Information Leakage in Machine Learning (Farokhi and Kaafar, 2020)
- Systematic Evaluation of Privacy Risks of Machine Learning Models (Song and Mittal, 2020) (code)
- Towards the Infeasibility of Membership Inference on Deep Models (Rezaei and Liu, 2020) (code)
- Stolen Memories: Leveraging Model Memorization for Calibrated White-Box Membership Inference (Leino and Fredrikson, 2020)
- Label-Only Membership Inference Attacks (Choquette-Choo et al., 2020)
- Label-Leaks: Membership Inference Attack with Label (Li and Zhang, 2020)
- Alleviating Privacy Attacks via Causal Learning (Tople et al., 2020)
- On the Effectiveness of Regularization Against Membership Inference Attacks (Kaya et al., 2020)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries (Rahimian et al., 2020)
- Segmentations-Leak: Membership Inference Attacks and Defenses in Semantic Image Segmentation (He et al., 2019)
- Differential Privacy Defenses and Sampling Attacks for Membership Inference (Rahimian et al., 2019)
- privGAN: Protecting GANs from membership inference attacks at low cost (Mukherjee et al., 2020)
- Sharing Models or Coresets: A Study based on Membership Inference Attack (Lu et al., 2020)
- Privacy Analysis of Deep Learning in the Wild: Membership Inference Attacks against Transfer Learning (Zou et al., 2020)
- Quantifying Membership Inference Vulnerability via Generalization Gap and Other Model Metrics (Bentley et al., 2020)
- MACE: A Flexible Framework for Membership Privacy Estimation in Generative Models (Liu et al., 2020)
- On Primes, Log-Loss Scores and (No) Privacy (Aggarwal et al., 2020)
- MCMIA: Model Compression Against Membership Inference Attack in Deep Neural Networks (Wang et al., 2020)
- Bootstrap Aggregation for Point-based Generalized Membership Inference Attacks (Felps et al., 2020)
- Differentially Private Learning Does Not Bound Membership Inference (Humphries et al., 2020)
- Quantifying Membership Privacy via Information Leakage (Saeidian et al., 2020)
- Disparate Vulnerability: on the Unfairness of Privacy Attacks Against Machine Learning (Yaghini et al., 2020)
- Use the Spear as a Shield: A Novel Adversarial Example based Privacy-Preserving Technique against Membership Inference Attacks (Xue et al., 2020)
- Towards Realistic Membership Inferences: The Case of Survey Data
- Unexpected Information Leakage of Differential Privacy Due to Linear Property of Queries (Huang et al., 2020)
- TransMIA: Membership Inference Attacks Using Transfer Shadow Training (Hidano et al., 2020)
- An Extension of Fano's Inequality for Characterizing Model Susceptibility to Membership Inference Attacks (Jha et al., 2020)
- Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning (Nasr et al., 2021)
- Membership Inference Attack with Multi-Grade Service Models in Edge Intelligence (Wang et al., 2021)
- Reconstruction-Based Membership Inference Attacks are Easier on Difficult Problems (Shafran et al., 2021)
- Membership Inference Attacks on Deep Regression Models for Neuroimaging (Gupta et al., 2021)
- Node-Level Membership Inference Attacks Against Graph Neural Networks (He et al., 2021)
- Practical Blind Membership Inference Attack via Differential Comparisons (Hui et al., 2021)
- ADePT: Auto-encoder based Differentially Private Text Transformation (Krishna et al., 2021)
- Source Inference Attacks in Federated Learning (Hu et al., 2021) (code)
- The Influence of Dropout on Membership Inference in Differentially Private Models (Galinkin, 2021)
- Membership Inference Attack Susceptibility of Clinical Language Models (Jagannatha et al., 2021)
- Membership Inference Attacks on Knowledge Graphs (Wang & Sun, 2021)
- When Does Data Augmentation Help With Membership Inference Attacks? (Kaya and Dumitras, 2021)
- The Influence of Training Parameters and Architectural Choices on the Vulnerability of Neural Networks to Membership Inference Attacks (Bouanani, 2021)
- Membership Inference on Word Embedding and Beyond (Mahloujifar et al., 2021)
- TableGAN-MCA: Evaluating Membership Collisions of GAN-Synthesized Tabular Data Releasing (Hu et al., 2021)
- Enhanced Membership Inference Attacks against Machine Learning Models (Ye et al., 2021)
- Do Not Trust Prediction Scores for Membership Inference Attacks (Hintersdorf et al., 2021)
- Membership Inference via Backdooring (Hu et al., 2022)
Reconstruction
Reconstruction attacks also cover attacks known as model inversion and attribute inference.
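For intuition, the sketch below shows the gradient-matching idea behind attacks such as Deep Leakage from Gradients (Zhu et al., 2019, listed below): given a gradient shared for a single private example (e.g., in federated learning), the attacker optimizes a dummy input and label until their gradient matches the observed one. The toy model and optimizer settings are illustrative, not taken from any particular paper.

```python
# Sketch of gradient-matching reconstruction (the idea behind Deep Leakage from Gradients).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))  # toy "victim" model

# The victim computes a gradient on one private example and shares it.
x_true = torch.randn(1, 32)
y_true = torch.tensor([3])
loss_true = F.cross_entropy(model(x_true), y_true)
true_grads = torch.autograd.grad(loss_true, model.parameters())

# Attacker: optimize a dummy input and soft label so their gradient matches the shared one.
x_dummy = torch.randn(1, 32, requires_grad=True)
y_dummy = torch.randn(1, 10, requires_grad=True)  # logits of a soft label
opt = torch.optim.LBFGS([x_dummy, y_dummy], lr=0.1)

def closure():
    opt.zero_grad()
    loss_dummy = torch.sum(-F.softmax(y_dummy, dim=-1) * F.log_softmax(model(x_dummy), dim=-1))
    dummy_grads = torch.autograd.grad(loss_dummy, model.parameters(), create_graph=True)
    grad_diff = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    grad_diff.backward()
    return grad_diff

for _ in range(50):
    opt.step(closure)

print("reconstruction error:", F.mse_loss(x_dummy.detach(), x_true).item())
```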
- Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing (Fredrikson et al., 2014)
- Model inversion attacks that exploit confidence information and basic countermeasures (Fredrikson et al., 2015) (code)
- A methodology for formalizing model-inversion attacks (Wu et al., 2016)
- Deep models under the gan: Information leakage from collaborative deep learning (Hitaj et al., 2017)
- Machine learning models that remember too much (Song, C. et al., 2017) (code)
- Model inversion attacks for prediction systems: Without knowledge of non-sensitive attributes (Hidano et al., 2017)
- The secret sharer: Evaluating and testing unintended memorization in neural networks (Carlini et al., 2019)
- Deep leakage from gradients (Zhu et al., 2019) (code)
- Model inversion attacks against collaborative inference (He et al., 2019) (code)
- Beyond Inferring Class Representatives: User-Level Privacy Leakage From Federated Learning (Wang et al., 2019)
- Neural network inversion in adversarial setting via background knowledge alignment (Yang et al., 2019)
- iDLG: Improved Deep Leakage from Gradients (Zhao et al., 2020) (code)
- Privacy Risks of General-Purpose Language Models (Pan et al., 2020)
- The secret revealer: generative model-inversion attacks against deep neural networks (Zhang et al., 2020)
- Inverting Gradients - How easy is it to break privacy in federated learning? (Geiping et al., 2020)
- GAMIN: An Adversarial Approach to Black-Box Model Inversion (Aivodji et al., 2019)
- Trade-offs and Guarantees of Adversarial Representation Learning for Information Obfuscation (Zhao et al., 2020)
- Reconstruction of training samples from loss functions (Sannai, 2018)
- A Framework for Evaluating Gradient Leakage Attacks in Federated Learning (Wei et al., 2020)
- Exploring Image Reconstruction Attack in Deep Learning Computation Offloading (Oh and Lee, 2019)
- I Know What You See: Power Side-Channel Attack on Convolutional Neural Network Accelerators (Wei et al., 2019)
- Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning (Salem et al., 2019)
- Illuminating the Dark or how to recover what should not be seen in FE-based classifiers (Carpov et al., 2020)
- Evaluation Indicator for Model Inversion Attack (Tanaka et al., 2020)
- Understanding Unintended Memorization in Federated Learning (Thakkar et al., 2020)
- An Attack-Based Evaluation Method for Differentially Private Learning Against Model Inversion Attack (Park et al., 2019)
- Reducing Risk of Model Inversion Using Privacy-Guided Training (Goldsteen et al., 2020)
- Robust Transparency Against Model Inversion Attacks (Alufaisan et al., 2020)
- Does AI Remember? Neural Networks and the Right to be Forgotten (Graves et al., 2020)
- Improving Robustness to Model Inversion Attacks via Mutual Information Regularization (Wang et al., 2020)
- SAPAG: A Self-Adaptive Privacy Attack From Gradients (Wang et al., 2020)
- Theory-Oriented Deep Leakage from Gradients via Linear Equation Solver (Pan et al., 2020)
- Improved Techniques for Model Inversion Attacks (Chen et al., 2020)
- Black-box Model Inversion Attribute Inference Attacks on Classification Models (Mehnaz et al., 2020)
- Deep Face Recognizer Privacy Attack: Model Inversion Initialization by a Deep Generative Adversarial Data Space Discriminator (Khosravy et al., 2020)
- MixCon: Adjusting the Separability of Data Representations for Harder Data Recovery (Li et al., 2020)
- Evaluation of Inference Attack Models for Deep Learning on Medical Data (Wu et al., 2020)
- FaceLeaks: Inference Attacks against Transfer Learning Models via Black-box Queries (Liew and Takahashi, 2020)
- Extracting Training Data from Large Language Models (Carlini et al., 2020)
- MIDAS: Model Inversion Defenses Using an Approximate Memory System (Xu et al., 2021)
- KART: Privacy Leakage Framework of Language Models Pre-trained with Clinical Records (Nakamura et al., 2020)
- Derivation of Constraints from Machine Learning Models and Applications to Security and Privacy (Falaschi et al., 2021)
- On the (In)Feasibility of Attribute Inference Attacks on Machine Learning Models (Zhao et al., 2021)
- Practical Defences Against Model Inversion Attacks for Split Neural Networks (Titcombe et al., 2021)
- R-GAP: Recursive Gradient Attack on Privacy (Zhu and Blaschko, 2021)
- Exploiting Explanations for Model Inversion Attacks (Zhao et al., 2021)
- SAFELearn: Secure Aggregation for private FEderated Learning (Fereidooni et al., 2021)
- Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? (Lehman et al., 2021)
- Training Data Leakage Analysis in Language Models (Inan et al., 2021)
- Model Fragmentation, Shuffle and Aggregation to Mitigate Model Inversion in Federated Learning (Masuda et al., 2021)
- PRECODE - A Generic Model Extension to Prevent Deep Gradient Leakage (Scheliga et al., 2021)
- On the Importance of Encrypting Deep Features (Ni et al., 2021)
- Defending Against Model Inversion Attack by Adversarial Examples (Wen et al., 2021)
- See through Gradients: Image Batch Recovery via GradInversion (Yin et al., 2021)
- Variational Model Inversion Attacks (Wang et al., 2021)
- Reconstructing Training Data with Informed Adversaries (Balle et al., 2022)
- Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks (Struppek et al., 2022)
- Privacy Vulnerability of Split Computing to Data-Free Model Inversion Attacks (Dong et al., 2022)
- A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data (Annamalai et al., 2023)
- Analysis and Utilization of Hidden Information in Model Inversion Attacks (Zhang et al., 2023) (code)
- Text Embeddings Reveal (Almost) As Much As Text (Morris et al., 2023)
- On the Inadequacy of Similarity-based Privacy Metrics: Reconstruction Attacks against "Truly Anonymous Synthetic Data" (Ganev and De Cristofaro, 2023)
- Model Inversion Attack with Least Information and an In-depth Analysis of its Disparate Vulnerability (Dibbo et al., 2023)
Property inference / Distribution inference
- Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers (Ateniese et al., 2015)
- Property inference attacks on fully connected neural networks using permutation invariant representations (Ganju et al., 2018)
- Exploiting unintended feature leakage in collaborative learning (Melis et al., 2019) (code)
- Overlearning Reveals Sensitive Attributes (Song C. et al., 2020) (code)
- Subject Property Inference Attack in Collaborative Learning (Xu and Li, 2020)
- Property Inference From Poisoning (Chase et al., 2021)
- Property Inference Attacks on Convolutional Neural Networks: Influence and Implications of Target Model's Complexity (Parisot et al., 2021)
- Honest-but-Curious Nets: Sensitive Attributes of Private Inputs can be Secretly Coded into the Entropy of Classifiers' Outputs (Malekzadeh et al. 2021) (code)
- Property Inference Attacks Against GANs (Zhou et al., 2021) (code)
- Formalizing and Estimating Distribution Inference Risks (Suri and Evans, 2022) (code)
- Dissecting Distribution Inference (Suri et al., 2023) (code)
- SNAP: Efficient Extraction of Private Properties with Poisoning (Chaudhari et al., 2023) (code)
Model extraction
- Stealing machine learning models via prediction apis (Tramèr et al., 2016) (code)
- Stealing hyperparameters in machine learning (Wang B. et al., 2018)
- Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data (Correia-Silva et al., 2018) (code)
- Towards reverse-engineering black-box neural networks (Oh et al., 2018) (code)
- Knockoff nets: Stealing functionality of black-box models (Orekondy et al., 2019) (code)
- PRADA: protecting against DNN model stealing attacks (Juuti et al., 2019) (code)
- Model Reconstruction from Model Explanations (Milli et al., 2019)
- Exploring connections between active learning and model extraction (Chandrasekaran et al., 2020)
- High Accuracy and High Fidelity Extraction of Neural Networks (Jagielski et al., 2020)
- Thieves on Sesame Street! Model Extraction of BERT-based APIs (Krishna et al., 2020) (code)
- Cryptanalytic Extraction of Neural Network Models (Carlini et al., 2020)
- CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples (Yu et al., 2020)
- ACTIVETHIEF: Model Extraction Using Active Learning and Unannotated Public Data (Pal et al., 2020) (code)
- Efficiently Stealing your Machine Learning Models (Reith et al., 2019)
- Extraction of Complex DNN Models: Real Threat or Boogeyman? (Atli et al., 2020)
- Stealing Neural Networks via Timing Side Channels (Duddu et al., 2019)
- DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints (Hu et al., 2020) (code)
- CSI NN: Reverse Engineering of Neural Network Architectures Through Electromagnetic Side Channel (Batina et al., 2019)
- Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures (Yan et al., 2020)
- How to 0wn NAS in Your Spare Time (Hong et al., 2020) (code)
- Security Analysis of Deep Neural Networks Operating in the Presence of Cache Side-Channel Attacks (Hong et al., 2020)
- Reverse-Engineering Deep ReLU Networks (Rolnick and Kording, 2020)
- Model Extraction Oriented Data Publishing with k-anonymity (Fukuoka et al., 2020)
- Hermes Attack: Steal DNN Models with Lossless Inference Accuracy (Zhu et al., 2020)
- Model extraction from counterfactual explanations (Aïvodji et al., 2020) (code)
- MetaSimulator: Simulating Unknown Target Models for Query-Efficient Black-box Attacks (Chen and Yong, 2020) (code)
- Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks (Orekondy et al., 2019) (code)
- IReEn: Iterative Reverse-Engineering of Black-Box Functions via Neural Program Synthesis (Hajipour et al., 2020)
- ES Attack: Model Stealing against Deep Neural Networks without Data Hurdles (Yuan et al., 2020)
- Black-Box Ripper: Copying black-box models using generative evolutionary algorithms (Barbalau et al., 2020) (code)
- Model Extraction Attacks on Graph Neural Networks: Taxonomy and Realization (Wu et al., 2020)
- Model Extraction Attacks and Defenses on Cloud-Based Machine Learning Models (Gong et al., 2020)
- Leveraging Extracted Model Adversaries for Improved Black Box Attacks (Nizar and Kobren, 2020)
- Differentially Private Machine Learning Model against Model Extraction Attack (Cheng et al., 2020)
- Stealing Neural Network Models through the Scan Chain: A New Threat for ML Hardware (Potluri and Aysu, 2021)
- Model Extraction and Defenses on Generative Adversarial Networks (Hu and Pang, 2021)
- Protecting Decision Boundary of Machine Learning Model With Differentially Private Perturbation (Zheng et al., 2021)
- Special-Purpose Model Extraction Attacks: Stealing Coarse Model with Fewer Queries (Okada et al., 2021)
- Model Extraction and Adversarial Transferability, Your BERT is Vulnerable! (He et al., 2021) (code)
- Thief, Beware of What Get You There: Towards Understanding Model Extraction Attack (Zhang et al., 2021)
- Model Weight Theft With Just Noise Inputs: The Curious Case of the Petulant Attacker (Roberts et al., 2019)
- Protecting DNNs from Theft using an Ensemble of Diverse Models (Kariyappa et al., 2021)
- Information Laundering for Model Privacy (Wang et al., 2021)
- Deep Neural Network Fingerprinting by Conferrable Adversarial Examples (Lukas et al., 2021)
- BODAME: Bilevel Optimization for Defense Against Model Extraction (Mori et al., 2021)
- Dataset Inference: Ownership Resolution in Machine Learning (Maini et al., 2021)
- Good Artists Copy, Great Artists Steal: Model Extraction Attacks Against Image Translation Generative Adversarial Networks (Szyller et al., 2021)
- Towards Characterizing Model Extraction Queries and How to Detect Them (Zhang et al., 2021)
- Hardness of Samples Is All You Need: Protecting Deep Learning Models Using Hardness of Samples (Sadeghzadeh et al., 2021)
- Stateful Detection of Model Extraction Attacks (Pal et al., 2021)
- MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI (Miura et al., 2021)
- INVERSENET: Augmenting Model Extraction Attacks with Training Data Inversion (Gong et al., 2021)
- Increasing the Cost of Model Extraction with Calibrated Proof of Work (Dziedzic et al., 2022) (code)
- On the Difficulty of Defending Self-Supervised Learning against Model Extraction (Dziedzic et al., 2022) (code)
- Dataset Inference for Self-Supervised Models (Dziedzic et al., 2022) (code)
- Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders (Sha et al., 2022)
- StolenEncoder: Stealing Pre-trained Encoders (Liu et al., 2022)
- Model Extraction Attacks Revisited (Liang et al., 2023)
Other
- Prompts Should not be Seen as Secrets: Systematically Measuring Prompt Extraction Attack Success (Zhang et al., 2023)
- Amnesiac Machine Learning (Graves et al., 2020)
- Toward Robustness and Privacy in Federated Learning: Experimenting with Local and Central Differential Privacy (Naseri et al., 2020)
- Analyzing Information Leakage of Updates to Natural Language Models (Brockschmidt et al., 2020)
- Estimating g-Leakage via Machine Learning (Romanelli et al., 2020)
- Information Leakage in Embedding Models (Song and Raghunathan, 2020)
- Hide-and-Seek Privacy Challenge (Jordan et al., 2020)
- Synthetic Data -- Anonymisation Groundhog Day (Stadler et al., 2020) (code)
- Robust Membership Encoding: Inference Attacks and Copyright Protection for Deep Learning (Song and Shokri, 2020)
- Quantifying Privacy Leakage in Graph Embedding (Duddu et al., 2020)
- Quantifying and Mitigating Privacy Risks of Contrastive Learning (He and Zhang, 2021)
- Coded Machine Unlearning (Aldaghri et al., 2020)
- Unlearnable Examples: Making Personal Data Unexploitable (Huang et al., 2021)
- Measuring Data Leakage in Machine-Learning Models with Fisher Information (Hannun et al., 2021)
- Teacher Model Fingerprinting Attacks Against Transfer Learning (Chen et al., 2021)
- Bounding Information Leakage in Machine Learning (Del Grosso et al., 2021)
- RoFL: Attestable Robustness for Secure Federated Learning (Burkhalter et al., 2021)
- Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash (Struppek et al., 2021)
- The Privacy Onion Effect: Memorization is Relative (Carlini et al., 2022)
- Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets (Tramer et al., 2022)
- LCANets++: Robust Audio Classification using Multi-layer Neural Networks with Lateral Competition (Dibbo et al., 2023)