# Awesome Adversarial Examples for Deep Learning

A curated list of resources for adversarial examples in deep learning.
## Adversarial Examples for Machine Learning
- The security of machine learning Barreno, Marco, et al. Machine Learning 81.2 (2010): 121-148.
- Adversarial classification Dalvi, Nilesh, et al. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2004.
- Adversarial learning Lowd, Daniel, and Christopher Meek. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, 2005.
- Multiple classifier systems for robust classifier design in adversarial environments Biggio, Battista, Giorgio Fumera, and Fabio Roli. International Journal of Machine Learning and Cybernetics 1.1-4 (2010): 27-41.
- Evasion Attacks against Machine Learning at Test Time Biggio, Battista, et al. Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, 2013.
- Can machine learning be secure? Barreno, Marco, et al. Proceedings of the 2006 ACM Symposium on Information, computer and communications security. ACM, 2006.
- Towards the science of security and privacy in machine learning Papernot, Nicolas, et al. arXiv preprint arXiv:1611.03814 (2016).
- Pattern recognition systems under attack Roli, Fabio, Battista Biggio, and Giorgio Fumera. Iberoamerican Congress on Pattern Recognition. Springer, Berlin, Heidelberg, 2013.
## Approaches for Generating Adversarial Examples in Deep Learning
- Intriguing properties of neural networks Szegedy, Christian, et al. arXiv preprint arXiv:1312.6199 (2013).
- Explaining and harnessing adversarial examples Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. arXiv preprint arXiv:1412.6572 (2014).
- Deep neural networks are easily fooled: High confidence predictions for unrecognizable images Nguyen, Anh, Jason Yosinski, and Jeff Clune. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Adversarial examples in the physical world Kurakin, Alexey, Ian Goodfellow, and Samy Bengio. arXiv preprint arXiv:1607.02533 (2016).
- Adversarial diversity and hard positive generation Rozsa, Andras, Ethan M. Rudd, and Terrance E. Boult. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2016.
- The limitations of deep learning in adversarial settings Papernot, Nicolas, et al. Security and Privacy (EuroS&P), 2016 IEEE European Symposium on. IEEE, 2016.
- Adversarial manipulation of deep representations Sabour, Sara, et al. ICLR. 2016.
- DeepFool: a simple and accurate method to fool deep neural networks Moosavi-Dezfooli, Seyed-Mohsen, Alhussein Fawzi, and Pascal Frossard. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
- Universal adversarial perturbations Moosavi-Dezfooli, Seyed-Mohsen, et al. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
- Towards evaluating the robustness of neural networks Carlini, Nicholas, and David Wagner. Security and Privacy (S&P). 2017.
- Machine Learning as an Adversarial Service: Learning Black-Box Adversarial Examples Hayes, Jamie, and George Danezis. arXiv preprint arXiv:1708.05207 (2017).
- Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models Chen, Pin-Yu, et al. 10th ACM Workshop on Artificial Intelligence and Security (AISEC) with the 24th ACM Conference on Computer and Communications Security (CCS). 2017.
- Ground-Truth Adversarial Examples Carlini, Nicholas, et al. arXiv preprint arXiv:1709.10207. 2017.
- Generating Natural Adversarial Examples Zhao, Zhengli, Dheeru Dua, and Sameer Singh. arXiv preprint arXiv:1710.11342. 2017.
- Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples Anish Athalye, Nicholas Carlini, David Wagner. arXiv preprint arXiv:1802.00420. 2018.
- Adversarial Attacks and Defences Competition Alexey Kurakin, Ian Goodfellow, Samy Bengio, et al. arXiv preprint arXiv:1804.00097. 2018.
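Many of the attacks above build on the Fast Gradient Sign Method (FGSM) from *Explaining and harnessing adversarial examples*: step the input in the direction of the sign of the loss gradient. A minimal sketch on a toy logistic-regression classifier (the weights and input here are invented for illustration, not taken from any paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Perturb x by eps in the direction of the sign of the loss gradient."""
    p = sigmoid(w @ x + b)      # predicted probability of class 1
    grad_x = (p - y) * w        # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad_x)

# Toy classifier and input (made up for demonstration).
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.5, 0.2, -0.1])  # clean input, true label y = 1
y = 1.0

x_adv = fgsm(x, y, w, b, eps=0.25)
print(sigmoid(w @ x + b) > 0.5)      # → True  (clean input classified as class 1)
print(sigmoid(w @ x_adv + b) > 0.5)  # → False (small perturbation flips the prediction)
```

The same one-step recipe applies to deep networks; the gradient with respect to the input is then obtained by backpropagation.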
## Defenses for Adversarial Examples

### Network Distillation

- Distillation as a defense to adversarial perturbations against deep neural networks Papernot, Nicolas, et al. Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016.
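Defensive distillation trains a student network on the teacher's temperature-softened output probabilities. A minimal sketch of the temperature-scaled softmax that the defense relies on (logits invented for illustration):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax with temperature T; high T smooths the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()             # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])

hard = softmax(logits, T=1.0)   # near one-hot: ordinary training targets
soft = softmax(logits, T=20.0)  # high temperature: "soft" labels for the student

print(hard.round(3))
print(soft.round(3))
```

The student is trained at the same high temperature and deployed at T = 1, which (per the paper) flattens the gradients an attacker would exploit.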
### Adversarial (Re)Training
- Learning with a strong adversary Huang, Ruitong, et al. arXiv preprint arXiv:1511.03034 (2015).
- Adversarial machine learning at scale Kurakin, Alexey, Ian Goodfellow, and Samy Bengio. ICLR. 2017.
- Ensemble Adversarial Training: Attacks and Defenses Tramèr, Florian, et al. arXiv preprint arXiv:1705.07204 (2017).
- Adversarial training for relation extraction Wu, Yi, David Bamman, and Stuart Russell. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.
- Adversarial Logit Pairing Harini Kannan, Alexey Kurakin, Ian Goodfellow. arXiv preprint arXiv:1803.06373 (2018).
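The common thread of the papers above is augmenting training with adversarial examples crafted on the fly against the current model. A toy sketch in that spirit, assuming a logistic-regression model, FGSM perturbations, and synthetic data (not any paper's exact recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic two-class data: class 1 is shifted by +2 in each feature.
n = 200
y = (rng.random(n) < 0.5).astype(float)
X = rng.normal(size=(n, 2)) + 2.0 * y[:, None]

w, b = np.zeros(2), 0.0
eps, lr = 0.3, 0.1

for step in range(500):
    # Craft FGSM perturbations against the current parameters.
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    X_adv = X + eps * np.sign(grad_x)
    # Take a gradient step on the mixture of clean and adversarial examples.
    Xa = np.vstack([X, X_adv])
    ya = np.concatenate([y, y])
    pa = sigmoid(Xa @ w + b)
    w -= lr * Xa.T @ (pa - ya) / len(ya)
    b -= lr * (pa - ya).mean()

clean_acc = ((sigmoid(X @ w + b) > 0.5) == (y > 0.5)).mean()
print(clean_acc)
```

Ensemble adversarial training (Tramèr et al.) differs mainly in crafting the perturbations against held-out pretrained models rather than the model being trained.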
### Adversarial Detection
- Detecting Adversarial Samples from Artifacts Feinman, Reuben, et al. arXiv preprint arXiv:1703.00410 (2017).
- Adversarial and Clean Data Are Not Twins Gong, Zhitao, Wenlu Wang, and Wei-Shinn Ku. arXiv preprint arXiv:1704.04960 (2017).
- SafetyNet: Detecting and rejecting adversarial examples robustly Lu, Jiajun, Theerasit Issaranon, and David Forsyth. ICCV (2017).
- On the (statistical) detection of adversarial examples Grosse, Kathrin, et al. arXiv preprint arXiv:1702.06280 (2017).
- On detecting adversarial perturbations Metzen, Jan Hendrik, et al. ICLR Poster. 2017.
- Early Methods for Detecting Adversarial Images Hendrycks, Dan, and Kevin Gimpel. ICLR Workshop (2017).
- Dimensionality Reduction as a Defense against Evasion Attacks on Machine Learning Classifiers Bhagoji, Arjun Nitin, Daniel Cullina, and Prateek Mittal. arXiv preprint arXiv:1704.02654 (2017).
- Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight Lin, Yen-Chen, et al. arXiv preprint arXiv:1710.00814 (2017).
- PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples Song, Yang, et al. arXiv preprint arXiv:1710.10766 (2017).
### Input Reconstruction
- PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples Song, Yang, et al. arXiv preprint arXiv:1710.10766 (2017).
- MagNet: a Two-Pronged Defense against Adversarial Examples Meng, Dongyu, and Hao Chen. CCS (2017).
- Towards deep neural network architectures robust to adversarial examples Gu, Shixiang, and Luca Rigazio. arXiv preprint arXiv:1412.5068 (2014).
### Classifier Robustifying

- Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks Bradshaw, John, Alexander G. de G. Matthews, and Zoubin Ghahramani. arXiv preprint arXiv:1707.02476 (2017).
- Robustness to Adversarial Examples through an Ensemble of Specialists Abbasi, Mahdieh, and Christian Gagné. arXiv preprint arXiv:1702.06856 (2017).
### Network Verification
- Reluplex: An efficient SMT solver for verifying deep neural networks Katz, Guy, et al. CAV 2017.
- Safety verification of deep neural networks Huang, Xiaowei, et al. International Conference on Computer Aided Verification. Springer, Cham, 2017.
- Towards proving the adversarial robustness of deep neural networks Katz, Guy, et al. arXiv preprint arXiv:1709.02802 (2017).
- DeepSafe: A data-driven approach for checking adversarial robustness in neural networks Gopinath, Divya, et al. arXiv preprint arXiv:1710.00486 (2017).
- DeepXplore: Automated Whitebox Testing of Deep Learning Systems Pei, Kexin, et al. arXiv preprint arXiv:1705.06640 (2017).
### Others
- Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong He, Warren, et al. 11th USENIX Workshop on Offensive Technologies (WOOT 17). (2017).
- Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods Carlini, Nicholas, and David Wagner. AISec. 2017.
## Applications for Adversarial Examples

### Reinforcement Learning
- Adversarial attacks on neural network policies Huang, Sandy, et al. arXiv preprint arXiv:1702.02284 (2017).
- Delving into adversarial attacks on deep policies Kos, Jernej, and Dawn Song. ICLR Workshop. 2017.
### Generative Modelling

- Adversarial examples for generative models Kos, Jernej, Ian Fischer, and Dawn Song. arXiv preprint arXiv:1702.06832 (2017).
- Adversarial images for variational autoencoders Tabacof, Pedro, Julia Tavares, and Eduardo Valle. Workshop on Adversarial Training, NIPS. 2016.
### Semantic Segmentation
- Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition Sharif, Mahmood, et al. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016.
- Adversarial Examples for Semantic Segmentation and Object Detection Xie, Cihang, et al. arXiv preprint arXiv:1703.08603 (2017).
- Adversarial Examples for Semantic Image Segmentation Fischer, Volker, et al. ICLR workshop. 2017.
- Universal Adversarial Perturbations Against Semantic Image Segmentation Hendrik Metzen, Jan, et al. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
- Semantic Image Synthesis via Adversarial Learning Dong, Hao, et al. ICCV (2017).
### Object Detection
- Adversarial Examples for Semantic Segmentation and Object Detection Xie, Cihang, et al. arXiv preprint arXiv:1703.08603 (2017).
- Physical Adversarial Examples for Object Detectors Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Florian Tramer, Atul Prakash, Tadayoshi Kohno, Dawn Song. arXiv preprint arXiv:1807.07769 (2018).
### Scene Text Recognition
- Adaptive Adversarial Attack on Scene Text Recognition Xiaoyong Yuan, Pan He, Xiaolin Andy Li. arXiv preprint arXiv:1807.03326 (2018).
### Reading Comprehension
- Adversarial examples for evaluating reading comprehension systems Jia, Robin, and Percy Liang. EMNLP. 2017.
- Understanding Neural Networks through Representation Erasure Li, Jiwei, Will Monroe, and Dan Jurafsky. arXiv preprint arXiv:1612.08220 (2016).
### Malware Detection
- Adversarial examples for malware detection Grosse, Kathrin, et al. European Symposium on Research in Computer Security. Springer, Cham, 2017.
- Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN Hu, Weiwei, and Ying Tan. arXiv preprint arXiv:1702.05983 (2017).
- Evading Machine Learning Malware Detection Anderson, Hyrum S., et al. Black Hat. 2017.
- DeepDGA: Adversarially-Tuned Domain Generation and Detection Anderson, Hyrum S., Jonathan Woodbridge, and Bobby Filar. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. ACM, 2016.
- Automatically evading classifiers Xu, Weilin, Yanjun Qi, and David Evans. Proceedings of the 2016 Network and Distributed Systems Symposium. 2016.
### Speech Recognition
- Targeted Adversarial Examples for Black Box Audio Systems Rohan Taori, Amog Kamsetty, Brenton Chu, Nikita Vemuri. arXiv preprint arXiv:1805.07820 (2018).
- CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, Carl A. Gunter. USENIX Security. 2018.
- Audio Adversarial Examples: Targeted Attacks on Speech-to-Text Nicholas Carlini, David Wagner. Deep Learning and Security Workshop, 2018.
## Transferability of Adversarial Examples
- Transferability in machine learning: from phenomena to black-box attacks using adversarial samples Papernot, Nicolas, Patrick McDaniel, and Ian Goodfellow. arXiv preprint arXiv:1605.07277 (2016).
- Delving into transferable adversarial examples and black-box attacks Liu, Yanpei, et al. ICLR 2017.
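Transferability means examples crafted against one model often fool a separately trained model, which is what makes black-box attacks via substitute models possible. A toy sketch with a "substitute" and a "target" logistic regression trained on different resamples of the same synthetic data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, steps=500, lr=0.1):
    """Plain gradient descent on the logistic loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

# Synthetic two-class data.
n = 200
y = (rng.random(n) < 0.5).astype(float)
X = rng.normal(size=(n, 2)) + 2.0 * y[:, None]

# Attacker's substitute vs. victim's target, trained on different samples.
idx = rng.integers(0, n, n)                 # bootstrap resample for the target
w_sub, b_sub = train(X, y)
w_tgt, b_tgt = train(X[idx], y[idx])

# FGSM crafted against the substitute only.
p = sigmoid(X @ w_sub + b_sub)
X_adv = X + 0.5 * np.sign((p - y)[:, None] * w_sub[None, :])

def acc(Xe):
    return ((sigmoid(Xe @ w_tgt + b_tgt) > 0.5) == (y > 0.5)).mean()

print(acc(X), acc(X_adv))  # target accuracy drops even though it was never queried
```

The drop in target accuracy, despite the attack never touching the target's parameters, is the transfer effect the papers above study at scale.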
## Analysis of Adversarial Examples
- Fundamental limits on adversarial robustness Fawzi, Alhussein, Omar Fawzi, and Pascal Frossard. Proc. ICML, Workshop on Deep Learning. 2015.
- Exploring the space of adversarial images Tabacof, Pedro, and Eduardo Valle. Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, 2016.
- A boundary tilting perspective on the phenomenon of adversarial examples Tanay, Thomas, and Lewis Griffin. arXiv preprint arXiv:1608.07690 (2016).
- Measuring neural net robustness with constraints Bastani, Osbert, et al. Advances in Neural Information Processing Systems. 2016.
- Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples Yinpeng Dong, Hang Su, Jun Zhu, Fan Bao. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- Adversarially Robust Generalization Requires More Data Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, Aleksander Mądry. arXiv preprint arXiv:1804.11285. 2018.
- Adversarial vulnerability for any classifier Alhussein Fawzi, Hamza Fawzi, Omar Fawzi. arXiv preprint arXiv:1802.08686. 2018.
- Adversarial Spheres Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, Ian Goodfellow. ICLR. 2018.
## Tools
- cleverhans v2.0.0: an adversarial machine learning library Papernot, Nicolas, et al. arXiv preprint arXiv:1610.00768 (2017).
- Foolbox: A Python toolbox to benchmark the robustness of machine learning models Jonas Rauber, Wieland Brendel, Matthias Bethge. arXiv preprint arXiv:1707.04131 (2017). Documentation Code
- advertorch v0.1: An Adversarial Robustness Toolbox based on PyTorch Gavin Weiguang Ding, Luyu Wang, Xiaomeng Jin. arXiv preprint arXiv:1902.07623 (2019). GitHub repo
## Cite This Work

If you find this list useful for academic research, we would appreciate a citation:
@article{yuan2017adversarial,
title={Adversarial Examples: Attacks and Defenses for Deep Learning},
author={Yuan, Xiaoyong and He, Pan and Zhu, Qile and Li, Xiaolin},
journal={arXiv preprint arXiv:1712.07107},
year={2017}
}
We will keep updating this list with recent studies.