Advanced Deep Learning @ KAIST

Course Information

Instructor: Sung Ju Hwang (sjhwang82@kaist.ac.kr)
TAs: Seul Lee (animecult@kaist.ac.kr) and Jaehyeong Jo (harryjo97@kaist.ac.kr)

Office: This is an on/offline hybrid course. Building Number 9, Room 9201 (Instructor), 2nd floor (TAs)
Office hours: By appointment only.

Grading Policy

Absolute Grading
Paper Presentation: 20%
Attendance and Participation: 20%
Project: 60%

Tentative Schedule

Dates	Topic
8/29	Course Introduction
9/1	Review of Deep Learning Basics (Video Lecture)
9/6	Vision Transformers (Lecture)
9/8	Vision Transformers / Self-Supervised Learning (Lecture)
9/13	Self-Supervised Learning (Lecture)
9/15	Self-Supervised Learning (Presentation)
9/20	Bayesian Deep Learning - Bayesian ML Basics, Bayesian Neural Networks (Lecture)
9/22	Bayesian Deep Learning - Bayesian Approximations, Uncertainties in Prediction (Lecture)
9/27	Bayesian Deep Learning - MCMC Sampling for Bayesian Inference, Neural Processes (Lecture)
9/29	Bayesian Deep Learning (Presentation)
10/4	Deep Generative Models - Advanced GANs (Lecture)
10/6	Deep Generative Models - Advanced GANs (Presentation); Initial Proposal Due
10/11	Deep Generative Models - Diffusion Models (Lecture)
10/13	Deep Generative Models - Diffusion Models (Lecture)
10/18	Deep Generative Models - Diffusion Models (Presentation)
10/20	Mid-term Presentation
10/25	Large Language Models (Lecture)
10/27	Multimodal Generative Models (Lecture)
11/1	Large Language Models and Multimodal Generative Models (Presentation)
11/3	Deep Reinforcement Learning - Deep RL Basics (Lecture)
11/8	Deep Reinforcement Learning - Policy-based RL, Model-based RL (Lecture)
11/10	Deep Reinforcement Learning - Offline RL, Exploration, RL via Sequence Modeling (Lecture)
11/15	Deep Reinforcement Learning (Presentation)
11/17	Meta Learning (Lecture)
11/22	Meta Learning (Presentation)
11/24	Continual Learning (Lecture)
11/29	Continual Learning (Presentation)
12/1	Robust Deep Learning (Lecture)
12/6	Robust Deep Learning (Presentation)
12/8	Deep Graph Learning (Lecture)
12/13	Deep Graph Learning (Presentation)
12/15	Final Presentation

Reading List

Vision Transformers

[Dosovitskiy et al. 21] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021.
[Touvron et al. 21] Training Data-efficient Image transformers & Distillation through Attention, ICML 2021.
[Liu et al. 21] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, ICCV 2021.
[Wu et al. 21] CvT: Introducing Convolutions to Vision Transformers, ICCV 2021.
[Dai et al. 21] CoAtNet: Marrying Convolution and Attention for All Data Sizes, NeurIPS 2021.
[Yang et al. 21] Focal Attention for Long-Range Interactions in Vision Transformers, NeurIPS 2021.
[El-Nouby et al. 21] XCiT: Cross-Covariance Image Transformers, NeurIPS 2021.
[Li et al. 22] MViTv2: Improved Multiscale Vision Transformers for Classification and Detection, CVPR 2022.
[Lee et al. 22] MPViT: Multi-Path Vision Transformer for Dense Prediction, CVPR 2022.
[Liu et al. 22] A ConvNet for the 2020s, CVPR 2022.

Self-Supervised Learning

[Dosovitskiy et al. 14] Discriminative Unsupervised Feature Learning with Convolutional Neural Networks, NIPS 2014.
[Pathak et al. 16] Context Encoders: Feature Learning by Inpainting, CVPR 2016.
[Noroozi and Favaro 16] Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles, ECCV 2016.
[Gidaris et al. 18] Unsupervised Representation Learning by Predicting Image Rotations, ICLR 2018.
[He et al. 20] Momentum Contrast for Unsupervised Visual Representation Learning, CVPR 2020.
[Chen et al. 20] A Simple Framework for Contrastive Learning of Visual Representations, ICML 2020.
[Mikolov et al. 13] Efficient Estimation of Word Representations in Vector Space, ICLR 2013.
[Devlin et al. 19] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019.
[Clark et al. 20] ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, ICLR 2020.
[Hu et al. 20] Strategies for Pre-training Graph Neural Networks, ICLR 2020.
[Chen et al. 20] Generative Pretraining from Pixels, ICML 2020.
[Laskin et al. 20] CURL: Contrastive Unsupervised Representations for Reinforcement Learning, ICML 2020.
[Grill et al. 20] Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, NeurIPS 2020.
[Chen et al. 20] Big Self-Supervised Models are Strong Semi-Supervised Learners, NeurIPS 2020.
[Chen and He. 21] Exploring Simple Siamese Representation Learning, CVPR 2021.
[Tian et al. 21] Understanding Self-Supervised Learning Dynamics without Contrastive Pairs, ICML 2021.
[Caron et al. 21] Emerging Properties in Self-Supervised Vision Transformers, ICCV 2021.

[Liu et al. 22] Self-supervised Learning is More Robust to Dataset Imbalance, ICLR 2022.
[Bao et al. 22] BEiT: BERT Pre-Training of Image Transformers, ICLR 2022.
[He et al. 22] Masked Autoencoders are Scalable Vision Learners, CVPR 2022.
[Liu et al. 22] Improving Contrastive Learning with Model Augmentation, arXiv preprint, 2022.
[Touvron et al. 22] DeiT III: Revenge of the ViT, arXiv preprint, 2022.

Bayesian Deep Learning

[Kingma and Welling 14] Auto-Encoding Variational Bayes, ICLR 2014.
[Kingma et al. 15] Variational Dropout and the Local Reparameterization Trick, NIPS 2015.
[Blundell et al. 15] Weight Uncertainty in Neural Networks, ICML 2015.
[Gal and Ghahramani 16] Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016.
[Liu et al. 16] Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, NIPS 2016.
[Mandt et al. 17] Stochastic Gradient Descent as Approximate Bayesian Inference, JMLR 2017.
[Kendall and Gal 17] What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, NIPS 2017.
[Gal et al. 17] Concrete Dropout, NIPS 2017.
[Gal et al. 17] Deep Bayesian Active Learning with Image Data, ICML 2017.
[Teye et al. 18] Bayesian Uncertainty Estimation for Batch Normalized Deep Networks, ICML 2018.
[Garnelo et al. 18] Conditional Neural Processes, ICML 2018.
[Kim et al. 19] Attentive Neural Processes, ICLR 2019.
[Sun et al. 19] Functional Variational Bayesian Neural Networks, ICLR 2019.
[Louizos et al. 19] The Functional Neural Process, NeurIPS 2019.
[Zhang et al. 20] Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning, ICLR 2020.
[van Amersfoort et al. 20] Uncertainty Estimation Using a Single Deep Deterministic Neural Network, ICML 2020.
[Dusenberry et al. 20] Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors, ICML 2020.
[Wenzel et al. 20] How Good is the Bayes Posterior in Deep Neural Networks Really?, ICML 2020.
[Lee et al. 20] Bootstrapping Neural Processes, NeurIPS 2020.
[Wilson et al. 20] Bayesian Deep Learning and a Probabilistic Perspective of Generalization, NeurIPS 2020.
[Izmailov et al. 21] What Are Bayesian Neural Network Posteriors Really Like?, ICML 2021.
[Daxberger et al. 21] Bayesian Deep Learning via Subnetwork Inference, ICML 2021.

[Fortuin et al. 22] Bayesian Neural Network Priors Revisited, ICLR 2022.
[Muller et al. 22] Transformers Can Do Bayesian Inference, ICLR 2022.
[Nguyen and Grover 22] Transformer Neural Processes, ICML 2022.
[Nazaret and Blei 22] Variational Inference for Infinitely Deep Neural Networks, ICML 2022.
[Lotfi et al. 22] Bayesian Model Selection, the Marginal Likelihood, and Generalization, ICML 2022.
[Alexos et al. 22] Structured Stochastic Gradient MCMC, ICML 2022.

Deep Generative Models

VAEs, Autoregressive and Flow-Based Generative Models

[Rezende and Mohamed 15] Variational Inference with Normalizing Flows, ICML 2015.
[Germain et al. 15] MADE: Masked Autoencoder for Distribution Estimation, ICML 2015.
[Kingma et al. 16] Improved Variational Inference with Inverse Autoregressive Flow, NIPS 2016.
[Oord et al. 16] Pixel Recurrent Neural Networks, ICML 2016.
[Dinh et al. 17] Density Estimation Using Real NVP, ICLR 2017.
[Papamakarios et al. 17] Masked Autoregressive Flow for Density Estimation, NIPS 2017.
[Huang et al. 18] Neural Autoregressive Flows, ICML 2018.
[Kingma and Dhariwal 18] Glow: Generative Flow with Invertible 1x1 Convolutions, NeurIPS 2018.
[Ho et al. 19] Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design, ICML 2019.
[Chen et al. 19] Residual Flows for Invertible Generative Modeling, NeurIPS 2019.
[Tran et al. 19] Discrete Flows: Invertible Generative Models of Discrete Data, NeurIPS 2019.
[Ping et al. 20] WaveFlow: A Compact Flow-based Model for Raw Audio, ICML 2020.
[Vahdat and Kautz 20] NVAE: A Deep Hierarchical Variational Autoencoder, NeurIPS 2020.
[Ho et al. 20] Denoising Diffusion Probabilistic Models, NeurIPS 2020.
[Song et al. 21] Score-Based Generative Modeling through Stochastic Differential Equations, ICLR 2021.
[Kosiorek et al. 21] NeRF-VAE: A Geometry Aware 3D Scene Generative Model, ICML 2021.

Generative Adversarial Networks

[Goodfellow et al. 14] Generative Adversarial Nets, NIPS 2014.
[Radford et al. 16] Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016.
[Chen et al. 16] InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, NIPS 2016.
[Arjovsky et al. 17] Wasserstein Generative Adversarial Networks, ICML 2017.
[Zhu et al. 17] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, ICCV 2017.
[Zhang et al. 17] Adversarial Feature Matching for Text Generation, ICML 2017.
[Karras et al. 18] Progressive Growing of GANs for Improved Quality, Stability, and Variation, ICLR 2018.
[Choi et al. 18] StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation, CVPR 2018.
[Brock et al. 19] Large Scale GAN Training for High-Fidelity Natural Image Synthesis, ICLR 2019.
[Karras et al. 19] A Style-Based Generator Architecture for Generative Adversarial Networks, CVPR 2019.
[Karras et al. 20] Analyzing and Improving the Image Quality of StyleGAN, CVPR 2020.
[Sinha et al. 20] Small-GAN: Speeding up GAN Training using Core-Sets, ICML 2020.
[Karras et al. 20] Training Generative Adversarial Networks with Limited Data, NeurIPS 2020.
[Liu et al. 21] Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis, ICLR 2021.
[Esser et al. 21] Taming Transformers for High-Resolution Image Synthesis, CVPR 2021.
[Hudson and Zitnick 21] Generative Adversarial Transformers, ICML 2021.
[Karras et al. 21] Alias-Free Generative Adversarial Networks, NeurIPS 2021.

[Skorokhodov et al. 22] StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2, CVPR 2022.
[Lin et al. 22] InfinityGAN: Towards Infinite-Pixel Image Synthesis, ICLR 2022.
[Lee et al. 22] ViTGAN: Training GANs with Vision Transformers, ICLR 2022.
[Yu et al. 22] Vector-Quantized Image Modeling with Improved VQGAN, ICLR 2022.
[Franceschi et al. 22] A Neural Tangent Kernel Perspective of GANs, ICML 2022.

Diffusion Models

[Song and Ermon 19] Generative Modeling by Estimating Gradients of the Data Distribution, NeurIPS 2019.
[Song and Ermon 20] Improved Techniques for Training Score-Based Generative Models, NeurIPS 2020.
[Ho et al. 20] Denoising Diffusion Probabilistic Models, NeurIPS 2020.
[Song et al. 21] Score-Based Generative Modeling through Stochastic Differential Equations, ICLR 2021.
[Nichol and Dhariwal 21] Improved Denoising Diffusion Probabilistic Models, ICML 2021.
[Vahdat et al. 21] Score-based Generative Modeling in Latent Space, NeurIPS 2021.
[Dhariwal and Nichol 21] Diffusion Models Beat GANs on Image Synthesis, NeurIPS 2021.
[De Bortoli et al. 21] Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling, NeurIPS 2021.
[Ho and Salimans 22] Classifier-Free Diffusion Guidance, arXiv preprint, 2022.

[Dockhorn et al. 22] Score-Based Generative Modeling with Critically-Damped Langevin Diffusion, ICLR 2022.
[Salimans and Ho 22] Progressive Distillation for Fast Sampling of Diffusion Models, ICLR 2022.
[Chen et al. 22] Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory, ICLR 2022.

Deep Reinforcement Learning

[Mnih et al. 13] Playing Atari with Deep Reinforcement Learning, NIPS Deep Learning Workshop 2013.
[Silver et al. 14] Deterministic Policy Gradient Algorithms, ICML 2014.
[Schulman et al. 15] Trust Region Policy Optimization, ICML 2015.
[Lillicrap et al. 16] Continuous Control with Deep Reinforcement Learning, ICLR 2016.
[Schaul et al. 16] Prioritized Experience Replay, ICLR 2016.
[Wang et al. 16] Dueling Network Architectures for Deep Reinforcement Learning, ICML 2016.
[Mnih et al. 16] Asynchronous Methods for Deep Reinforcement Learning, ICML 2016.
[Schulman et al. 17] Proximal Policy Optimization Algorithms, arXiv preprint, 2017.
[Nachum et al. 18] Data-Efficient Hierarchical Reinforcement Learning, NeurIPS 2018.
[Ha et al. 18] Recurrent World Models Facilitate Policy Evolution, NeurIPS 2018.
[Burda et al. 19] Large-Scale Study of Curiosity-Driven Learning, ICLR 2019.
[Vinyals et al. 19] Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature, 2019.
[Bellemare et al. 19] A Geometric Perspective on Optimal Representations for Reinforcement Learning, NeurIPS 2019.
[Janner et al. 19] When to Trust Your Model: Model-Based Policy Optimization, NeurIPS 2019.
[Fellows et al. 19] VIREL: A Variational Inference Framework for Reinforcement Learning, NeurIPS 2019.
[Kumar et al. 19] Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, NeurIPS 2019.
[Kaiser et al. 20] Model Based Reinforcement Learning for Atari, ICLR 2020.
[Agarwal et al. 20] An Optimistic Perspective on Offline Reinforcement Learning, ICML 2020.
[Lee et al. 20] Batch Reinforcement Learning with Hyperparameter Gradients, ICML 2020.
[Kumar et al. 20] Conservative Q-Learning for Offline Reinforcement Learning, NeurIPS 2020.
[Yarats et al. 21] Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels, ICLR 2021.
[Chen et al. 21] Decision Transformer: Reinforcement Learning via Sequence Modeling, NeurIPS 2021.

[Mai et al. 22] Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation, ICLR 2022.
[Furuta et al. 22] Generalized Decision Transformer for Offline Hindsight Information Matching, ICLR 2022.
[Oh et al. 22] Model-augmented Prioritized Experience Replay, ICLR 2022.
[Rengarajan et al. 22] Reinforcement Learning with Sparse Rewards Using Guidance from Offline Demonstration, ICLR 2022.
[Patil et al. 22] Align-RUDDER: Learning from Few Demonstrations by Reward Redistribution, ICML 2022.
[Goyal et al. 22] Retrieval Augmented Reinforcement Learning, ICML 2022.
[Reed et al. 22] A Generalist Agent, arXiv preprint, 2022.

Memory and Computation-Efficient Deep Learning

[Han et al. 15] Learning both Weights and Connections for Efficient Neural Networks, NIPS 2015.
[Wen et al. 16] Learning Structured Sparsity in Deep Neural Networks, NIPS 2016.
[Han et al. 16] Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016.
[Molchanov et al. 17] Variational Dropout Sparsifies Deep Neural Networks, ICML 2017.
[Louizos et al. 17] Bayesian Compression for Deep Learning, NIPS 2017.
[Louizos et al. 18] Learning Sparse Neural Networks Through L0 Regularization, ICLR 2018.
[Howard et al. 17] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, arXiv preprint, 2017.
[Frankle and Carbin 19] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, ICLR 2019.
[Lee et al. 19] SNIP: Single-Shot Network Pruning Based On Connection Sensitivity, ICLR 2019.
[Liu et al. 19] Rethinking the Value of Network Pruning, ICLR 2019.
[Jung et al. 19] Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss, CVPR 2019.
[Morcos et al. 19] One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers, NeurIPS 2019.
[Renda et al. 20] Comparing Rewinding and Fine-tuning in Neural Network Pruning, ICLR 2020.
[Frankle et al. 20] Linear Mode Connectivity and the Lottery Ticket Hypothesis, ICML 2020.
[Tanaka et al. 20] Pruning Neural Networks without Any Data by Iteratively Conserving Synaptic Flow, NeurIPS 2020.
[van Baalen et al. 20] Bayesian Bits: Unifying Quantization and Pruning, NeurIPS 2020.
[de Jorge et al. 21] Progressive Skeletonization: Trimming more fat from a network at initialization, ICLR 2021.
[Stock et al. 21] Training with Quantization Noise for Extreme Model Compression, ICLR 2021.
[Lee et al. 21] Semi-Relaxed Quantization with DropBits: Training Low-Bit Neural Networks via Bit-wise Regularization, ICCV 2021.

Meta Learning

[Santoro et al. 16] Meta-Learning with Memory-Augmented Neural Networks, ICML 2016.
[Vinyals et al. 16] Matching Networks for One Shot Learning, NIPS 2016.
[Edwards and Storkey 17] Towards a Neural Statistician, ICLR 2017.
[Finn et al. 17] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, ICML 2017.
[Snell et al. 17] Prototypical Networks for Few-shot Learning, NIPS 2017.
[Nichol et al. 18] On First-Order Meta-learning Algorithms, arXiv preprint, 2018.
[Lee and Choi 18] Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace, ICML 2018.
[Liu et al. 19] Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning, ICLR 2019.
[Gordon et al. 19] Meta-Learning Probabilistic Inference for Prediction, ICLR 2019.
[Ravi and Beatson 19] Amortized Bayesian Meta-Learning, ICLR 2019.
[Rakelly et al. 19] Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables, ICML 2019.
[Shu et al. 19] Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting, NeurIPS 2019.
[Finn et al. 19] Online Meta-Learning, ICML 2019.
[Lee et al. 20] Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks, ICLR 2020.
[Yin et al. 20] Meta-Learning without Memorization, ICLR 2020.
[Raghu et al. 20] Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML, ICLR 2020.
[Iakovleva et al. 20] Meta-Learning with Shared Amortized Variational Inference, ICML 2020.
[Bronskill et al. 20] TaskNorm: Rethinking Batch Normalization for Meta-Learning, ICML 2020.
[Rajendran et al. 20] Meta-Learning Requires Meta-Augmentation, NeurIPS 2020.
[Lee et al. 21] Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning, ICLR 2021.
[Shin et al. 21] Large-Scale Meta-Learning with Continual Trajectory Shifting, ICML 2021.
[Acar et al. 21] Memory Efficient Online Meta Learning, ICML 2021.

[Lee et al. 22] Online Hyperparameter Meta-Learning with Hypergradient Distillation, ICLR 2022.
[Flennerhag et al. 22] Bootstrapped Meta-Learning, ICLR 2022.
[Yao et al. 22] Meta-Learning with Fewer Tasks through Task Interpolation, ICLR 2022.
[Guan and Lu 22] Task Relatedness-Based Generalization Bounds for Meta Learning, ICLR 2022.

Continual Learning

[Rusu et al. 16] Progressive Neural Networks, arXiv preprint, 2016.
[Kirkpatrick et al. 17] Overcoming catastrophic forgetting in neural networks, PNAS 2017.
[Lee et al. 17] Overcoming Catastrophic Forgetting by Incremental Moment Matching, NIPS 2017.
[Shin et al. 17] Continual Learning with Deep Generative Replay, NIPS 2017.
[Lopez-Paz and Ranzato 17] Gradient Episodic Memory for Continual Learning, NIPS 2017.
[Yoon et al. 18] Lifelong Learning with Dynamically Expandable Networks, ICLR 2018.
[Nguyen et al. 18] Variational Continual Learning, ICLR 2018.
[Schwarz et al. 18] Progress & Compress: A Scalable Framework for Continual Learning, ICML 2018.
[Chaudhry et al. 19] Efficient Lifelong Learning with A-GEM, ICLR 2019.
[Rao et al. 19] Continual Unsupervised Representation Learning, NeurIPS 2019.
[Rolnick et al. 19] Experience Replay for Continual Learning, NeurIPS 2019.
[Jerfel et al. 19] Reconciling Meta-Learning and Continual Learning with Online Mixtures of Tasks, NeurIPS 2019.
[Yoon et al. 20] Scalable and Order-robust Continual Learning with Additive Parameter Decomposition, ICLR 2020.
[Ramasesh et al. 20] Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics, Continual Learning Workshop, ICML 2020.
[Borsos et al. 20] Coresets via Bilevel Optimization for Continual Learning and Streaming, NeurIPS 2020.
[Mirzadeh et al. 20] Understanding the Role of Training Regimes in Continual Learning, NeurIPS 2020.
[Saha et al. 21] Gradient Projection Memory for Continual Learning, ICLR 2021.
[Veniat et al. 21] Efficient Continual Learning with Modular Networks and Task-Driven Priors, ICLR 2021.

[Madaan et al. 22] Representational Continuity for Unsupervised Continual Learning, ICLR 2022.
[Yoon et al. 22] Online Coreset Selection for Rehearsal-based Continual Learning, ICLR 2022.
[Lin et al. 22] TRGP: Trust Region Gradient Projection for Continual Learning, ICLR 2022.
[Wang et al. 22] Improving Task-free Continual Learning by Distributionally Robust Memory Evolution, ICML 2022.
[Kang et al. 22] Forget-free Continual Learning with Winning Subnetworks, ICML 2022.

Interpretable Deep Learning

[Ribeiro et al. 16] "Why Should I Trust You?" Explaining the Predictions of Any Classifier, KDD 2016.
[Kim et al. 16] Examples are not Enough, Learn to Criticize! Criticism for Interpretability, NIPS 2016.
[Choi et al. 16] RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism, NIPS 2016.
[Koh et al. 17] Understanding Black-box Predictions via Influence Functions, ICML 2017.
[Bau et al. 17] Network Dissection: Quantifying Interpretability of Deep Visual Representations, CVPR 2017.
[Selvaraju et al. 17] Grad-CAM: Visual Explanation from Deep Networks via Gradient-based Localization, ICCV 2017.
[Kim et al. 18] Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), ICML 2018.
[Heo et al. 18] Uncertainty-Aware Attention for Reliable Interpretation and Prediction, NeurIPS 2018.
[Bau et al. 19] GAN Dissection: Visualizing and Understanding Generative Adversarial Networks, ICLR 2019.
[Ghorbani et al. 19] Towards Automatic Concept-based Explanations, NeurIPS 2019.
[Coenen et al. 19] Visualizing and Measuring the Geometry of BERT, NeurIPS 2019.
[Heo et al. 20] Cost-Effective Interactive Attention Learning with Neural Attention Processes, ICML 2020.
[Agarwal et al. 20] Neural Additive Models: Interpretable Machine Learning with Neural Nets, arXiv preprint, 2020.

Reliable Deep Learning

[Guo et al. 17] On Calibration of Modern Neural Networks, ICML 2017.
[Lakshminarayanan et al. 17] Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, NIPS 2017.
[Liang et al. 18] Enhancing the Reliability of Out-of-distribution Image Detection in Neural Networks, ICLR 2018.
[Lee et al. 18] Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples, ICLR 2018.
[Kuleshov et al. 18] Accurate Uncertainties for Deep Learning Using Calibrated Regression, ICML 2018.
[Jiang et al. 18] To Trust Or Not To Trust A Classifier, NeurIPS 2018.
[Madras et al. 18] Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer, NeurIPS 2018.
[Maddox et al. 19] A Simple Baseline for Bayesian Uncertainty in Deep Learning, NeurIPS 2019.
[Kull et al. 19] Beyond temperature scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration, NeurIPS 2019.
[Thulasidasan et al. 19] On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks, NeurIPS 2019.
[Ovadia et al. 19] Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift, NeurIPS 2019.
[Hendrycks et al. 20] AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty, ICLR 2020.
[Filos et al. 20] Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts?, ICML 2020.

Robust Deep Learning

[Szegedy et al. 14] Intriguing Properties of Neural Networks, ICLR 2014.
[Goodfellow et al. 15] Explaining and Harnessing Adversarial Examples, ICLR 2015.
[Kurakin et al. 17] Adversarial Machine Learning at Scale, ICLR 2017.
[Madry et al. 18] Towards Deep Learning Models Resistant to Adversarial Attacks, ICLR 2018.
[Eykholt et al. 18] Robust Physical-World Attacks on Deep Learning Visual Classification, CVPR 2018.
[Athalye et al. 18] Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, ICML 2018.
[Zhang et al. 19] Theoretically Principled Trade-off between Robustness and Accuracy, ICML 2019.
[Carmon et al. 19] Unlabeled Data Improves Adversarial Robustness, NeurIPS 2019.
[Ilyas et al. 19] Adversarial Examples are not Bugs, They Are Features, NeurIPS 2019.
[Li et al. 19] Certified Adversarial Robustness with Additive Noise, NeurIPS 2019.
[Tramèr and Boneh 19] Adversarial Training and Robustness for Multiple Perturbations, NeurIPS 2019.
[Shafahi et al. 19] Adversarial Training for Free!, NeurIPS 2019.
[Wong et al. 20] Fast is Better Than Free: Revisiting Adversarial Training, ICLR 2020.
[Madaan et al. 20] Adversarial Neural Pruning with Latent Vulnerability Suppression, ICML 2020.
[Croce and Hein 20] Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks, ICML 2020.
[Maini et al. 20] Adversarial Robustness Against the Union of Multiple Perturbation Models, ICML 2020.
[Kim et al. 20] Adversarial Self-Supervised Contrastive Learning, NeurIPS 2020.
[Wu et al. 20] Adversarial Weight Perturbation Helps Robust Generalization, NeurIPS 2020.
[Laidlaw et al. 21] Perceptual Adversarial Robustness: Defense Against Unseen Threat Models, ICLR 2021.
[Pang et al. 21] Bag of Tricks for Adversarial Training, ICLR 2021.
[Madaan et al. 21] Learning to Generate Noise for Multi-Attack Robustness, ICML 2021.

[Mladenovic et al. 22] Online Adversarial Attacks, ICLR 2022.
[Zhang et al. 22] How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective, ICLR 2022.
[Carlini and Terzis 22] Poisoning and Backdooring Contrastive Learning, ICLR 2022.
[Croce et al. 22] Evaluating the Adversarial Robustness of Adaptive Test-time Defenses, ICML 2022.
[Zhou et al. 22] Understanding the Robustness in Vision Transformers, ICML 2022.

Graph Neural Networks

[Li et al. 16] Gated Graph Sequence Neural Networks, ICLR 2016.
[Hamilton et al. 17] Inductive Representation Learning on Large Graphs, NIPS 2017.
[Kipf and Welling 17] Semi-Supervised Classification with Graph Convolutional Networks, ICLR 2017.
[Velickovic et al. 18] Graph Attention Networks, ICLR 2018.
[Ying et al. 18] Hierarchical Graph Representation Learning with Differentiable Pooling, NeurIPS 2018.
[Xu et al. 19] How Powerful are Graph Neural Networks?, ICLR 2019.
[Maron et al. 19] Provably Powerful Graph Networks, NeurIPS 2019.
[Yun et al. 19] Graph Transformer Networks, NeurIPS 2019.
[Loukas 20] What Graph Neural Networks Cannot Learn: Depth vs Width, ICLR 2020.
[Bianchi et al. 20] Spectral Clustering with Graph Neural Networks for Graph Pooling, ICML 2020.
[Xhonneux et al. 20] Continuous Graph Neural Networks, ICML 2020.
[Garg et al. 20] Generalization and Representational Limits of Graph Neural Networks, ICML 2020.
[Baek et al. 21] Accurate Learning of Graph Representations with Graph Multiset Pooling, ICLR 2021.
[Liu et al. 21] Elastic Graph Neural Networks, ICML 2021.
[Li et al. 21] Training Graph Neural Networks with 1000 Layers, ICML 2021.
[Jo et al. 21] Edge Representation Learning with Hypergraphs, NeurIPS 2021.

[Guo et al. 22] Data-Efficient Graph Grammar Learning for Molecular Generation, ICLR 2022.
[Geerts et al. 22] Expressiveness and Approximation Properties of Graph Neural Networks, ICLR 2022.
[Bevilacqua et al. 22] Equivariant Subgraph Aggregation Networks, ICLR 2022.
[Jo et al. 22] Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations, ICML 2022.
[Hoogeboom et al. 22] Equivariant Diffusion for Molecule Generation in 3D, ICML 2022.

Federated Learning

[Konečný et al. 16] Federated Optimization: Distributed Machine Learning for On-Device Intelligence, arXiv preprint, 2016.
[Konečný et al. 16] Federated Learning: Strategies for Improving Communication Efficiency, NIPS Workshop on Private Multi-Party Machine Learning 2016.
[McMahan et al. 17] Communication-Efficient Learning of Deep Networks from Decentralized Data, AISTATS 2017.
[Smith et al. 17] Federated Multi-Task Learning, NIPS 2017.
[Li et al. 20] Federated Optimization in Heterogeneous Networks, MLSys 2020.
[Yurochkin et al. 19] Bayesian Nonparametric Federated Learning of Neural Networks, ICML 2019.
[Bonawitz et al. 19] Towards Federated Learning at Scale: System Design, MLSys 2019.
[Wang et al. 20] Federated Learning with Matched Averaging, ICLR 2020.
[Li et al. 20] On the Convergence of FedAvg on Non-IID data, ICLR 2020.
[Karimireddy et al. 20] SCAFFOLD: Stochastic Controlled Averaging for Federated Learning, ICML 2020.
[Hamer et al. 20] FedBoost: Communication-Efficient Algorithms for Federated Learning, ICML 2020.
[Rothchild et al. 20] FetchSGD: Communication-Efficient Federated Learning with Sketching, ICML 2020.
[Fallah et al. 20] Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach, NeurIPS 2020.
[Reddi et al. 21] Adaptive Federated Optimization, ICLR 2021.
[Jeong et al. 21] Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning, ICLR 2021.
[Yoon et al. 21] Federated Continual Learning with Weighted Inter-client Transfer, ICML 2021.
[Li et al. 21] Ditto: Fair and Robust Federated Learning Through Personalization, ICML 2021.

Neural Architecture Search

[Zoph and Le 17] Neural Architecture Search with Reinforcement Learning, ICLR 2017.
[Baker et al. 17] Designing Neural Network Architectures using Reinforcement Learning, ICLR 2017.
[Real et al. 17] Large-Scale Evolution of Image Classifiers, ICML 2017.
[Liu et al. 18] Hierarchical Representations for Efficient Architecture Search, ICLR 2018.
[Pham et al. 18] Efficient Neural Architecture Search via Parameters Sharing, ICML 2018.
[Luo et al. 18] Neural Architecture Optimization, NeurIPS 2018.
[Liu et al. 19] DARTS: Differentiable Architecture Search, ICLR 2019.
[Tan et al. 19] MnasNet: Platform-Aware Neural Architecture Search for Mobile, CVPR 2019.
[Cai et al. 19] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, ICLR 2019.
[Zhou et al. 19] BayesNAS: A Bayesian Approach for Neural Architecture Search, ICML 2019.
[Tan and Le 19] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML 2019.
[Guo et al. 19] NAT: Neural Architecture Transformer for Accurate and Compact Architectures, NeurIPS 2019.
[Chen et al. 19] DetNAS: Backbone Search for Object Detection, NeurIPS 2019.
[Dong and Yang 20] NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search, ICLR 2020.
[Zela et al. 20] Understanding and Robustifying Differentiable Architecture Search, ICLR 2020.
[Cai et al. 20] Once-for-All: Train One Network and Specialize it for Efficient Deployment, ICLR 2020.
[Such et al. 20] Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data, ICML 2020.
[Liu et al. 20] Are Labels Necessary for Neural Architecture Search?, ECCV 2020.
[Dudziak et al. 20] BRP-NAS: Prediction-based NAS using GCNs, NeurIPS 2020.
[Li et al. 20] Neural Architecture Search in A Proxy Validation Loss Landscape, ICML 2020.
[Lee et al. 21] Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets, ICLR 2021.
[Mellor et al. 21] Neural Architecture Search without Training, ICML 2021.

Large Language Models

[Shoeybi et al. 19] Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, arXiv preprint, 2019.
[Raffel et al. 20] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, JMLR 2020.
[Gururangan et al. 20] Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks, ACL 2020.
[Brown et al. 20] Language Models are Few-shot Learners, NeurIPS 2020.
[Rae et al. 21] Scaling Language Models: Methods, Analysis & Insights from Training Gopher, arXiv preprint, 2021.

[Thoppilan et al. 22] LaMDA: Language Models for Dialog Applications, arXiv preprint, 2022.
[Wei et al. 22] Finetuned Language Models Are Zero-Shot Learners, ICLR 2022.
[Wang et al. 22] Language Modeling via Stochastic Processes, ICLR 2022.
[Alayrac et al. 22] Flamingo: a Visual Language Model for Few-Shot Learning, arXiv preprint, 2022.
[Chowdhery et al. 22] PaLM: Scaling Language Modeling with Pathways, arXiv preprint, 2022.
[Wei et al. 22] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022.

Multimodal Generative Models

[Li et al. 19] Controllable Text-to-Image Generation, NeurIPS 2019.
[Ramesh et al. 21] Zero-Shot Text-to-Image Generation, ICML 2021.
[Radford et al. 21] Learning Transferable Visual Models From Natural Language Supervision, ICML 2021.
[Ding et al. 21] CogView: Mastering Text-to-Image Generation via Transformers, NeurIPS 2021.
[Zou et al. 22] Towards Language-Free Training for Text-to-Image Generation, CVPR 2022.

[Rombach et al. 22] High-Resolution Image Synthesis with Latent Diffusion Models, CVPR 2022.
[Nichol et al. 22] GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, ICML 2022.
[Saharia et al. 22] Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, arXiv preprint, 2022.
[Yu et al. 22] Scaling Autoregressive Models for Content-Rich Text-to-Image Generation, arXiv preprint, 2022.