deep-learning-dynamics-paper-list

This is a list of peer-reviewed representative papers on deep learning dynamics (the training/optimization dynamics of neural networks). We hope to enjoy the grand adventure of exploring deep learning dynamics with more researchers. Corrections and suggestions are welcome.

Introduction

The success of deep learning is attributable to both deep network architectures and stochastic optimization. Understanding the optimization dynamics of neural networks, i.e., deep learning dynamics, is a key challenge in the theoretical foundations of deep learning and a promising way to further improve its empirical success. We regard the analysis of learning dynamics as a reductionist approach: many deep learning techniques can be analyzed and interpreted from a dynamical perspective. In the context of neural networks, dynamical analysis provides insights and theories beyond the conventional convergence analysis of stochastic optimization. A large body of related work has been published at top machine learning conferences and journals, yet a literature review of this line of research is largely missing. It is therefore valuable to continuously collect and share these works, which is exactly the main purpose of this paper list. Note that the list covers neither the conventional convergence analysis in optimization nor the forward dynamics of neural networks.

The paper list covers five main directions:

(1) Learning Dynamics of GD and SGD,

(2) Learning Dynamics of Momentum,

(3) Learning Dynamics of Adaptive Gradient Methods,

(4) Learning Dynamics with Training Techniques (e.g., Weight Decay, Normalization Layers, and Gradient Clipping),

(5) Learning Dynamics beyond Standard Training (e.g., Self-Supervised Learning, Continual Learning, and Privacy).

1. Learning Dynamics of GD and SGD

2. Learning Dynamics of Momentum

3. Learning Dynamics of Adaptive Gradient Methods

4. Learning Dynamics with Training Techniques

5. Learning Dynamics beyond Standard Training

Citing

If you find this paper list useful for your research, you are welcome to cite our representative works on this topic!

They cover important related work and touch on fundamental issues in this line of research.

[1] ICLR 2021: SGD dynamics for flat minima selection.

@inproceedings{xie2021diffusion,
  title={A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima},
  author={Zeke Xie and Issei Sato and Masashi Sugiyama},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=wXgk_iCiYGo}
}
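
To make the flat-minima story in [1] concrete, below is a minimal one-dimensional sketch of noisy SGD escaping a quadratic well. It is a toy caricature, not the paper's analysis: the function `escape_steps`, the barrier height `delta_f`, the noise scale `sigma0`, and the curvature values are all illustrative assumptions, as is the simplification that minibatch noise variance scales with the curvature (motivated by the gradient noise covariance roughly tracking the Hessian near a minimum).

```python
import numpy as np

rng = np.random.default_rng(0)

def escape_steps(h, delta_f=0.25, eta=0.05, sigma0=1.0, max_steps=10**6):
    """Steps for noisy SGD to leave a quadratic well f(w) = 0.5 * h * w**2,
    with the escape boundary placed where f reaches the barrier height delta_f."""
    w_boundary = np.sqrt(2.0 * delta_f / h)   # f(w_boundary) = delta_f
    noise_std = sigma0 * np.sqrt(h)           # assumed: noise grows with curvature
    w = 0.0
    for t in range(max_steps):
        grad = h * w + noise_std * rng.standard_normal()
        w -= eta * grad                       # one noisy SGD step
        if abs(w) >= w_boundary:
            return t + 1
    return max_steps                          # truncated: no escape observed

for h in (2.0, 4.0, 8.0):                     # flatter -> sharper minima
    trials = [escape_steps(h) for _ in range(20)]
    print(f"curvature h = {h:3.0f}: mean escape steps ~ {np.mean(trials):9.1f}")
```

On a typical run the mean escape time drops by orders of magnitude as the curvature grows, i.e., the process lingers far longer in flat minima than in sharp ones of the same depth, which is the qualitative behavior the diffusion theory predicts.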

[2] ICML 2022 (Oral): SGD and Adam dynamics for saddle-point escaping and minima selection.

@inproceedings{xie2022adaptive,
  title={Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum},
  author={Xie, Zeke and Wang, Xinrui and Zhang, Huishuai and Sato, Issei and Sugiyama, Masashi},
  booktitle={Proceedings of the 39th International Conference on Machine Learning},
  pages={24430--24459},
  year={2022},
  volume={162},
  series={Proceedings of Machine Learning Research}
}
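
To spell out the two effects that [2] disentangles, the sketch below writes the textbook heavy-ball momentum and Adam updates side by side: Adam couples momentum (the first-moment estimate `m`) with a per-parameter adaptive learning rate (derived from the second-moment estimate `s`). This is a generic illustration of the two mechanisms, not the paper's proposed method; the helper names `momentum_step`/`adam_step`, the toy quadratic, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def momentum_step(w, g, v, lr=0.1, beta=0.9):
    """Heavy-ball SGD: momentum only, a single global learning rate."""
    v = beta * v + g                       # accumulate a velocity
    return w - lr * v, v

def adam_step(w, g, m, s, t, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum (m) entangled with per-parameter adaptivity (s)."""
    m = beta1 * m + (1 - beta1) * g        # first moment  -> momentum effect
    s = beta2 * s + (1 - beta2) * g**2     # second moment -> adaptive step size
    m_hat = m / (1 - beta1**t)             # bias corrections
    s_hat = s / (1 - beta2**t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Usage: an ill-conditioned 2-D quadratic with gradient g = H * w.
H = np.array([10.0, 0.01])                 # curvatures differ by 1000x
w, v = np.ones(2), np.zeros(2)
for t in range(1, 301):
    w, v = momentum_step(w, H * w, v)
print("momentum:", w)                      # slow along the flat (h=0.01) direction

w, m, s = np.ones(2), np.zeros(2), np.zeros(2)
for t in range(1, 301):
    w, m, s = adam_step(w, H * w, m, s, t)
print("adam:    ", w)                      # both coordinates progress at a similar rate
```

Adam's second-moment rescaling makes both coordinates move at roughly `lr` per step regardless of curvature, while heavy-ball momentum inherits the curvature-dependent rate; as its title suggests, the paper instead studies making the inertia (momentum) coefficient adaptive rather than the learning rate.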