# Awesome-ScalingLaws
A curated list of awesome resources dedicated to Scaling Laws for LLMs (Large Language Models).
**Contributing:** Please feel free to make pull requests.
What are scaling laws for LLMs? They describe how a model's test loss can be predicted by a power law when performance is bottlenecked by only one of three factors: the number of model parameters, the dataset size, or the optimally allocated compute budget.
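For intuition, here is a minimal sketch (with made-up loss numbers, and Kaplan-style notation assumed) of fitting the single-bottleneck power law L(N) = (N_c / N)^α_N to hypothetical (model size, test loss) measurements; taking logs turns it into a straight-line fit.

```python
# Minimal sketch: fit L(N) = (N_c / N)**alpha_N in log-log space.
# The sizes and losses below are illustrative placeholders, not real measurements.
import numpy as np

sizes = np.array([1e7, 1e8, 1e9, 1e10])   # model parameter counts N
losses = np.array([4.5, 3.8, 3.2, 2.7])   # hypothetical test losses L(N)

# log L = -alpha_N * log N + alpha_N * log N_c, i.e. a line in log-log space.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha_n = -slope
n_c = np.exp(intercept / alpha_n)
print(f"alpha_N ~ {alpha_n:.3f}, N_c ~ {n_c:.3g}")
```

Analogous fits apply when the dataset size or the compute budget is the bottleneck, each with its own exponent and scale constant.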
## Papers
- **Deep Learning Scaling is Predictable, Empirically**. Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou. [paper] arXiv, 2017.12
- **One Epoch Is All You Need**. Aran Komatsuzaki. [paper] arXiv, 2019.06
- **A Constructive Prediction of the Generalization Error Across Scales**. Jonathan S. Rosenfeld, Amir Rosenfeld, Yonatan Belinkov, Nir Shavit. [paper] ICLR 2020, 2019.09
- **Scaling Laws for Neural Language Models**. Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei. [paper] arXiv, 2020.01
- **Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers**. Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez. [paper] ICML 2020, 2020.02
- **Recipes for building an open-domain chatbot**. Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston. [paper] EACL 2021, 2020.04
- **On the Predictability of Pruning Across Scales**. Jonathan S. Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit. [paper] ICML 2021, 2020.06
- **The Computational Limits of Deep Learning**. Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, Gabriel F. Manso. [paper] arXiv, 2020.07
- **Scaling Laws for Autoregressive Generative Modeling**. Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish. [paper] arXiv, 2020.10
- **Scaling Laws for Transfer**. Danny Hernandez, Jared Kaplan, Tom Henighan, Sam McCandlish. [paper] arXiv, 2021.02
- **Explaining Neural Scaling Laws**. Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma. [paper] arXiv, 2021.02
- **Universal scaling laws in the gradient descent training of neural networks**. Maksim Velikanov, Dmitry Yarotsky. [paper] arXiv, 2021.05
- **Learning to Limit Data Collection via Scaling Laws: A Computational Interpretation for the Legal Principle of Data Minimization**. Divya Shanmugam, Samira Shabanian, Fernando Diaz, Michèle Finck, Asia Biega. [paper] ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2021.07
- **Scaling Laws for Deep Learning**. Jonathan S. Rosenfeld. [paper] MIT PhD thesis, 2021.08
- **A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?** Hiroaki Mikami, Kenji Fukumizu, Shogo Murai, Shuji Suzuki, Yuta Kikuchi, Taiji Suzuki, Shin-ichi Maeda, Kohei Hayashi. [paper] arXiv, 2021.08
- **Scaling Laws for Neural Machine Translation**. Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, Colin Cherry. [paper] arXiv, 2021.09
- **Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers**. Gabriele Prato, Simon Guiroy, Ethan Caballero, Irina Rish, Sarath Chandar. [paper] arXiv, 2021.10
- **Scaling Law for Recommendation Models: Towards General-purpose User Representations**. Kyuyong Shin, Hanock Kwak, Su Young Kim, Max Nihlen Ramstrom, Jisu Jeong, Jung-Woo Ha, Kyung-Min Kim. [paper] AAAI 2023, 2021.11
- **Unified Scaling Laws for Routed Language Models**. Aidan Clark, Diego de las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew Johnson, Katie Millican, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, Jack Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan. [paper] ICML 2022, 2022.02
- **Data Scaling Laws in NMT: The Effect of Noise and Architecture**. Yamini Bansal, Behrooz Ghorbani, Ankush Garg, Biao Zhang, Maxim Krikun, Colin Cherry, Behnam Neyshabur, Orhan Firat. [paper] arXiv, 2022.02
- **Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments**. Maor Ivgi, Yair Carmon, Jonathan Berant. [paper] EMNLP 2022, 2022.02
- **Training Compute-Optimal Large Language Models**. Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre. [paper] arXiv, 2022.03
- **Scaling Laws and Interpretability of Learning from Repeated Data**. Danny Hernandez, Tom Brown, Tom Conerly, Nova DasSarma, Dawn Drain, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Tom Henighan, Tristan Hume, Scott Johnston, Ben Mann, Chris Olah, Catherine Olsson, Dario Amodei, Nicholas Joseph, Jared Kaplan, Sam McCandlish. [paper] arXiv, 2022.05
- **Beyond neural scaling laws: beating power law scaling via data pruning**. Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos. [paper] NeurIPS 2022, 2022.06
- **Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?** Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler. [paper] arXiv, 2022.07
- **Understanding Scaling Laws for Recommendation Models**. Newsha Ardalani, Carole-Jean Wu, Zeliang Chen, Bhargav Bhushanam, Adnan Aziz. [paper] arXiv, 2022.08
- **Revisiting Neural Scaling Laws in Language and Vision**. Ibrahim Alabdulmohsin, Behnam Neyshabur, Xiaohua Zhai. [paper] NeurIPS 2022, 2022.09
- **Scaling Laws For Deep Learning Based Image Reconstruction**. Tobit Klug, Reinhard Heckel. [paper] ICLR 2023, 2022.09
- **Scaling Laws for a Multi-Agent Reinforcement Learning Model**. Oren Neumann, Claudius Gros. [paper] arXiv, 2022.10
- **How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization**. Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, Andrew Gordon Wilson. [paper] ICLR 2023, 2022.10
- **Scaling Laws for Reward Model Overoptimization**. Leo Gao, John Schulman, Jacob Hilton. [paper] arXiv, 2022.10
- **Transcending Scaling Laws with 0.1% Extra Compute**. Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani. [paper] arXiv, 2022.10
- **Scaling Laws Beyond Backpropagation**. Matthew J. Filipovich, Alessandro Cappelli, Daniel Hesslow, Julien Launay. [paper] NeurIPS 2022, 2022.10
- **Broken Neural Scaling Laws**. Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger. [paper] ICLR 2023, 2022.10
- **A Solvable Model of Neural Scaling Laws**. Alexander Maloney, Daniel A. Roberts, James Sully. [paper] arXiv, 2022.10
- **An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws**. Hong Jun Jeon, Benjamin Van Roy. [paper] arXiv, 2022.12
- **Reproducible scaling laws for contrastive language-image learning**. Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, Jenia Jitsev. [paper] arXiv, 2022.12
- **The case for 4-bit precision: k-bit Inference Scaling Laws**. Tim Dettmers, Luke Zettlemoyer. [paper] arXiv, 2022.12
- **Scaling Laws for Generative Mixed-Modal Language Models**. Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer. [paper] arXiv, 2023.01
- **Scaling laws for single-agent reinforcement learning**. Jacob Hilton, Jie Tang, John Schulman. [paper] arXiv, 2023.01
- **Data pruning and neural scaling laws: fundamental limitations of score-based algorithms**. Fadhel Ayed, Soufiane Hayou. [paper] arXiv, 2023.02
- **Scaling Laws for Multilingual Neural Machine Translation**. Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat. [paper] arXiv, 2023.02
- **GPT-4 Technical Report**. OpenAI. [paper] arXiv, 2023.03