
# Awesome Knowledge Distillation of LLM Papers


<!-- Big font size --> <h2 align="center"> A Survey on Knowledge Distillation of Large Language Models </h2> <p align="center"> Xiaohan Xu<sup>1</sup>&nbsp;&nbsp; Ming Li<sup>2</sup>&nbsp;&nbsp; Chongyang Tao<sup>3</sup>&nbsp;&nbsp; Tao Shen<sup>4</sup>&nbsp;&nbsp; Reynold Cheng<sup>1</sup>&nbsp;&nbsp; Jinyang Li<sup>1</sup>&nbsp;&nbsp; Can Xu<sup>5</sup>&nbsp;&nbsp; Dacheng Tao<sup>6</sup>&nbsp;&nbsp; Tianyi Zhou<sup>2</sup>&nbsp;&nbsp; </p> <p align="center"> <sup>1</sup> The University of Hong Kong &nbsp;&nbsp; <sup>2</sup> University of Maryland &nbsp;&nbsp; <sup>3</sup> Microsoft &nbsp;&nbsp; <sup>4</sup> University of Technology Sydney &nbsp;&nbsp; <sup>5</sup> Peking University &nbsp;&nbsp; <sup>6</sup> The University of Sydney </p> <div align="center"> <img src="imgs/framework.png" width="700"><br> </div> <br>

A collection of papers on knowledge distillation of large language models (LLMs). If you want to use LLMs to improve the training of your own smaller models, or to leverage self-generated knowledge for self-improvement, take a look at this collection.

We will update this collection every week. Welcome to star ⭐️ this repo to keep track of the updates.

❗️Legal Consideration: It's crucial to note the legal implications of utilizing LLM outputs, such as those from ChatGPT (Restrictions), Llama (License), etc. We strongly advise users to adhere to the terms of use specified by the model providers, for example, restrictions on developing competing products.

## 💡 News

## Contributing to This Collection

Feel free to open an issue/PR or e-mail shawnxxh@gmail.com, minglii@umd.edu, hishentao@gmail.com and chongyangtao@gmail.com if you find any missing taxonomies or papers. We will keep updating this collection and survey.

## 📝 Introduction

KD of LLMs: This survey delves into knowledge distillation (KD) techniques in Large Language Models (LLMs), highlighting KD's crucial role in transferring advanced capabilities from proprietary LLMs like GPT-4 to open-source counterparts such as LLaMA and Mistral. We also explore how KD enables the compression and self-improvement of open-source LLMs by using them as teachers.

KD and Data Augmentation: Crucially, the survey navigates the intricate interplay between data augmentation (DA) and KD, illustrating how DA emerges as a powerful paradigm within the KD framework to bolster LLMs' performance. By leveraging DA to generate context-rich, skill-specific training data, KD transcends traditional boundaries, enabling open-source models to approximate the contextual adeptness, ethical alignment, and deep semantic insights characteristic of their proprietary counterparts.

Taxonomy: Our analysis is meticulously structured around three foundational pillars: algorithm, skill, and verticalization -- providing a comprehensive examination of KD mechanisms, the enhancement of specific cognitive abilities, and their practical implications across diverse fields.

KD Algorithms: For KD algorithms, we categorize them into two principal steps: "Knowledge Elicitation", focusing on eliciting knowledge from teacher LLMs, and "Distillation Algorithms", centered on injecting this knowledge into student models.
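As a toy illustration of this two-step pipeline, the sketch below shows knowledge elicitation via labeling followed by knowledge injection into a student. Everything here is hypothetical scaffolding: `teacher_generate` stands in for a call to a proprietary teacher LLM, and a trivial lookup table stands in for real student fine-tuning.

```python
def teacher_generate(prompt: str) -> str:
    """Placeholder for a teacher LLM call (in practice, an API request)."""
    return f"answer to: {prompt}"

def elicit_knowledge(prompts):
    """Step 1 (Knowledge Elicitation): label prompts with teacher outputs."""
    return [{"instruction": p, "response": teacher_generate(p)} for p in prompts]

def distill_into_student(dataset):
    """Step 2 (Distillation Algorithm): inject the elicited knowledge into a
    student; a lookup 'model' stands in for supervised fine-tuning."""
    return {ex["instruction"]: ex["response"] for ex in dataset}

prompts = ["What is KD?", "Define LLM."]
dataset = elicit_knowledge(prompts)
student = distill_into_student(dataset)
```

Real systems replace `teacher_generate` with sampled generations from a model like GPT-4 and `distill_into_student` with gradient-based training, but the elicit-then-inject structure is the same.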

<div align="center"> <img src="imgs/knowledge.png" width="600"><br> <em>Figure: An illustration of different knowledge elicitation methods from teacher LLMs.</em> </div> <br>

Skill Distillation: We delve into the enhancement of specific cognitive abilities, such as context following, alignment, agent, NLP task specialization, and multi-modality.

Verticalization Distillation: We explore the practical implications of KD across diverse fields, including law, medical & healthcare, finance, science, and miscellaneous domains.

Note that both Skill Distillation and Verticalization Distillation employ the Knowledge Elicitation and Distillation Algorithms described under KD Algorithms. The categories therefore overlap, but this overlap also offers complementary perspectives on the same papers.

### Why KD of LLMs?

In the era of LLMs, KD of LLMs plays the following crucial roles:

<div align="center"> <img src="imgs/kd_role_bg.png" width="400"><br> </div> <br>

| Role | Description | Trend |
|---|---|---|
| ① Advancing Smaller Models | Transferring advanced capabilities from proprietary LLMs to smaller models, such as open-source LLMs or other smaller models. | Most common |
| ② Compression | Compressing open-source LLMs to make them more efficient and practical. | More popular with the prosperity of open-source LLMs |
| ③ Self-Improvement | Refining open-source LLMs' performance by leveraging their own knowledge, i.e., self-knowledge. | New trend to make open-source LLMs more competitive |

## 📒 Table of Contents

## KD Algorithms

### Knowledge Elicitation

#### Labeling

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering | arXiv | 2024-03 | | |
| Aligning Large and Small Language Models via Chain-of-Thought Reasoning | EACL | 2024-03 | Github | |
| Divide-or-Conquer? Which Part Should You Distill Your LLM? | arXiv | 2024-02 | | |
| Miko: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery | arXiv | 2024-02 | | |
| KnowTuning: Knowledge-aware Fine-tuning for Large Language Models | arXiv | 2024-02 | Github | |
| TinyLLM: Learning a Small Student from Multiple Large Language Models | arXiv | 2024-02 | | |
| Mixed Distillation Helps Smaller Language Model Better Reasoning | arXiv | 2023-12 | | |
| Tailoring Self-Rationalizers with Multi-Reward Distillation | arXiv | 2023-11 | Github | Data |
| Orca 2: Teaching Small Language Models How to Reason | arXiv | 2023-11 | | |
| Mammoth: Building Math Generalist Models through Hybrid Instruction Tuning | arXiv | 2023-09 | Github | Data |
| PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization | arXiv | 2023-06 | Github | Data |
| Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step | ACL | 2023-06 | | |
| Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | arXiv | 2023-06 | | |
| Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes | ACL | 2023-05 | Github | Data |
| Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing | arXiv | 2023-05 | Github | |
| Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | EMNLP | 2023-04 | Github | Data |
| ChatGPT outperforms crowd workers for text-annotation tasks | arXiv | 2023-03 | | |
| Annollm: Making large language models to be better crowdsourced annotators | arXiv | 2023-03 | | |
| GPT-4All: Training an Assistant-Style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo | - | 2023-03 | Github | |
| Specializing Smaller Language Models towards Multi-Step Reasoning | arXiv | 2023-01 | | |
| Is GPT-3 a Good Data Annotator? | ACL | 2022-12 | Github | |
| Large Language Models Are Reasoning Teachers | ACL | 2022-12 | Github | Data |
| Teaching Small Language Models to Reason | ACL | 2022-12 | | |
| Explanations from Large Language Models Make Small Reasoners Better | arXiv | 2022-10 | | |
| Want To Reduce Labeling Cost? GPT-3 Can Help | Findings of EMNLP | 2021-08 | | |

#### Expansion

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | arXiv | 2023-12 | | |
| An Empirical Study of Instruction-tuning Large Language Models in Chinese | EMNLP | 2023-10 | Github | Data |
| PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation | EMNLP | 2023-10 | Github | |
| Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct | arXiv | 2023-08 | Github | |
| Code Llama: Open Foundation Models for Code | arXiv | 2023-08 | Github | |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | ICLR | 2023-06 | Github | |
| Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision | NeurIPS | 2023-05 | Github | Data |
| Targeted Data Generation: Finding and Fixing Model Weaknesses | ACL | 2023-05 | Github | |
| Wizardlm: Empowering large language models to follow complex instructions | ICLR | 2023-04 | Github | Data <br> Data |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | arXiv | 2023-04 | Github | Data |
| Alpaca: Aligning Language Model with Human Preferences | - | 2023-03 | Github | Data |
| Code Alpaca: An Instruction-following LLaMA model for code generation | - | 2023-03 | Github | Data |
| Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases | arXiv | 2023-03 | Github | Data |
| AugGPT: Leveraging ChatGPT for Text Data Augmentation | arXiv | 2023-02 | Github | |
| Self-instruct: Aligning language model with self generated instructions | ACL | 2022-12 | Github | Data |
| Symbolic Knowledge Distillation: from General Language Models to Commonsense Models | NAACL | 2021-10 | Github | Data |

#### Curation

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | arXiv | 2024-02 | | |
| Phi-2: The surprising power of small language models | - | 2023-12 | | |
| WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation | arXiv | 2023-12 | | |
| Magicoder: Source Code Is All You Need | arXiv | 2023-12 | Github | Data <br> Data |
| MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning | arXiv | 2023-11 | Github | Data <br> Data |
| Textbooks Are All You Need II: Phi-1.5 Technical Report | arXiv | 2023-09 | | |
| Neural Machine Translation Data Generation and Augmentation using ChatGPT | arXiv | 2023-07 | | |
| Textbooks Are All You Need: A Large-Scale Instructional Text Data Set for Language Models | arXiv | 2023-06 | | |
| Enhancing Chat Language Models by Scaling High-quality Instructional Conversations | arXiv | 2023-05 | Github | Data |
| AugTriever: Unsupervised Dense Retrieval by Scalable Data Augmentation | arXiv | 2022-12 | Github | |
| SunGen: Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning | ICLR | 2022-05 | Github | |
| ZeroGen: Efficient Zero-shot Learning via Dataset Generation | EMNLP | 2022-02 | Github | |
| InPars: Data Augmentation for Information Retrieval using Large Language Models | arXiv | 2022-02 | Github | Data |
| Towards Zero-Label Language Learning | arXiv | 2021-09 | | |

#### Feature

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models | arXiv | 2024-04 | | |
| Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs | arXiv | 2024-03 | | |
| DB-LLM: Accurate Dual-Binarization for Efficient LLMs | arXiv | 2024-02 | | |
| BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation | arXiv | 2024-02 | Github | |
| DISTILLM: Towards Streamlined Distillation for Large Language Models | arXiv | 2024-02 | Github | |
| Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs | arXiv | 2024-02 | Github | Data |
| Revisiting Knowledge Distillation for Autoregressive Language Models | arXiv | 2024-02 | | |
| Knowledge Fusion of Large Language Models | ICLR | 2024-01 | Github | |
| Improving In-context Learning via Bidirectional Alignment | arXiv | 2023-12 | | |
| Towards the Fundamental Limits of Knowledge Transfer over Finite Domains | NeurIPS | 2023-10 | | |
| Baby Llama: Knowledge Distillation from an Ensemble of Teachers Trained on a Small Dataset with No Performance Penalty | CoNLL | 2023-08 | Github | Data |
| f-Divergence Minimization for Sequence-Level Knowledge Distillation | ACL | 2023-07 | Github | Data |
| MiniLLM: Knowledge Distillation of Large Language Models | ICLR | 2023-06 | Github | Data |
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | ICLR | 2023-06 | | |
| LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | arXiv | 2023-05 | Github | Data |
| Less is more: Task-aware layer-wise distillation for language model compression | PMLR | 2022-10 | Github | |

#### Feedback

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering | arXiv | 2024-03 | | |
| Evolving Knowledge Distillation with Large Language Models and Active Learning | arXiv | 2024-03 | | |
| Direct Language Model Alignment from Online AI Feedback | arXiv | 2024-02 | | |
| DISTILLM: Towards Streamlined Distillation for Large Language Models | arXiv | 2024-02 | Github | |
| Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint | arXiv | 2024-01 | Github | |
| Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment | arXiv | 2023-11 | | |
| Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Personalization | ICLR | 2023-10 | Github | |
| Motif: Intrinsic Motivation from Artificial Intelligence Feedback | ICLR | 2023-10 | Github | |
| Ultrafeedback: Boosting language models with high-quality feedback | arXiv | 2023-10 | Github | Data |
| Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | EMNLP | 2023-10 | Github | |
| CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment | arXiv | 2023-10 | | |
| Rlaif: Scaling Reinforcement Learning from Human Feedback with AI Feedback | arXiv | 2023-09 | | |
| Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct | arXiv | 2023-08 | Github | |
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | ICLR | 2023-06 | | |
| MiniLLM: Knowledge Distillation of Large Language Models | ICLR | 2023-06 | Github | Data |
| Language to Rewards for Robotic Skill Synthesis | arXiv | 2023-06 | Github | |
| Lion: Adversarial Distillation of Closed-Source Large Language Model | EMNLP | 2023-05 | Github | |
| SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation | arXiv | 2023-05 | | |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | arXiv | 2023-04 | Github | Data |
| Reward Design with Language Models | ICLR | 2023-03 | Github | |
| Constitutional AI: Harmlessness from AI Feedback | arXiv | 2022-12 | | |

#### Self-Knowledge

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| V-STaR: Training Verifiers for Self-Taught Reasoners | arXiv | 2024-02 | | |
| Self-Rewarding Language Models | arXiv | 2024-01 | Github | |
| Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | arXiv | 2024-01 | Github | Data |
| Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation | arXiv | 2024-01 | Github | Data |
| APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference | arXiv | 2024-01 | | |
| GRATH: Gradual Self-Truthifying for Large Language Models | arXiv | 2024-01 | | |
| Beyond human data: Scaling self-training for problem-solving with language models | arXiv | 2023-12 | | |
| Self-Knowledge Guided Retrieval Augmentation for Large Language Models | EMNLP Findings | 2023-10 | Github | |
| RAIN: Your Language Models Can Align Themselves without Finetuning | arXiv | 2023-09 | Github | |
| Reinforced Self-Training (ReST) for Language Modeling | arXiv | 2023-08 | | |
| Humback: Self-Alignment with Instruction Backtranslation | ICLR | 2023-08 | Github | |
| Self-Alignment of Large Language Models via Reinforcement Learning from Contrast Distillation | ICLR | 2023-07 | Github | |
| Self-Improvement of Large Language Models via Reinforcement Learning from Human Feedback | EMNLP | 2023-06 | | |
| Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision | NeurIPS | 2023-05 | Github | Data |
| Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing | arXiv | 2023-05 | Github | |
| Language Model Self-improvement by Reinforcement Learning Contemplation | arXiv | 2023-05 | | |
| Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | EMNLP | 2023-04 | Github | Data |
| Self-instruct: Aligning language model with self generated instructions | ACL | 2022-12 | Github | Data |
| Large Language Models Can Self-Improve | EMNLP | 2022-10 | | |
| STaR: Bootstrapping Reasoning With Reasoning | NeurIPS | 2022-03 | Github | |

### Distillation Algorithms

#### Supervised Fine-Tuning

Due to the large number of works applying supervised fine-tuning, we only list the most representative ones here.
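In most of these works the distillation objective itself is plain token-level cross-entropy on teacher-generated sequences, i.e. minimizing the student's negative log-likelihood of the teacher's tokens. A minimal, self-contained sketch of that loss (toy logits standing in for a real student model):

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = math.log(sum(math.exp(l - m) for l in logits)) + m
    return [l - log_z for l in logits]

def sft_loss(step_logits, teacher_token_ids):
    """Mean negative log-likelihood of the teacher's tokens under the student.

    step_logits: per-position student logits over the vocabulary.
    teacher_token_ids: the teacher-generated token at each position.
    """
    nll = 0.0
    for logits, tok in zip(step_logits, teacher_token_ids):
        nll -= log_softmax(logits)[tok]
    return nll / len(teacher_token_ids)
```

With a uniform student distribution over a 4-token vocabulary, the per-token loss is log 4; as the student puts more mass on the teacher's tokens, the loss decreases.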

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering | arXiv | 2024-03 | | |
| Aligning Large and Small Language Models via Chain-of-Thought Reasoning | EACL | 2024-03 | Github | |
| Divide-or-Conquer? Which Part Should You Distill Your LLM? | arXiv | 2024-02 | | |
| Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | arXiv | 2024-02 | | |
| TinyLLM: Learning a Small Student from Multiple Large Language Models | arXiv | 2024-02 | | |
| Orca 2: Teaching Small Language Models How to Reason | arXiv | 2023-11 | | |
| Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct | arXiv | 2023-08 | Github | |
| Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | arXiv | 2023-06 | | |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | arXiv | 2023-04 | Github | Data |
| Wizardlm: Empowering large language models to follow complex instructions | ICLR | 2023-04 | Github | Data <br> Data |
| Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | EMNLP | 2023-04 | Github | Data |
| Alpaca: Aligning Language Model with Human Preferences | - | 2023-03 | Github | Data |
| Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality* | - | 2023-03 | Github | Data |
| Self-instruct: Aligning language model with self generated instructions | ACL | 2022-12 | Github | Data |
| Large Language Models Can Self-Improve | EMNLP | 2022-10 | | |
| STaR: Bootstrapping Reasoning With Reasoning | NeurIPS | 2022-03 | Github | |

#### Divergence and Similarity

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models | arXiv | 2024-04 | | |
| Weight-Inherited Distillation for Task-Agnostic BERT Compression | NAACL | 2024-03 | Github | |
| BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation | arXiv | 2024-02 | Github | |
| DISTILLM: Towards Streamlined Distillation for Large Language Models | arXiv | 2024-02 | Github | |
| Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs | arXiv | 2024-02 | Github | Data |
| Revisiting Knowledge Distillation for Autoregressive Language Models | arXiv | 2024-02 | | |
| Knowledge Distillation for Closed-Source Language Models | arXiv | 2024-01 | | |
| Knowledge Fusion of Large Language Models | ICLR | 2024-01 | Github | |
| Improving In-context Learning via Bidirectional Alignment | arXiv | 2023-12 | | |
| Towards the Fundamental Limits of Knowledge Transfer over Finite Domains | NeurIPS | 2023-10 | | |
| Baby Llama: Knowledge Distillation from an Ensemble of Teachers Trained on a Small Dataset with No Performance Penalty | CoNLL | 2023-08 | Github | Data |
| f-Divergence Minimization for Sequence-Level Knowledge Distillation | ACL | 2023-07 | Github | Data |
| MiniLLM: Knowledge Distillation of Large Language Models | ICLR | 2023-06 | Github | Data |
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | ICLR | 2023-06 | | |
| LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | arXiv | 2023-05 | Github | Data |
| Less is more: Task-aware layer-wise distillation for language model compression | PMLR | 2022-10 | Github | |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | NeurIPS | 2019-10 | | |

#### Reinforcement Learning

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Direct Language Model Alignment from Online AI Feedback | arXiv | 2024-02 | | |
| Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint | arXiv | 2024-01 | Github | |
| Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models | CoRL | 2023-11 | | |
| Motif: Intrinsic Motivation from Artificial Intelligence Feedback | ICLR | 2023-10 | Github | |
| Ultrafeedback: Boosting language models with high-quality feedback | arXiv | 2023-10 | Github | Data |
| Eureka: Human-Level Reward Design via Coding Large Language Models | arXiv | 2023-10 | Github | |
| Rlaif: Scaling Reinforcement Learning from Human Feedback with AI Feedback | arXiv | 2023-09 | | |
| Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct | arXiv | 2023-08 | Github | |
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | ICLR | 2023-06 | | |
| Aligning Large Language Models through Synthetic Feedback | EMNLP | 2023-05 | Github | Data |
| Language Model Self-improvement by Reinforcement Learning Contemplation | arXiv | 2023-05 | | |
| Constitutional AI: Harmlessness from AI Feedback | arXiv | 2022-12 | | |

#### Rank Optimization

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering | arXiv | 2024-03 | | |
| KnowTuning: Knowledge-aware Fine-tuning for Large Language Models | arXiv | 2024-02 | Github | |
| Self-Rewarding Language Models | arXiv | 2024-01 | Github | |
| Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | arXiv | 2024-01 | Github | Data |
| Zephyr: Direct Distillation of Language Model Alignment | arXiv | 2023-10 | Github | Data |
| CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment | arXiv | 2023-10 | | |

## Skill Distillation

### Context Following

#### Instruction Following

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | arXiv | 2024-02 | | |
| Revisiting Knowledge Distillation for Autoregressive Language Models | arXiv | 2024-02 | | |
| Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning | arXiv | 2024-02 | Github | Data |
| Phi-2: The surprising power of small language models | - | 2023-12 | | |
| What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning | ICLR | 2023-12 | Github | Data |
| MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following | arXiv | 2023-12 | Github | Data |
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | arXiv | 2023-12 | | |
| Orca 2: Teaching Small Language Models How to Reason | arXiv | 2023-11 | | |
| Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning | NeurIPS Workshop | 2023-10 | Github | Data |
| Textbooks Are All You Need II: Phi-1.5 Technical Report | arXiv | 2023-09 | | |
| Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | arXiv | 2023-06 | | |
| Textbooks Are All You Need: A Large-Scale Instructional Text Data Set for Language Models | arXiv | 2023-06 | | |
| SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation | arXiv | 2023-05 | | |
| ExpertPrompting: Instructing Large Language Models to be Distinguished Experts | arXiv | 2023-05 | Github | Data |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | arXiv | 2023-04 | Github | Data |
| Wizardlm: Empowering large language models to follow complex instructions | ICLR | 2023-04 | Github | Data <br> Data |
| Koala: A Dialogue Model for Academic Research | - | 2023-04 | Github | Data |
| Alpaca: Aligning Language Model with Human Preferences | - | 2023-03 | Github | Data |
| Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality* | - | 2023-03 | Github | Data |
| Self-instruct: Aligning language model with self generated instructions | ACL | 2022-12 | Github | Data |

#### Multi-turn Dialogue

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Zephyr: Direct Distillation of LM Alignment | arXiv | 2023-10 | Github | Data |
| OpenChat: Advancing Open-source Language Models with Mixed-Quality Data | ICLR | 2023-09 | Github | Data |
| Enhancing Chat Language Models by Scaling High-quality Instructional Conversations | arXiv | 2023-05 | Github | Data |
| Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | EMNLP | 2023-04 | Github | Data |
| Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality* | - | 2023-03 | Github | Data |

#### RAG Capability

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | NeurIPS | 2023-10 | Github | Data |
| SAIL: Search-Augmented Instruction Learning | arXiv | 2023-05 | Github | Data |
| Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks | NeurIPS | 2023-05 | Github | Data |

### Alignment

#### Thinking Pattern

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Aligning Large and Small Language Models via Chain-of-Thought Reasoning | EACL | 2024-03 | Github | |
| Divide-or-Conquer? Which Part Should You Distill Your LLM? | arXiv | 2024-02 | | |
| Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning | arXiv | 2024-02 | Github | Data |
| Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements | arXiv | 2024-02 | Github | Data |
| Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering | arXiv | 2023-11 | Github | |
| Orca 2: Teaching Small Language Models How to Reason | arXiv | 2023-11 | | |
| Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning | NeurIPS Workshop | 2023-10 | Github | Data |
| Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | arXiv | 2023-06 | | |
| SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation | arXiv | 2023-05 | | |

#### Preference

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Ultrafeedback: Boosting language models with high-quality feedback | arXiv | 2023-10 | Github | Data |
| Zephyr: Direct Distillation of LM Alignment | arXiv | 2023-10 | Github | Data |
| Rlaif: Scaling Reinforcement Learning from Human Feedback with AI Feedback | arXiv | 2023-09 | | |
| OpenChat: Advancing Open-source Language Models with Mixed-Quality Data | ICLR | 2023-09 | Github | Data |
| RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment | arXiv | 2023-07 | Github | |
| Aligning Large Language Models through Synthetic Feedback | EMNLP | 2023-05 | Github | Data |
| Reward Design with Language Models | ICLR | 2023-03 | Github | |
| Training Language Models with Language Feedback at Scale | arXiv | 2023-03 | | |
| Constitutional AI: Harmlessness from AI Feedback | arXiv | 2022-12 | | |

#### Value

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Ultrafeedback: Boosting language models with high-quality feedback | arXiv | 2023-10 | Github | Data |
| RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment | arXiv | 2023-07 | Github | |
| Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision | NeurIPS | 2023-05 | Github | Data |
| Training Socially Aligned Language Models on Simulated Social Interactions | arXiv | 2023-05 | | |
| Constitutional AI: Harmlessness from AI Feedback | arXiv | 2022-12 | | |

### Agent

#### Tool Using

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Toolformer: Language Models Can Teach Themselves to Use Tools | arXiv | 2023-02 | | |
| Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT | arXiv | 2023-04 | Github | Data |
| Gorilla: Large Language Model Connected with Massive APIs | arXiv | 2023-05 | Github | Data |
| GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction | arXiv | 2023-05 | Github | Data |
| ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases | arXiv | 2023-06 | Github | Data |
| ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | arXiv | 2023-07 | Github | Data |
| Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum | arXiv | 2023-08 | Github | |
| CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets | arXiv | 2023-09 | Github | |
| MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning | arXiv | 2024-01 | Github | Data |
| Small LLMs Are Weak Tool Learners: A Multi-LLM Agent | arXiv | 2024-01 | Github | |
| EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction | arXiv | 2024-01 | Github | |

#### Planning

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| AUTOACT: Automatic Agent Learning from Scratch via Self-Planning | arXiv | 2024-01 | Github | |
| Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs | arXiv | 2023-11 | Github | Data |
| TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | arXiv | 2023-11 | | |
| Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld | arXiv | 2023-11 | | |
| Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models | CoRL | 2023-11 | | |
| Motif: Intrinsic Motivation from Artificial Intelligence Feedback | ICLR | 2023-10 | Github | |
| FireAct: Toward Language Agent Fine-tuning | arXiv | 2023-10 | Github | Data |
| AgentTuning: Enabling Generalized Agent Abilities for LLMs | arXiv | 2023-10 | Github | |
| Eureka: Human-Level Reward Design via Coding Large Language Models | arXiv | 2023-10 | Github | |
| Language Instructed Reinforcement Learning for Human-AI Coordination | PMLR | 2023-04 | | |
| Guiding Pretraining in Reinforcement Learning with Large Language Models | PMLR | 2023-02 | | |
| Distilling Internet-Scale Vision-Language Models into Embodied Agents | ICML | 2023-01 | | |

### NLP Task Specialization

#### NLU

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model | arXiv | 2024-03 | | |
| Evolving Knowledge Distillation with Large Language Models and Active Learning | arXiv | 2024-03 | | |
| Mixed Distillation Helps Smaller Language Model Better Reasoning | arXiv | 2023-12 | | |
| PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation | EMNLP | 2023-10 | Github | |
| TinyLLM: Learning a Small Student from Multiple Large Language Models | arXiv | 2024-02 | | |
| Targeted Data Generation: Finding and Fixing Model Weaknesses | ACL | 2023-05 | Github | |
| Distilling ChatGPT for Explainable Automated Student Answer Assessment | arXiv | 2023-05 | Github | |
| ChatGPT outperforms crowd workers for text-annotation tasks | arXiv | 2023-03 | | |
| Annollm: Making large language models to be better crowdsourced annotators | arXiv | 2023-03 | | |
| AugGPT: Leveraging ChatGPT for Text Data Augmentation | arXiv | 2023-02 | Github | |
| Is GPT-3 a Good Data Annotator? | ACL | 2022-12 | Github | |
| SunGen: Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning | ICLR | 2022-05 | Github | |
| ZeroGen: Efficient Zero-shot Learning via Dataset Generation | EMNLP | 2022-02 | Github | |
| Generating Training Data with Language Models: Towards Zero-Shot Language Understanding | NeurIPS | 2022-02 | Github | |
| Towards Zero-Label Language Learning | arXiv | 2021-09 | | |
| Generate, Annotate, and Learn: NLP with Synthetic Text | TACL | 2021-06 | | |

#### NLG

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Tailoring Self-Rationalizers with Multi-Reward Distillation | arXiv | 2023-11 | Github | Data |
| RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation | arXiv | 2023-10 | Github | |
| Neural Machine Translation Data Generation and Augmentation using ChatGPT | arXiv | 2023-07 | | |
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | ICLR | 2023-06 | | |
| Can LLMs generate high-quality synthetic note-oriented doctor-patient conversations? | arXiv | 2023-06 | Github | Data |
| InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT | EMNLP | 2023-05 | | |
| Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing | arXiv | 2023-05 | Github | |
| Data Augmentation for Radiology Report Simplification | Findings of EACL | 2023-04 | Github | |
| Want To Reduce Labeling Cost? GPT-3 Can Help | Findings of EMNLP | 2021-08 | | |

#### Information Retrieval

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| InstructDistill: Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers | arXiv | 2023-11 | Github | Data |
| Soft prompt tuning for augmenting dense retrieval with large language models | arXiv | 2023-07 | Github | |
| Query Rewriting in Retrieval-Augmented Large Language Models | EMNLP | 2023-05 | | |
| Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents | EMNLP | 2023-04 | Github | Data |
| AugTriever: Unsupervised Dense Retrieval by Scalable Data Augmentation | arXiv | 2022-12 | Github | |
| QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation | EMNLP | 2022-10 | | |
| Promptagator: Few-shot Dense Retrieval From 8 Examples | ICLR | 2022-09 | | |
| Questions Are All You Need to Train a Dense Passage Retrieval | TACL | 2022-06 | Github | |
| Improving Passage Retrieval with Zero-Shot Question Generation | EMNLP | 2022-04 | Github | Data |
| InPars: Data Augmentation for Information Retrieval using Large Language Models | arXiv | 2022-02 | Github | Data |
| Generating Datasets with Pretrained Language Models | EMNLP | 2021-04 | Github | |

#### Recommendation

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Can Small Language Models be Good Reasoners for Sequential Recommendation? | arXiv | 2024-03 | | |
| Large Language Model Augmented Narrative Driven Recommendations | arXiv | 2023-06 | | |
| Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach | arXiv | 2023-05 | | |
| ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models | WSDM | 2023-05 | Github | Data |

#### Text Generation Evaluation

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Prometheus: Inducing Fine-grained Evaluation Capability in Language Models | ICLR | 2023-10 | Github | Data |
| TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks | arXiv | 2023-10 | Github | Data |
| Generative Judge for Evaluating Alignment | ICLR | 2023-10 | Github | Data |
| PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization | arXiv | 2023-06 | Github | Data |
| INSTRUCTSCORE: Explainable Text Generation Evaluation with Fine-grained Feedback | EMNLP | 2023-05 | Github | Data |

#### Code

| Title | Venue | Date | Code | Data |
|---|---|---|---|---|
| Magicoder: Source Code Is All You Need | arXiv | 2023-12 | Github | Data <br> Data |
| WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation | arXiv | 2023-12 | | |
| Instruction Fusion: Advancing Prompt Evolution through Hybridization | arXiv | 2023-12 | | |
| MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning | arXiv | 2023-11 | Github | Data <br> Data |
| LLM-Assisted Code Cleaning For Training Accurate Code Generators | arXiv | 2023-11 | | |
| Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | EMNLP | 2023-10 | Github | |
| Code Llama: Open Foundation Models for Code | arXiv | 2023-08 | Github | |
| Distilled GPT for Source Code Summarization | arXiv | 2023-08 | Github | Data |
| Textbooks Are All You Need: A Large-Scale Instructional Text Data Set for Language Models | arXiv | 2023-06 | | |
| Code Alpaca: An Instruction-following LLaMA model for code generation | - | 2023-03 | Github | Data |

Multi-Modality

| Title | Venue | Date | Code | Data |
|:------|:-----:|:----:|:----:|:----:|
| Miko: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery | arXiv | 2024-02 | | |
| Localizing Visual Commonsense Knowledge in Large Language Models | NeurIPS | 2023-12 | Github | Data |
| To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning | arXiv | 2023-11 | Github | Data |
| ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations | arXiv | 2023-10 | | |
| NExT-GPT: Any-to-Any Multimodal LLM | arXiv | 2023-09 | Github | Data |
| StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data | arXiv | 2023-08 | Github | Data |
| PointLLM: Empowering Large Language Models to Understand Point Clouds | arXiv | 2023-08 | Github | Data |
| SVIT: Scaling up Visual Instruction Tuning | arXiv | 2023-07 | Github | Data |
| ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning | arXiv | 2023-07 | | |
| Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic | arXiv | 2023-06 | Github | Data |
| Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning | ICLR | 2023-06 | Github | Data |
| Valley: Video Assistant with Large Language model Enhanced abilitY | arXiv | 2023-06 | Github | Data |
| DetGPT: Detect What You Need via Reasoning | EMNLP | 2023-05 | Github | |
| Visual Instruction Tuning | NeurIPS | 2023-04 | Github | Data |

Summary Table

<div align="center"> <img src="imgs/table.jpg"><br> <em>Figure: A summary of representative works about skill distillation.</em> </div> <br>

Verticalization Distillation

Law

| Title | Venue | Date | Code | Data |
|:------|:-----:|:----:|:----:|:----:|
| Fuzi | - | 2023-08 | Github | |
| ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases | arXiv | 2023-06 | Github | |
| Lawyer LLaMA Technical Report | arXiv | 2023-05 | Github | Data |

Medical & Healthcare

| Title | Venue | Date | Code | Data |
|:------|:-----:|:----:|:----:|:----:|
| HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs | arXiv | 2023-11 | Github | Data |
| AlpaCare: Instruction-tuned Large Language Models for Medical Application | arXiv | 2023-10 | Github | Data |
| DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation | arXiv | 2023-08 | Github | Data |
| HuatuoGPT: Taming Language Model to Be a Doctor | EMNLP | 2023-05 | Github | Data |
| DoctorGLM: Fine-tuning your Chinese doctor is not a herculean task | arXiv | 2023-04 | Github | Data |
| Huatuo: Tuning LLM with Chinese Medical Knowledge | arXiv | 2023-04 | Github | |
| MedAlpaca: An Open-Source Collection of Medical Conversational AI Models and Training Data | arXiv | 2023-04 | Github | Data |
| PMC-LLaMA: Further Finetuning LLaMA on Medical Papers | arXiv | 2023-04 | Github | Data |
| ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge | arXiv | 2023-03 | Github | |

Finance

| Title | Venue | Date | Code | Data |
|:------|:-----:|:----:|:----:|:----:|
| XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters | CIKM | 2023-05 | | |

Science

| Title | Venue | Date | Code | Data |
|:------|:-----:|:----:|:----:|:----:|
| MuseGraph: Graph-oriented Instruction Tuning of Large Language Models for Generic Graph Mining | arXiv | 2024-03 | | |
| SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning | arXiv | 2024-01 | Github | |
| AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets | arXiv | 2024-01 | | |
| GeoGalactica: A Scientific Large Language Model in Geoscience | arXiv | 2024-01 | Github | Data |
| InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery | arXiv | 2023-11 | Github | |
| LLM-Prop: Predicting Physical And Electronic Properties Of Crystalline Solids From Their Text Descriptions | arXiv | 2023-10 | Github | |
| OceanGPT: A Large Language Model for Ocean Science Tasks | arXiv | 2023-10 | Github | Data |
| MarineGPT: Unlocking Secrets of Ocean to the Public | arXiv | 2023-10 | Github | |
| MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | arXiv | 2023-09 | Github | Data |
| ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | ICLR | 2023-09 | Github | |
| DARWIN Series: Domain Specific Large Language Models for Natural Science | arXiv | 2023-08 | Github | |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | arXiv | 2023-08 | Github | |
| BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | arXiv | 2023-08 | Github | Data |
| Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers | NeurIPS | 2023-07 | | |
| xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein | bioRxiv | 2023-07 | | |
| GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning | NeurIPS | 2023-06 | Github | Data |
| K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization | arXiv | 2023-06 | Github | |
| Visual Instruction Tuning | NeurIPS | 2023-04 | Github | Data |

Misc.

| Title | Venue | Date | Code | Data |
|:------|:-----:|:----:|:----:|:----:|
| OWL: A Large Language Model for IT Operations | arXiv | 2023-09 | Github | Data |
| EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education | arXiv | 2023-08 | Github | Data |

Encoder-based KD

Note: Our survey mainly focuses on generative LLMs, so encoder-based KD is not covered in the survey itself. However, we are also interested in this topic and will continue to collect the latest works in this area here.

| Title | Venue | Date | Code | Data |
|:------|:-----:|:----:|:----:|:----:|
| Masked Latent Semantic Modeling: an Efficient Pre-training Alternative to Masked Language Modeling | Findings of ACL | 2023-08 | | |
| Better Together: Jointly Using Masked Latent Semantic Modeling and Masked Language Modeling for Sample Efficient Pre-training | CoNLL | 2023-08 | | |

Citation

If you find this repository helpful, please consider citing our paper:

@misc{xu2024survey,
      title={A Survey on Knowledge Distillation of Large Language Models}, 
      author={Xiaohan Xu and Ming Li and Chongyang Tao and Tao Shen and Reynold Cheng and Jinyang Li and Can Xu and Dacheng Tao and Tianyi Zhou},
      year={2024},
      eprint={2402.13116},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Star History

Star History Chart