Awesome Dataset Distillation

<img src="https://img.shields.io/badge/Contributions-Welcome-278ea5" alt="Contrib"/> <img src="https://img.shields.io/badge/Number%20of%20Items-209-FF6F00" alt="PaperNum"/>

Awesome Dataset Distillation provides the most comprehensive and detailed information on the dataset distillation field.

Dataset distillation is the task of synthesizing a small dataset such that models trained on it achieve high performance on the original large dataset. A dataset distillation algorithm takes as input a large real dataset to be distilled (training set) and outputs a small synthetic distilled dataset, which is evaluated by training models on this distilled dataset and testing them on a separate real dataset (validation/test set). A good small distilled dataset is not only useful for dataset understanding, but also has various applications (e.g., continual learning, privacy, neural architecture search, etc.). This task was first introduced in the paper Dataset Distillation [Tongzhou Wang et al., '18], along with a proposed algorithm using backpropagation through optimization steps. The task was later extended to real-world datasets in the paper Medical Dataset Distillation [Guang Li et al., '19], which also explored the privacy-preservation possibilities of dataset distillation. In the paper Dataset Condensation [Bo Zhao et al., '20], gradient matching was introduced, which greatly advanced the development of the dataset distillation field.
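For readers new to the formulation, the following is a minimal sketch (in PyTorch) of the backpropagation-through-optimization idea described above: the synthetic images are treated as learnable parameters, a network takes a few gradient steps on them, and the resulting network's loss on real data is backpropagated through those steps to update the synthetic data. The tiny MLP, random stand-in data, and hyperparameters are illustrative assumptions, not the setup of any particular paper.

```python
# Minimal sketch of the bilevel idea behind dataset distillation
# (backpropagation through inner optimization steps). All sizes, the tiny MLP,
# and the random "real" data are illustrative assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, feat_dim, images_per_class = 10, 64, 1

# Learnable synthetic dataset: a few examples (here, feature vectors) with fixed labels.
syn_x = torch.randn(num_classes * images_per_class, feat_dim, requires_grad=True)
syn_y = torch.arange(num_classes).repeat_interleave(images_per_class)
inner_lr = torch.tensor(0.02, requires_grad=True)   # distilled step size (also learned)
outer_opt = torch.optim.Adam([syn_x, inner_lr], lr=1e-2)

# Stand-in for the real training set.
real_x = torch.randn(512, feat_dim)
real_y = torch.randint(0, num_classes, (512,))

def forward(params, x):
    # One-hidden-layer MLP applied functionally so gradients can flow back into syn_x.
    w1, b1, w2, b2 = params
    return F.relu(x @ w1 + b1) @ w2 + b2

for outer_step in range(100):
    # Fresh randomly initialized network each outer step.
    params = [torch.randn(feat_dim, 128) * 0.1, torch.zeros(128),
              torch.randn(128, num_classes) * 0.1, torch.zeros(num_classes)]
    params = [p.requires_grad_(True) for p in params]

    # Inner loop: a few SGD steps on the synthetic data, kept in the graph
    # (create_graph=True) so the outer loss can backpropagate through them.
    for _ in range(3):
        inner_loss = F.cross_entropy(forward(params, syn_x), syn_y)
        grads = torch.autograd.grad(inner_loss, params, create_graph=True)
        params = [p - inner_lr * g for p, g in zip(params, grads)]

    # Outer loss: how well the inner-trained network fits real data; its gradient
    # updates the synthetic examples and the learned step size.
    outer_loss = F.cross_entropy(forward(params, real_x), real_y)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```

Later approaches in this list replace the outer objective with surrogate objectives such as gradient/trajectory matching or distribution matching, which avoid unrolling long inner optimization loops.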

In recent years (2022-now), dataset distillation has gained increasing attention in the research community across many institutes and labs, with more papers published each year. These works have steadily improved dataset distillation and explored its many variants and applications.

This project is curated and maintained by Guang Li, Bo Zhao, and Tongzhou Wang.

<img src="./images/logo.jpg" width="20%"/>

How to submit a pull request?

Latest Updates

Contents

Main

<a name="early-work" />

Early Work

<a name="gradient-objective" />

Gradient/Trajectory Matching Surrogate Objective

<a name="feature-objective" />

Distribution/Feature Matching Surrogate Objective

<a name="kernel" />

Kernel-Based Distillation

<a name="parametrization" />

Distilled Dataset Parametrization

<a name="generative" />

Generative Distillation

<a name="optimization" />

Better Optimization

<a name="understanding" />

Better Understanding

<a name="label" />

Label Distillation

<a name="quant" />

Dataset Quantization

<a name="decouple" />

Decoupled Distillation

<a name="multi" />

Multimodal Distillation

<a name="self" />

Self-Supervised Distillation

<a name="benchmark" />

Benchmark

<a name="survey" />

Survey

<a name="thesis" />

Ph.D. Thesis

<a name="workshop" />

Workshop

<a name="challenge" />

Challenge

Applications

<a name="continual" />

Continual Learning

<a name="privacy" />

Privacy

<a name="medical" />

Medical

<a name="fed" />

Federated Learning

<a name="gnn" />

Graph Neural Network

Survey

Benchmark

No further updates will be made on graph distillation topics, as sufficient papers and summary projects on the subject are already available.

<a name="nas" />

Neural Architecture Search

<a name="fashion" />

Fashion, Art, and Design

<a name="rec" />

Recommender Systems

<a name="blackbox" />

Blackbox Optimization

<a name="robustness" />

Robustness

<a name="fairness" />

Fairness

<a name="text" />

Text

<a name="tabular" />

Tabular

<a name="retrieval" />

Retrieval

<a name="video" />

Video

<a name="domain" />

Domain Adaptation

<a name="super" />

Super Resolution

<a name="time" />

Time Series

<a name="speech" />

Speech

<a name="unlearning" />

Machine Unlearning

<a name="rl" />

Reinforcement Learning

<a name="long" />

Long-Tail

<a name="noisy" />

Learning with Noisy Labels

<a name="detection" />

Object Detection

Media Coverage

Star History

Star History Chart

Citing Awesome Dataset Distillation

If you find this project useful for your research, please use the following BibTeX entry.

@misc{li2022awesome,
  author={Li, Guang and Zhao, Bo and Wang, Tongzhou},
  title={Awesome Dataset Distillation},
  howpublished={\url{https://github.com/Guang000/Awesome-Dataset-Distillation}},
  year={2022}
}

Acknowledgments

We would like to express our heartfelt thanks to Nikolaos Tsilivis, Wei Jin, Yongchao Zhou, Noveen Sachdeva, Can Chen, Guangxiang Zhao, Shiye Lei, Xinchao Wang, Dmitry Medvedev, Seungjae Shin, Jiawei Du, Yidi Jiang, Xindi Wu, Guangyi Liu, Yilun Liu, Kai Wang, Yue Xu, Anjia Cao, Jianyang Gu, Yuanzhen Feng, Peng Sun, Ahmad Sajedi, Zhihao Sui, Ziyu Wang, Haoyang Liu, Eduardo Montesuma, Shengbo Gong, Zheng Zhou, Zhenghao Zhao, Duo Su, Tianhang Zheng, Shijie Ma, Wei Wei, Yantai Yang, Shaobo Wang, Xinhao Zhong, Zhiqiang Shen, Cong Cong, Chun-Yin Huang, Dai Liu, Ruonan Yu, William Holland, and Saksham Singh Kushwaha for their valuable suggestions and contributions.

The homepage of Awesome Dataset Distillation was designed and is maintained by Longzhen Li.