
<p align="center"> <img src="Fig/logo.png" width="100%" class="center" alt="pipeline"/> </p>

Project Page | Paper | Distilled Dataset

This repository contains the code and implementation for the paper "BACON: Bayesian Optimal Condensation Framework for Dataset Distillation".

👨‍💻 Authors

Zheng Zhou, Hongbo Zhao, Guangliang Cheng, Xiangtai Li, Shuchang Lyu, Wenquan Feng, Qi Zhao

📧 For inquiries, please reach out via email: zhengzhou@buaa.edu.cn. Feel free to ask any questions!

🔍 Overview

<p align="center"> <img src="./Fig/overview.png" width=100% height=55.2% class="center"> <figcaption><strong>Figure 1:</strong> Comparison of BACON and existing DD methods: (a) Traditional methods align gradients and distributions on original and synthetic datasets. (b) BACON transforms DD into a Bayesian optimization task, generating synthetic images using likelihood and prior probabilities.</figcaption> </p>

**Abstract:** Dataset Distillation (DD) reduces dataset size while maintaining test set performance, helping to cut storage and training costs. Current DD methods struggle with large datasets and lack a solid theoretical foundation. To address this, we introduce the <u>BA</u>yesian Optimal <u>CON</u>densation Framework (<u>BACON</u>), the first Bayesian approach to DD. BACON formulates DD as a minimization problem over Bayesian joint distributions and derives a numerically feasible lower bound. Our experiments show that BACON outperforms state-of-the-art methods, with significant accuracy improvements on CIFAR-10 and TinyImageNet. BACON seamlessly integrates with existing systems and boosts DD performance. Code and distilled datasets are available at BACON.
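As a rough sketch of this formulation, written in our own shorthand rather than the paper's exact derivation, DD can be viewed as choosing a small synthetic set S whose Bayesian joint distribution best explains samples drawn from the real set T, with the joint factored into a likelihood term and a prior term (see the paper for the precise risk function and its lower bound):

```latex
% Schematic only; notation is ours, not the paper's exact formulation.
S^{*} \;=\; \arg\min_{S}\;
      \mathbb{E}_{(x,y)\sim p(x,y\mid T)}\big[ -\log p(x, y \mid S) \big],
\qquad
p(x, y \mid S) \;=\; \underbrace{p(x \mid y, S)}_{\text{likelihood}}
               \;\underbrace{p(y \mid S)}_{\text{prior}} .
```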

🚀 Contributions

<p align="center"> <img src="./Fig/method.png" width=100% height=55.2% class="center"> <figcaption><strong>Figure 2:</strong> Illustration of BACON: The neural network outputs a distribution from both synthetic and real datasets. BACON formulates this distribution as a Bayesian optimal condensation risk function and derives its optimal solution using Bayesian principles.</figcaption> </p>

📈 Experimental Results

The distilled datasets are available at Distilled Dataset.

<!-- (https://drive.google.com/drive/folders/1hZCowM21nfSOkRtm8VuK1lEpP7Bd1jCq?usp=sharing). -->

Comparison with State-of-the-Art Methods

Test accuracy (%) of models trained on the distilled data; IPC denotes the number of distilled images per class.

**IPC = 50**

| Method | MNIST | Fashion-MNIST | SVHN | CIFAR-10 | CIFAR-100 | Tiny-ImageNet |
| ------ | ----- | ------------- | ---- | -------- | --------- | ------------- |
| DM     | 94.8  | -             | -    | 63.0     | 43.6      | -             |
| IDM    | 97.01 | 84.03         | 87.5 | 67.5     | 50.0      | -             |
| BACON  | 98.01 | 85.52         | 89.1 | 70.06    | 52.29     | -             |

**IPC = 10**

| Method | MNIST | Fashion-MNIST | SVHN  | CIFAR-10 | CIFAR-100 | Tiny-ImageNet |
| ------ | ----- | ------------- | ----- | -------- | --------- | ------------- |
| DM     | 97.3  | -             | -     | 48.9     | 29.7      | 12.9          |
| IDM    | 96.26 | 82.53         | 82.95 | 58.6     | 45.1      | 21.9          |
| BACON  | 97.3  | 84.23         | 84.64 | 62.06    | 46.15     | 25.0          |

**IPC = 1**

| Method | MNIST | Fashion-MNIST | SVHN  | CIFAR-10 | CIFAR-100 | Tiny-ImageNet |
| ------ | ----- | ------------- | ----- | -------- | --------- | ------------- |
| DM     | 89.2  | -             | -     | 26.0     | 11.4      | 3.9           |
| IDM    | 93.82 | 78.23         | 69.45 | 45.60    | 20.1      | 10.1          |
| BACON  | 94.15 | 78.48         | 69.44 | 45.62    | 23.68     | 10.2          |

Visualizations

<!-- ![image samples](./Fig/visulization.png) -->

*Image samples from the distilled datasets.*

<!-- ![image samples](./Fig/cifar-100.png) -->

🚀 Getting Started

Step 1

Step 2

<!-- - at [Dataset](https://drive.google.com/drive/folders/1hZCowM21nfSOkRtm8VuK1lEpP7Bd1jCq?usp=sharing). -->

Step 3
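As an illustration of how a downloaded distilled dataset can be evaluated, here is a minimal, hypothetical sketch: it assumes the distilled data is stored as a dictionary of `images` and `labels` tensors (the file name and keys are placeholders; see the repository scripts for the actual entry points), trains a fresh model on the synthetic set, and reports accuracy on the real test set.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def evaluate_distilled(model, test_loader, path="distilled_cifar10.pt",
                       epochs=300, lr=0.01, device="cuda"):
    """Train `model` from scratch on a distilled set and test on real data."""
    # Hypothetical storage format: a dict with "images" and "labels" tensors.
    data = torch.load(path, map_location=device)
    syn_images, syn_labels = data["images"].to(device), data["labels"].to(device)
    train_loader = DataLoader(TensorDataset(syn_images, syn_labels),
                              batch_size=256, shuffle=True)

    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=5e-4)
    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()

    # Report accuracy on the real test set.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total
```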

📁 Directory Structure

🛠️ Commands for Reproducing Experimental Results and Evaluation

🙏 Acknowledgments

We gratefully acknowledge the contributors of DC-bench and IDM, as our code builds upon their work.

📚 Citation

@article{zhou2024bacon,
  title={BACON: Bayesian Optimal Condensation Framework for Dataset Distillation},
  author={Zhou, Zheng and Zhao, Hongbo and Cheng, Guangliang and Li, Xiangtai and Lyu, Shuchang and Feng, Wenquan and Zhao, Qi},
  journal={arXiv preprint arXiv:2406.01112},
  year={2024}
}

🌟 Star History

Star History Chart