

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios

Project Page | Paper

Kai Wang*, Zekai Li*, Zhi-Qi Cheng, Samir Khaki, Ahmad Sajedi, Ramakrishna Vedantam, Konstantinos N Plataniotis, Alexander Hauptmann, Yang You

National University of Singapore, Carnegie Mellon University, University of Toronto, Independent Researcher

*equal contribution



In this work, we propose to emphasize discriminative features (EDF) for dataset distillation in complex scenarios, i.e., images characterized by significant variations in object size and a large amount of class-irrelevant information.

EDF achieves this from the supervision and data perspectives via a Common Pattern Dropout module and a Discriminative Area Enhancement module, respectively.

EDF demonstrates strong performance on several ImageNet-1K subsets compared with various baselines.

Visualization of Synthetic Images


Comp-DD Benchmark

Please navigate to comp-dd

Getting Started

Create the environment as follows:

conda env create -f environment.yml
conda activate dd

Train Expert Trajectories

To train expert trajectories, run:

bash scripts/buffer.sh

The script uses the "ImageNette" subset as a demo. To train expert trajectories on another subset, change the --subset argument accordingly.

For the list of available subsets, please refer to utils/utils_gsam.


To perform distillation, please run:

bash scripts/distill_in1k_ipc1.sh # for ipc1
bash scripts/distill_in1k_ipc10.sh # for ipc10
bash scripts/distill_in1k_ipc50.sh # for ipc50

Similarly, the provided sample scripts use "ImageNette" as a demo. You can change the subset as follows:

cd distill
CFG="../configs/ImageNet/SUBSET/ConvIN/IPC1.yaml" # replace the SUBSET with the one you want to distill
python3 edf_distill.py --cfg $CFG

The hyper-parameters in each config file under configs are the ones used in the main EDF experiments. To try other hyper-parameters for distillation, modify the corresponding config file.
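
As a minimal sketch of the subset switch described above, the loop below builds the config path for one subset at each IPC setting and passes it to edf_distill.py. It is meant to be run from the distill directory. The SUBSET and DRY_RUN variables and the cfg_path helper are hypothetical names introduced here, and the IPC10.yaml and IPC50.yaml file names are assumed from the IPC1.yaml pattern and the sample script names; check the configs directory for the files that actually exist. By default the sketch only prints the commands it would run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical knobs (not part of the repository):
#   SUBSET  - directory name of the subset under configs/ImageNet
#   DRY_RUN - 1 prints the commands, 0 actually launches distillation
SUBSET="${SUBSET:-ImageNette}"
DRY_RUN="${DRY_RUN:-1}"

# Build a config path following the pattern shown in the README:
#   ../configs/ImageNet/SUBSET/ConvIN/IPC1.yaml
cfg_path() {
  printf '../configs/ImageNet/%s/ConvIN/IPC%s.yaml' "$1" "$2"
}

# Assumption: one config file per IPC setting (1, 10, 50).
for ipc in 1 10 50; do
  cfg="$(cfg_path "$SUBSET" "$ipc")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "python3 edf_distill.py --cfg $cfg"
  else
    python3 edf_distill.py --cfg "$cfg"
  fi
done
```

Set DRY_RUN=0 to launch the runs once the printed paths look right. The runs execute sequentially, so a failure in one IPC setting stops the loop because of `set -e`.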


By default, evaluation is performed during distillation after every 500/1000 iterations. If you want to evaluate the distilled data explicitly, you can run:

cd distill
python3 evaluation.py --lr_dir=path_to_lr --data_dir=path_to_images --label_dir=path_to_labels
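
Since the three paths above are placeholders, a small wrapper can catch a wrong path before the evaluation starts. The sketch below is one possible way to do this; first_missing and run_eval are hypothetical helper names, not part of the repository, and only the three flags shown above are passed to evaluation.py. Whether each path is a file or a directory is not assumed, so the check only tests for existence.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the first argument that does not exist on disk
# (prints nothing if all paths exist).
first_missing() {
  local p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "$p"
      return 0
    fi
  done
  return 0
}

# Hypothetical wrapper: check the three paths, then call evaluation.py
# with the flags documented in this README. Run it from the distill directory.
run_eval() {
  local lr_dir="$1" data_dir="$2" label_dir="$3" missing
  missing="$(first_missing "$lr_dir" "$data_dir" "$label_dir")"
  if [ -n "$missing" ]; then
    echo "missing path: $missing" >&2
    return 1
  fi
  python3 evaluation.py --lr_dir="$lr_dir" --data_dir="$data_dir" --label_dir="$label_dir"
}
```

After sourcing the file, call it as `run_eval path_to_lr path_to_images path_to_labels`, substituting your own paths.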

In our paper, we also use knowledge distillation to ensure a fair comparison against methods that integrate knowledge distillation during evaluation. For detailed implementation, please refer to the official codebase of SRe2L and RDED.


Our code is built on PAD.