# Distilling Pretrained Diffusion-Based Generative Models with SiD
This repository contains the code necessary to replicate the findings of our ICML 2024 paper titled "Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation," available at https://arxiv.org/abs/2404.04057. The technique, Score identity Distillation (SiD), is used to distill pretrained EDM diffusion models.
## Citations
If you find our work useful or incorporate our findings in your own research, please consider citing our paper:
```bibtex
@inproceedings{zhou2024score,
  title={Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation},
  author={Mingyuan Zhou and Huangjie Zheng and Zhendong Wang and Mingzhang Yin and Hai Huang},
  booktitle={International Conference on Machine Learning},
  year={2024}
}
```
We also have a follow-up paper, available at https://arxiv.org/abs/2406.01561, that extends our SiD methodology to distill Stable Diffusion models for one-step text-to-image generation:
```bibtex
@article{zhou2024long,
  title={Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation},
  author={Mingyuan Zhou and Zhendong Wang and Huangjie Zheng and Hai Huang},
  journal={arXiv preprint arXiv:2406.01561},
  url={https://arxiv.org/abs/2406.01561},
  year={2024}
}
```
## State-of-the-art Performance
SiD operates as a data-free distillation method but still demonstrates superior performance compared to the teacher EDM model across most datasets, with the notable exception of ImageNet 64x64. It outperforms all previous diffusion distillation approaches—whether one-step or few-step, data-free or training data-dependent—in terms of generation quality. This achievement sets new standards for efficiency and effectiveness in diffusion distillation.
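At a high level, score distillation trains a one-step generator by comparing the pretrained (teacher) score with an estimate of the score of the generator's own distribution, and pushing the generator to close the gap. The toy example below is only a one-dimensional illustration of that score-matching gradient idea, using analytically known Gaussian scores in place of learned networks; it is not the actual SiD objective or implementation:

```python
import random

def teacher_score(x, mu=2.0):
    """Score of the toy data distribution N(mu, 1): d/dx log p(x) = -(x - mu)."""
    return -(x - mu)

def generator_score(x, theta):
    """Score of the generator distribution: x = theta + z, z ~ N(0, 1), gives N(theta, 1)."""
    return -(x - theta)

def distill(theta=0.0, lr=0.1, steps=200, seed=0):
    """Move the generator parameter theta toward the teacher by matching scores."""
    rng = random.Random(seed)
    for _ in range(steps):
        x = theta + rng.gauss(0.0, 1.0)  # one-step generator sample
        # Generator update direction: mismatch between the two scores at x.
        grad = generator_score(x, theta) - teacher_score(x)
        theta -= lr * grad  # pulls theta toward the teacher mean
    return theta
```

In this idealized setting the score mismatch reduces to `theta - mu`, so `distill()` converges to the teacher mean; in SiD the generator's score is unknown and is instead estimated by a jointly trained network.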
It achieves the following Fréchet Inception Distances (FID):
| Dataset | FID |
| --- | --- |
| CIFAR10 Unconditional | 1.923 |
| CIFAR10 Conditional | 1.710 |
| ImageNet 64x64 | 1.524 |
| FFHQ 64x64 | 1.550 |
| AFHQ-v2 64x64 | 1.628 |
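For reference, FID fits Gaussians to Inception features of real and generated images and computes the Fréchet distance ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^{1/2}) between them. The sketch below implements only the scalar (1-D Gaussian) special case, for intuition; the actual metric is computed by the evaluation scripts described later:

```python
import math

def fid_1d(mu1, var1, mu2, var2):
    """Frechet distance between two 1-D Gaussians:
    (mu1 - mu2)^2 + var1 + var2 - 2*sqrt(var1*var2)."""
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * math.sqrt(var1 * var2)
```

Identical distributions give a distance of zero, and the distance grows with any mismatch in mean or variance.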
## Prerequisites
Before you begin, ensure you have met the following requirements:
- You have installed the latest version of Conda.
- You have a Windows/Linux/Mac machine.
## Installation
To install the necessary packages and set up the environment, follow these steps:
### Clone the Repository

First, clone the repository to your local machine:

```bash
git clone https://github.com/mingyuanzhou/SiD.git
cd SiD
```
### Create the Conda Environment

To create the Conda environment with all the required dependencies, run:

```bash
conda env create -f environment.yaml
```

This command reads the `environment.yaml` file in the repository, which contains all the necessary package information.
### Activate the Environment

After creating the environment, activate it by running:

```bash
conda activate sid
```
### Prepare the Datasets

Follow the instructions detailed in the EDM codebase to prepare the training datasets. Once prepared, place them in the `/data/datasets/` folder:

```
cifar10-32x32.zip
imagenet-64x64.zip
ffhq-64x64.zip
afhqv2-64x64.zip
```
**Note:** Although a training dataset is not required to distill the pretrained EDM model, our code uses it to compute evaluation metrics such as FID and Inception Score. To run the SiD distillation code without these metrics, you can create a dummy dataset and disable the evaluation code; to compute the metrics, provide an `.npz` file of the training dataset instead.
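Before launching distillation, it can help to sanity-check that a prepared dataset zip is readable. The helper below is a hypothetical convenience (not part of this repo) that simply lists the first few archive members with the standard library:

```python
import zipfile

def peek_dataset(path, limit=5):
    """Return the first `limit` member names of a dataset zip archive.

    `path` can be a filesystem path such as '/data/datasets/cifar10-32x32.zip'
    or any file-like object accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()[:limit]
```

If the call raises `zipfile.BadZipFile` or returns an empty list, the dataset was likely not prepared or copied correctly.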
## Usage

### Training
After activating the environment, you can run the scripts or use the modules provided in the repository. Example:

```bash
sh run_sid.sh 'cifar10-uncond'
```

Adjust the `--batch-gpu` parameter according to your GPU memory limitations. The default setting for `cifar10-uncond` consumes less than 10 GB of memory per GPU.
### Generation

To generate example images and save them as `out/*.png` and `out.npz`:

- Using a single GPU:

```bash
python generate_onestep.py --outdir=image_experiment/sid-train-runs/out --seeds=0-63 --batch=64 --network=<network_path>
```

- Using multiple GPUs:

```bash
torchrun --standalone --nproc_per_node=2 generate_onestep.py --outdir=image_experiment/sid-train-runs/out --seeds=0-999 --batch=64 --network=<network_path>
```
## Evaluations
For ImageNet, there are two different versions of the training data, each associated with its own set of reference statistics. To ensure apples-to-apples comparisons of EDM and its distilled generators against other diffusion models, `imagenet-64x64.npz` should be used for computing FID (Fréchet Inception Distance), whereas `VIRTUAL_imagenet64_labeled.npz` should be used for computing Precision and Recall.

- `imagenet-64x64.npz` is available from NVIDIA at https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/imagenet-64x64.npz
- `VIRTUAL_imagenet64_labeled.npz` is available from OpenAI at https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/64/VIRTUAL_imagenet64_labeled.npz
Use `sid_generate.py` to generate and save 50,000 images, and compute FID using the saved images:

- Using a single GPU:

```bash
python sid_generate.py --outdir=image_experiment/out --seeds=0-49999 --batch=128 --network='https://huggingface.co/UT-Austin-PML/SiD/resolve/main/cifar10-uncond/alpha1.2/network-snapshot-1.200000-403968.pkl' --ref=https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/cifar10-32x32.npz
```

- Using four GPUs:

```bash
torchrun --standalone --nproc_per_node=4 sid_generate.py --outdir=out --seeds=0-49999 --batch=128 --network='https://huggingface.co/UT-Austin-PML/SiD/resolve/main/imagenet64/alpha1.2/network-snapshot-1.200000-939176.pkl' --ref=https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/imagenet-64x64.npz
```
Use `sid_metrics.py` to perform 10 random trials; each trial computes the metrics using 50,000 randomly generated images.

- Compute FID and/or Inception Score:

```bash
torchrun --standalone --nproc_per_node=4 sid_metrics.py --cond=False --metrics='fid50k_full,is50k' --network='https://huggingface.co/UT-Austin-PML/SiD/resolve/main/cifar10-uncond/alpha1.2/network-snapshot-1.200000-403968.pkl' --data='/data/datasets/cifar10-32x32.zip'
```

```bash
torchrun --standalone --nproc_per_node=4 sid_metrics.py --cond=True --metrics='fid50k_full' --network='https://huggingface.co/UT-Austin-PML/SiD/resolve/main/imagenet64/alpha1.2/network-snapshot-1.200000-939176.pkl' --data='/data/datasets/imagenet-64x64.zip' --data_stat='https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/imagenet-64x64.npz'
```
- Compute Precision and Recall for ImageNet:

```bash
torchrun --standalone --nproc_per_node=4 sid_metrics.py --cond=True --metrics='pr50k3_full' --network='https://huggingface.co/UT-Austin-PML/SiD/resolve/main/imagenet64/alpha1.2/network-snapshot-1.200000-939176.pkl' --data='/data/datasets/imagenet-64x64.zip' --data_stat='https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/64/VIRTUAL_imagenet64_labeled.npz'
```
## Checkpoints of one-step generators produced by SiD

The one-step generators produced by SiD are provided at https://huggingface.co/UT-Austin-PML/SiD
## Acknowledgements
We extend our gratitude to the authors of the EDM paper for sharing their code, which served as the foundational framework for developing SiD. The repository can be found here: NVlabs/edm.
Additionally, we are thankful to the authors of the Diff Instruct paper for making their code available. Their contributions have been instrumental in integrating the evaluation pipeline into our training iterations. Their repository is accessible here: pkulwj1994/diff_instruct.
## Code Contributions
- Mingyuan Zhou: Led the project and wrote the majority of the code.
- Huangjie Zheng, Zhendong Wang, Hai Huang: Worked closely with Mingyuan Zhou, co-developing essential components and writing various subfunctions.
## Contributing to the Project
To contribute to this project, follow these steps:

- Fork this repository.
- Create a new branch: `git checkout -b <branch_name>`
- Make your changes and commit them: `git commit -m '<commit_message>'`
- Push to the original branch: `git push origin <project_name>/<location>`
- Create the pull request.

Alternatively, see the GitHub documentation on creating a pull request.
## Contact

If you want to contact me, you can reach me at mingyuan.zhou@mccombs.utexas.edu.
## License

This project is licensed under the Apache-2.0 license.