Awesome Efficient Diffusion
A curated list of methods that focus on improving the efficiency of diffusion models
Updates
I'm trying to update this list weekly (every Monday morning) from my personal knowledge stack, and by collecting each conference's proceedings. If you find this repo useful, please consider ★starring it or ☛contributing to it.
- [2024/07/08] Reorganizing the catalogs
- [2024/07/09] (In progress) Filling in existing surveys
Catalogs
Basics
Resources
Recommended introductory learning materials
- David Saxton's Tutorial on Diffusion
- Yang Song's Post: Generative Modeling by Estimating Gradients of the Data Distribution
- EfficientML Course, MIT, Song Han, The Diffusion Chapter
Diffusion Formulation
formulations of diffusion, development of theory
- [DPM] "Deep Unsupervised Learning using Nonequilibrium Thermodynamics";
  - 2015/03 | ICML15 | [Paper]
  - Early formulation of diffusion models
- [DDPM] "Denoising Diffusion Probabilistic Models";
  - 2020/06 | NeurIPS20 | [Paper]
  - The discrete-time formulation of diffusion (see the forward-process sketch after this list)
- [SDE-based Diffusion] "Score-Based Generative Modeling through Stochastic Differential Equations";
  - 2020/11 | ICLR21 | [Paper]
  - Continuous-time neural SDE formulation of diffusion
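For intuition, the DDPM forward (noising) process admits a closed form: x_t is sampled directly from x_0 as a Gaussian with mean sqrt(alpha_bar_t) * x_0 and variance (1 - alpha_bar_t). A minimal PyTorch sketch (the tensor shapes and the `alpha_bar` schedule buffer are assumptions for illustration):

```python
import torch

def q_sample(x0, t, alpha_bar, noise=None):
    """Sample x_t ~ q(x_t | x_0) in one shot (DDPM forward process).

    alpha_bar: 1-D tensor, cumulative product of (1 - beta) over timesteps.
    """
    if noise is None:
        noise = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
    return a.sqrt() * x0 + (1 - a).sqrt() * noise
```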
how to introduce control signals
- [Classifier-based Guidance] "Diffusion Models Beat GANs on Image Synthesis";
  - 2021/05 | Arxiv2105 | [Paper]
  - Introduce the control signal through the gradients of a separately trained classifier
- [Classifier-free Guidance (CFG)] "Classifier-Free Diffusion Guidance";
  - 2022/07 | NeurIPS 2021 Workshop | [Paper]
  - Introduce CFG: jointly train a conditional and an unconditional diffusion model, and combine their predictions (see the sketch after this list)
- [LDM] "High-Resolution Image Synthesis with Latent Diffusion Models";
  - 2021/12 | CVPR22 | [Paper]
  - Run diffusion in a compressed VAE latent space instead of pixel space
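A minimal sketch of the CFG combination at sampling time (the `model(x, t, cond)` signature returning a noise prediction is a hypothetical API for illustration):

```python
import torch

def cfg_noise(model, x, t, cond, guidance_scale=7.5):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one."""
    eps_uncond = model(x, t, cond=None)  # null-condition branch
    eps_cond = model(x, t, cond=cond)    # conditional branch
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

Note that `guidance_scale = 1` recovers the purely conditional prediction; larger values trade sample diversity for stronger prompt adherence.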
Solvers
- [DDIM] "Denoising Diffusion Implicit Models";
  - 2020/10 | ICLR21 | [Paper]
  - Deterministic sampling; skips timesteps (see the sketch after this list)
- [DPMSolver] "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";
  - 2022/06 | NeurIPS22 | [Paper]
  - Exploits the semi-linear structure of the diffusion ODE; converges in 10-20 steps
- [DPMSolver++] "DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models";
  - 2022/11 | Arxiv | [Paper]
  - High-order ODE solver tailored to guided sampling; faster convergence
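A minimal sketch of one deterministic DDIM update (eta = 0); `eps_model` and the `alpha_bar` schedule are assumptions for illustration, and `t_prev` may skip many steps ahead:

```python
import torch

def ddim_step(eps_model, x_t, t, t_prev, alpha_bar):
    """Deterministic DDIM update from timestep t to an earlier t_prev."""
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    eps = eps_model(x_t, t)
    # Predict the clean sample x_0 from the noise estimate...
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    # ...then jump directly to the target noise level, reusing the same eps.
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
```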
Models
Key Components
Text Encoder
- [CLIP] "Learning Transferable Visual Models From Natural Language Supervision";
  - 2021/03 | Arxiv | [Paper]
  - Containing operations:
    - Self-Attention (Cross-Attention)
    - FFN (FC)
    - LayerNorm (GroupNorm)
- [T5] "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer";
  - 2019/10 | Arxiv | [Paper]
  - Containing operations:
    - Self-Attention (Cross-Attention)
    - FFN (FC)
    - LayerNorm (GroupNorm)
Summary of the text encoders adopted by large text-to-image models, from the Kling-AI technical report
VAE (for latent-space)
- [VAE] "Tutorial on Variational Autoencoders";
  - 2016/06 | Arxiv | [Paper]
  - Containing operations:
    - Conv
    - DeConv (ConvTransposed, Interpolation)
Diffusion Network
- [U-Net] "U-Net: Convolutional Networks for Biomedical Image Segmentation";
  - 2015/05 | Arxiv | [Paper]
  - Containing operations:
    - Conv
    - DeConv (ConvTransposed, Interpolation)
    - Long-range shortcut (skip) connections
- [DiT] "Scalable Diffusion Models with Transformers";
  - 2022/12 | ICCV23 | [Paper]
  - Transformer backbone for diffusion, replacing the U-Net
UpScaler
Open-sourced Models
- [Imagen] "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding";
  - 2022/05 | NeurIPS22 | [Paper]
- [DeepFloyd-IF] "DeepFloyd IF";
  - 2023/04 | Stability.AI | [Technical Report] | [Code]
  - Larger language model (T5 instead of CLIP) | Pixel-space diffusion | Diffusion for super-resolution
Closed-source Models
Datasets
Unconditional
Class-Conditioned
- CIFAR-10:
- CelebA:
Text-to-image
Evaluation Metrics
- Fréchet Inception Distance (FID): compares two sets of images by the distance between the InceptionNet intermediate-feature distributions of reference and generated images; lower is better (see the sketch after this list)
- Kernel Inception Distance (KID)
- Inception Score (IS)
- Limitations of the Inception-based metrics above:
  - The Inception network is pre-trained on ImageNet-1K, a mismatch for models trained on large image-caption datasets (e.g., LAION-5B); Stable Diffusion's pre-training set may also overlap with the reference images
  - Results depend on the specific Inception checkpoint used during computation
  - Results depend on the image format (not the same if we start from PNGs vs. JPGs)
- CLIP score: compatibility of an image-text pair
- CLIP directional similarity: consistency between the change in two captions and the corresponding change in the images
  - Limitation: web-crawled captions/tags may not align with human descriptions
- Other metrics: see Schuture/Benchmarking-Awesome-Diffusion-Models
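A minimal NumPy/SciPy sketch of the FID computation, assuming the Inception features of the two image sets have already been extracted as `(N, D)` arrays:

```python
import numpy as np
from scipy import linalg

def fid(feats_ref, feats_gen):
    """FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2))."""
    mu_r, mu_g = feats_ref.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_ref, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerics
    return float(((mu_r - mu_g) ** 2).sum()
                 + np.trace(cov_r + cov_g - 2.0 * covmean))
```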
Miscellaneous
Video Generation
Customized Generation
Generate Complex Scene
Algorithm-level
Timestep Reduction
reduce the number of timesteps (i.e., the number of U-Net inferences)
Efficient Solver
- [DDIM] "Denoising Diffusion Implicit Models";
  - 2020/10 | ICLR21 | [Paper]
  - 📊 Key results: 1000 steps -> 50~100 steps with moderate performance loss
- [DPM-Solver] "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps";
  - 2022/06 | NeurIPS22 | [Paper]
  - 📊 Key results: NFE (number of U-Net forwards) = 10 achieves performance similar to DDIM at NFE = 100
Timestep Distillation
- [Catch-Up Distillation] "Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling";
  - 2023/05 | Arxiv2305 | [Paper]
- [ReDi] "ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval";
  - 2023/02 | ICML23 | [Paper]
  - Skips intermediate steps: retrieves a similar partially generated trajectory at an early stage and resumes from it
- [Consistency Model] "Consistency Models";
  - 2023/03 | Arxiv2303 | [Paper]
  - New consistency-based objective: map any point on the ODE trajectory directly to its origin, enabling one- or few-step sampling (see the sketch after this list)
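A minimal sketch of few-step sampling with a trained consistency function `f(x, sigma) -> x0` (the function, shapes, and noise levels here are assumptions for illustration):

```python
import torch

@torch.no_grad()
def consistency_sample(f, shape, sigma_max, refine_sigmas=()):
    """One-step generation, with optional multistep refinement."""
    x = torch.randn(shape) * sigma_max
    x0 = f(x, sigma_max)                       # single network call
    for sigma in refine_sigmas:                # e.g., a few decreasing levels
        x = x0 + torch.randn_like(x0) * sigma  # re-noise to a lower level
        x0 = f(x, sigma)                       # map back to the clean sample
    return x0
```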
Architecture-level Compression
reduce the cost of the diffusion network (the repeatedly invoked U-Net) with pruning / neural architecture search (NAS) techniques
Pruning
- [Structural Pruning] "Structural Pruning for Diffusion Models";
  - 2023/05 | NeurIPS23 | [Paper]
Adaptive Architecture
adaptively skip parts of the architecture across timesteps
Token-level Compression
Token Reduction
save computation adaptively for different sampling conditions (noise/prompt/task)
Patched Inference
reduce the processing resolution by operating on patches (see the sketch after this list)
- [PatchDiffusion] "Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models";
  - 2023/04 | NeurIPS23 | [Paper]
- [MemEffPatchGen] "Memory Efficient Diffusion Probabilistic Models via Patch-based Generation";
  - 2023/04 | CVPR23W | [Paper]
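A minimal sketch of the idea, processing non-overlapping patches to bound peak activation memory (real methods blend overlapping patches to avoid seams; `model` is any image-to-image network, assumed here for illustration):

```python
import torch

def patched_forward(model, x, patch=64):
    """Apply `model` patch-by-patch instead of on the full (B, C, H, W) tensor."""
    out = torch.empty_like(x)
    _, _, H, W = x.shape
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            tile = x[:, :, i:i + patch, j:j + patch]
            out[:, :, i:i + patch, j:j + patch] = model(tile)
    return out
```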
Model Quantization
quantization & low-bit inference/training (see the sketch after this list)
- [PTQD] "PTQD: Accurate Post-Training Quantization for Diffusion Models";
  - 2023/05 | NeurIPS23 | [Paper]
- [BiDiffusion] "Binary Latent Diffusion";
  - 2023/04 | Arxiv2304 | [Paper]
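A minimal sketch of symmetric per-tensor int8 post-training quantization, the basic building block these methods refine (calibration, per-channel scales, and error correction are what the papers add on top):

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Map float weights to int8 with a single symmetric scale."""
    scale = w.abs().max() / 127.0
    q = (w / scale).round().clamp(-128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    """Recover an approximate float tensor for (or during) inference."""
    return q.float() * scale
```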
Efficient Tuning
- [DiffFit] "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning";
  - 2023/04 | Arxiv2304 | [Paper]
- [ParamEffTuningSummary] "A Closer Look at Parameter-Efficient Tuning in Diffusion Models";
  - 2023/03 | Arxiv2303 | [Paper]
Low-Rank
The LoRA family (see the sketch below)
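A minimal sketch of a LoRA-augmented linear layer: the pretrained weight is frozen and only a rank-r update B·A is trained (names and initialization follow common practice, not any specific library):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W x + (alpha / r) * B A x, with W frozen and A, B trainable."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```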
System-level
GPU
Mobile
- [SnapFusion] "SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds";
  - 2023/06 | Arxiv2306 | [Paper]
  - Platform: iPhone 14 Pro, 1.84s
  - Model evolution: 3.8x fewer parameters compared to SD-V1.5
  - Step distillation down to 8 steps
Related Resources
- heejkoo/Awesome-Diffusion-Models
- awesome-stable-diffusion/awesome-stable-diffusion
- hua1995116/awesome-ai-painting
- PRIV-Creation/Awesome-Diffusion-Personalization
- Schuture/Benchmarking-Awesome-Diffusion-Models
- shogi880/awesome-controllable-stable-diffusion
- Efficient Diffusion Models for Vision: A Survey
- Tracking Papers on Diffusion Models
License
This list is released under a Creative Commons license.