Diffusion Models: A Comprehensive Survey of Methods and Applications
This repo collects and categorizes papers on diffusion models according to our survey paper, Diffusion Models: A Comprehensive Survey of Methods and Applications, which has been accepted by ACM Computing Surveys. Given the rapid development of this field, we will continue to update both the arXiv paper and this repo.
Overview
<div align="center"><img width="900" alt="image" src="https://user-images.githubusercontent.com/62683396/227244860-3608bf02-b2af-4c00-8e87-6221a59a4c42.png"></div>

Catalogue
Algorithm Taxonomy
Sampling-Acceleration Enhancement
Likelihood-Maximization Enhancement
Data with Special Structures
Diffusion with (Multimodal) LLM
Diffusion with DPO/RLHF
Application Taxonomy
- Computer Vision
- Natural Language Processing
- Temporal Data Modeling
- Multi-Modal Learning
- Robust Learning
- Molecular Graph Modeling
- Material Design
- Medical Image Reconstruction
Connections with Other Generative Models
- Variational Autoencoder
- Generative Adversarial Network
- Normalizing Flow
- Autoregressive Models
- Energy-Based Models
Algorithm Taxonomy
<p id="1.1"></p >1. Efficient Sampling
<p id="1.1.1"></p >1.1 Learning-Free Sampling
<p id="1.1.1.1"></p >1.1.1 SDE Solver
Score-Based Generative Modeling through Stochastic Differential Equations
Adversarial score matching and improved sampling for image generation
Score-Based Generative Modeling with Critically-Damped Langevin Diffusion
Gotta Go Fast When Generating Data with Score-Based Models
Elucidating the Design Space of Diffusion-Based Generative Models
Generative modeling by estimating gradients of the data distribution
Structure-Guided Adversarial Training of Diffusion Models
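As a quick reference for this subsection: the samplers above discretize the reverse-time SDE introduced in Score-Based Generative Modeling through Stochastic Differential Equations,

```math
\mathrm{d}\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - g(t)^{2}\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right]\mathrm{d}t + g(t)\, \mathrm{d}\bar{\mathbf{w}},
```

where the score $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ is replaced by a learned approximation and $\bar{\mathbf{w}}$ is a reverse-time Wiener process; the methods listed here mainly differ in how they discretize this SDE and correct the resulting error.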
<p id="1.1.1.2"></p >1.1.2 ODE Solver
Denoising Diffusion Implicit Models
Improving Diffusion-Based Image Synthesis with Context Prediction
gDDIM: Generalized denoising diffusion implicit models
Elucidating the Design Space of Diffusion-Based Generative Models
DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
Pseudo Numerical Methods for Diffusion Models on Manifolds
Fast Sampling of Diffusion Models with Exponential Integrator
Poisson flow generative models
Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing
Structure-Guided Adversarial Training of Diffusion Models
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
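These ODE-based samplers can be viewed as discretizations of the probability flow ODE from the score-SDE framework, which shares the same marginal distributions $p_t$ as the reverse SDE but evolves deterministically:

```math
\mathrm{d}\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\, g(t)^{2}\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right]\mathrm{d}t .
```

As a concrete illustration, below is a minimal sketch of one deterministic DDIM update (the η = 0 case) in the noise-prediction parameterization; the function and argument names are ours, not taken from any particular codebase.

```python
import torch

def ddim_step(eps_model, x_t, t: int, t_prev: int, alpha_bar: torch.Tensor) -> torch.Tensor:
    """One deterministic DDIM update (eta = 0), written as a sketch.

    eps_model(x, t) is assumed to predict the noise added at step t;
    alpha_bar holds the cumulative products of (1 - beta) over the noise schedule.
    """
    eps = eps_model(x_t, t)
    a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
    # Clean-sample estimate x_0 implied by the current noise prediction.
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    # Jump to the earlier timestep along the deterministic (ODE-like) trajectory.
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
```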
<p id="1.1.2"></p >1.2 Learning-Based Sampling
<p id="1.1.2.1"></p >1.2.1 Optimized Discretization
Learning to Efficiently Sample from Diffusion Probabilistic Models
GENIE: Higher-Order Denoising Diffusion Solvers
Learning fast samplers for diffusion models by differentiating through sample quality
<p id="1.1.2.2"></p >1.2.2 Knowledge Distillation
Progressive Distillation for Fast Sampling of Diffusion Models
Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed
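In progressive distillation (listed above), a student with half as many sampling steps is trained so that one student step reproduces two DDIM steps of its teacher, and the procedure is repeated to keep halving the step count. In the $\alpha/\sigma$ notation of that paper ($\mathbf{z}_t = \alpha_t \mathbf{x} + \sigma_t \boldsymbol{\epsilon}$), running the teacher $t \to t' \to t''$ gives the one-step regression target

```math
\tilde{\mathbf{x}} = \frac{\mathbf{z}_{t''} - (\sigma_{t''}/\sigma_t)\, \mathbf{z}_{t}}{\alpha_{t''} - (\sigma_{t''}/\sigma_t)\, \alpha_{t}} ,
```

i.e. the x-prediction that would make a single student DDIM step from $\mathbf{z}_t$ land exactly on $\mathbf{z}_{t''}$.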
<p id="1.1.2.3"></p >1.2.3 Truncated Diffusion
Accelerating Diffusion Models via Early Stop of the Diffusion Process
Truncated Diffusion Probabilistic Models
<p id="1.2"></p >2. Improved Likelihood
<p id="1.2.1"></p >2.1. Noise Schedule Optimization
Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing
Improved denoising diffusion probabilistic models
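As a concrete example of a hand-crafted schedule that this line of work builds on and optimizes, here is a minimal sketch of the cosine schedule from Improved Denoising Diffusion Probabilistic Models; the offset `s` and the clipping of β follow that paper, while the function names are ours.

```python
import numpy as np

def cosine_alpha_bar(T: int, s: float = 0.008) -> np.ndarray:
    """Cumulative signal level alpha_bar(t) = f(t) / f(0) for t = 0..T,
    with f(t) = cos^2(((t / T + s) / (1 + s)) * pi / 2)."""
    t = np.arange(T + 1)
    f = np.cos(((t / T + s) / (1 + s)) * np.pi / 2) ** 2
    return f / f[0]

def cosine_betas(T: int, s: float = 0.008, max_beta: float = 0.999) -> np.ndarray:
    """Per-step betas, beta_t = 1 - alpha_bar(t) / alpha_bar(t-1), clipped near t = T as in the paper."""
    ab = cosine_alpha_bar(T, s)
    return np.clip(1.0 - ab[1:] / ab[:-1], 0.0, max_beta)
```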
<p id="1.2.2"></p >2.2. Reverse Variance Learning
Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models
Improved denoising diffusion probabilistic models
Stable Target Field for Reduced Variance Score Estimation in Diffusion Models
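For example, Improved Denoising Diffusion Probabilistic Models parameterizes the reverse variance as a log-space interpolation between the two analytic extremes $\beta_t$ and $\tilde{\beta}_t$, with a per-dimension network output $v$:

```math
\Sigma_\theta(\mathbf{x}_t, t) = \exp\!\left(v \log \beta_t + (1 - v) \log \tilde{\beta}_t\right),
\qquad
\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t ,
```

while Analytic-DPM shows that the optimal reverse variance admits a closed-form expression in terms of the score function.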
<p id="1.2.3"></p >2.3. Exact Likelihood Computation
Structure-Guided Adversarial Training of Diffusion Models
Score-Based Generative Modeling through Stochastic Differential Equations
Maximum likelihood training of score-based diffusion models
A variational perspective on diffusion-based generative models and score matching
Maximum Likelihood Training for Score-based Diffusion ODEs by High Order Denoising Score Matching
Maximum Likelihood Training of Implicit Nonlinear Diffusion Models
Improving Diffusion-Based Image Synthesis with Context Prediction
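These works exploit the fact that the probability flow ODE is a continuous normalizing flow, so the exact log-likelihood follows from the instantaneous change-of-variables formula: with ODE drift $\tilde{\mathbf{f}}_\theta$,

```math
\log p_0(\mathbf{x}(0)) = \log p_T(\mathbf{x}(T)) + \int_0^T \nabla \cdot \tilde{\mathbf{f}}_\theta(\mathbf{x}(t), t)\, \mathrm{d}t ,
```

where the divergence is typically estimated with Hutchinson's trace estimator; the papers above compute, bound, or directly maximize this quantity.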
<p id="1.3"></p >3. Data with Special Structures
<p id="1.3.1"></p >3.1. Data with Manifold Structures
<p id="1.3.1.1"></p >3.1.1 Known Manifolds
Riemannian Score-Based Generative Modeling
<p id="1.3.1.2"></p >3.1.2 Learned Manifolds
Score-based generative modeling in latent space
Diffusion priors in variational autoencoders
Hierarchical text-conditional image generation with clip latents
High-resolution image synthesis with latent diffusion models
Improving Diffusion-Based Image Synthesis with Context Prediction
<p id="1.3.2"></p >3.2. Data with Invariant Structures
GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation
Permutation invariant graph generation via score-based generative modeling
Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations
DiGress: Discrete Denoising diffusion for graph generation
Learning gradient fields for molecular conformation generation
Graphgdp: Generative diffusion processes for permutation invariant graph generation
SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation
Protein-Ligand Interaction Prior for Binding-aware 3D Molecule Diffusion Models
Graphusion: Latent Diffusion for Graph Generation
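Most of the graph-generation works above aim for a permutation-invariant model distribution: for adjacency matrix $A$, node features $X$, and any permutation matrix $P$,

```math
p_\theta\!\left(P A P^{\top},\, P X\right) = p_\theta(A, X),
```

which is typically achieved by pairing an exchangeable prior with a permutation-equivariant score or denoising network (SwinGNN, listed above, revisits whether this constraint is actually necessary).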
<p id="1.3.3"></p >3.3 Discrete Data
Vector quantized diffusion model for text-to-image synthesis
Structured Denoising Diffusion Models in Discrete State-Spaces
Vector Quantized Diffusion Model with CodeUnet for Text-to-Sign Pose Sequences Generation
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
A Continuous Time Framework for Discrete Denoising Models
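In the discrete-state formulation (e.g., Structured Denoising Diffusion Models in Discrete State-Spaces), the forward process corrupts one-hot row vectors $\mathbf{x}$ with per-step transition matrices $Q_t$:

```math
q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathrm{Cat}\!\left(\mathbf{x}_t;\, \mathbf{p} = \mathbf{x}_{t-1} Q_t\right),
\qquad
q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathrm{Cat}\!\left(\mathbf{x}_t;\, \mathbf{p} = \mathbf{x}_0 \bar{Q}_t\right),\;\; \bar{Q}_t = Q_1 Q_2 \cdots Q_t ,
```

where different choices of $Q_t$ (uniform, absorbing/mask, discretized Gaussian, and so on) yield the variants studied in the papers listed here.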
<p id="1.4"></p >4. Diffusion with (Multimodal) LLM
<p id="1.4.1"></p >4.1. Simple Combination
Videodirectorgpt: Consistent multi-scene video generation via llm-guided planning
<p id="1.4.2"></p >4.2. Deep Collaboration
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
VideoTetris: Towards Compositional Text-To-Video Generation
<p id="1.5"></p >4. Diffusion with DPO/RLHF
Diffusion Model Alignment Using Direct Preference Optimization
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
<p id="2"></p>Application Taxonomy
<p id="2.1"></p>1. Computer Vision
<p id="2.1.1"></p >- Conditional Image Generation (Image Super Resolution, Inpainting, Translation, Manipulation)
- Improving Diffusion-Based Image Synthesis with Context Prediction
- SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models
- Image Super-Resolution via Iterative Refinement
- High-Resolution Image Synthesis with Latent Diffusion Models
- Repaint: Inpainting using denoising diffusion probabilistic models.
- Palette: Image-to-image diffusion models.
- Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models
- Cascaded Diffusion Models for High Fidelity Image Generation.
- Conditional image generation with score-based diffusion models
- Unsupervised Medical Image Translation with Adversarial Diffusion Models
- Score-based diffusion models for accelerated MRI
- Solving Inverse Problems in Medical Imaging with Score-Based Generative Models
- MR Image Denoising and Super-Resolution Using Regularized Reverse Diffusion
- Sdedit: Guided image synthesis and editing with stochastic differential equations
- Soft diffusion: Score matching for general corruptions
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training
- ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models
- Image Restoration with Mean-Reverting Stochastic Differential Equations
- SpaText: Spatio-Textual Representation for Controllable Image Generation
- Break-A-Scene: Extracting Multiple Concepts from a Single Image
- Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing
- RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
- EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
- Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
- Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
- Semantic Segmentation
- Video Generation
- Flexible Diffusion Modeling of Long Videos
- Video diffusion models
- Diffusion probabilistic modeling for video generation
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model.
- Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing
- Stable video diffusion: Scaling latent video diffusion models to large datasets
- I2vgen-xl: High-quality image-to-video synthesis via cascaded diffusion models
- Lumiere: A space-time diffusion model for video generation
- VideoTetris: Towards Compositional Text-To-Video Generation
- 3D Generation
- 3d shape generation and completion through point-voxel diffusion
- Diffusion probabilistic models for 3d point cloud generation
- A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion
- Let us Build Bridges: Understanding and Extending Diffusion Generative Models.
- LION: Latent Point Diffusion Models for 3D Shape Generation
- Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
- Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
- RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation
- HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images
- Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
- DiffRF: Rendering-Guided 3D Radiance Field Diffusion
- DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models
- 3D Neural Field Generation using Triplane Diffusion
- Semantic Score Distillation Sampling for Compositional Text-to-3D Generation
- Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis
- Anomaly Detection
- Object Detection
2. Natural Language Processing
- Structured denoising diffusion models in discrete state-spaces
- Diffusion-LM Improves Controllable Text Generation.
- Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning
- DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
3. Temporal Data Modeling
<p id="2.3.1"></p >- Time Series Imputation
- Time Series Forecasting
- Waveform Signal Processing
4. Multi-Modal Learning
<p id="2.4.1"></p >- Text-to-Image Generation
- Improving Diffusion-Based Image Synthesis with Context Prediction
- Blended diffusion for text-driven editing of natural images
- Hierarchical Text-Conditional Image Generation with CLIP Latents
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
- Vector quantized diffusion model for text-to-image synthesis.
- Frido: Feature Pyramid Diffusion for Complex Image Synthesis.
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
- Imagic: Text-Based Real Image Editing with Diffusion Models
- UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image
- DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
- One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
- TextDiffuser: Diffusion Models as Text Painters
- Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing
- RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models
- Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
- EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
- Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
- Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
- Text-to-3D Generation
- Magic3D: High-Resolution Text-to-3D Content Creation
- DreamFusion: Text-to-3D using 2D Diffusion
- Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
- Shap-E: Generating Conditional 3D Implicit Functions
- Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation
- Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
- ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
- GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
- IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts
- Semantic Score Distillation Sampling for Compositional Text-to-3D Generation
- Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis
- Scene Graph/Layout to Image Generation
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training
- LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
- RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models
- Text-to-Audio Generation
- Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
- Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data
- Diffsound: Discrete Diffusion Model for Text-to-sound Generation
- ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation
- Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models
- EdiTTS: Score-based Editing for Controllable Text-to-Speech.
- ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech.
- Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
- Text-to-Motion Generation
- Text-to-Video Generation/Editing
- Make-a-video: Text-to-video generation without text-video data
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
- FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
- Imagen video: High definition video generation with diffusion models
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models
- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
- Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
- ControlVideo: Training-free Controllable Text-to-Video Generation
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models
- Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing
- Stable video diffusion: Scaling latent video diffusion models to large datasets
- I2vgen-xl: High-quality image-to-video synthesis via cascaded diffusion models
- Lumiere: A space-time diffusion model for video generation
- Videocrafter1: Open diffusion models for high-quality video generation
- VideoTetris: Towards Compositional Text-To-Video Generation
5. Robust Learning
<p id="2.5.1"></p >- Data Purification
- Diffusion Models for Adversarial Purification
- Adversarial purification with score-based generative models
- Threat Model-Agnostic Adversarial Defense using Diffusion Models
- Guided Diffusion Model for Adversarial Purification
- Guided Diffusion Model for Adversarial Purification from Random Noise
- PointDP: Diffusion-driven Purification against Adversarial Attacks on 3D Point Cloud Recognition.
- Generating Synthetic Data for Robust Learning
6. Molecular Graph Modeling
- Torsional Diffusion for Molecular Conformer Generation.
- Equivariant Diffusion for Molecule Generation in 3D
- Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models
- GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation
- Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem
- Diffusion-based Molecule Generation with Informative Prior Bridge
- Learning gradient fields for molecular conformation generation
- Predicting molecular conformation via dynamic graph score matching.
- DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
- 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction
- Learning Joint 2D & 3D Diffusion Models for Complete Molecule Generation
- Graphusion: Latent Diffusion for Graph Generation
- Binding-Adaptive Diffusion Models for Structure-Based Drug Design
- Protein-Ligand Interaction Prior for Binding-aware 3D Molecule Diffusion Models
- Interaction-based Retrieval-augmented Diffusion Models for Protein-specific 3D Molecule Generation
7. Material Design
- Crystal Diffusion Variational Autoencoder for Periodic Material Generation
- Antigen-specific antibody design and optimization with diffusion-based generative models
8. Medical Image Reconstruction
- Solving Inverse Problems in Medical Imaging with Score-Based Generative Models
- MR Image Denoising and Super-Resolution Using Regularized Reverse Diffusion
- Score-based diffusion models for accelerated MRI
- Towards performant and reliable undersampled MR reconstruction via diffusion model sampling
- Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction
Connections with Other Generative Models
<p id="3.1"></p>1. Variational Autoencoder
- Understanding Diffusion Models: A Unified Perspective
- A variational perspective on diffusion-based generative models and score matching
- Score-based generative modeling in latent space
- Improving Diffusion-Based Image Synthesis with Context Prediction
2. Generative Adversarial Network
- Diffusion-GAN: Training GANs with Diffusion.
- Tackling the generative learning trilemma with denoising diffusion gans
- Structure-Guided Adversarial Training of Diffusion Models
3. Normalizing Flow
- Diffusion Normalizing Flow
- Interpreting diffusion score matching using normalizing flow
- Maximum Likelihood Training of Implicit Nonlinear Diffusion Models
- Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
- Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
4. Autoregressive Models
- Autoregressive Diffusion Models.
- Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting.
5. Energy-Based Models
- Learning Energy-Based Models by Diffusion Recovery Likelihood
- Latent Diffusion Energy-Based Model for Interpretable Text Modeling
Citing
If you find this work useful, please cite our paper:
@article{yang2023diffusurvey,
title={Diffusion models: A comprehensive survey of methods and applications},
author={Yang, Ling and Zhang, Zhilong and Song, Yang and Hong, Shenda and Xu, Runsheng and Zhao, Yue and Zhang, Wentao and Cui, Bin and Yang, Ming-Hsuan},
journal={ACM Computing Surveys},
volume={56},
number={4},
pages={1--39},
year={2023},
publisher={ACM New York, NY, USA}
}