Awesome Diffusion Models in Vision: A Survey (accepted at IEEE TPAMI 2023)
Denoising diffusion models are a recently emerging topic in computer vision, demonstrating remarkable results in generative modeling. A diffusion model is a deep generative model based on two stages: a forward diffusion stage and a reverse diffusion stage. In the forward stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked with recovering the original input data by learning to gradually reverse the diffusion process, step by step. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burden, i.e., low sampling speed due to the high number of steps involved. This repository categorizes papers about diffusion models applied in computer vision according to their target task. The classification is based on our survey Diffusion Models in Vision: A Survey, which was accepted for publication in IEEE TPAMI.
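The forward process described above can be sampled in closed form, which is what makes training tractable. Below is a minimal NumPy sketch of the DDPM-style forward (noising) process, assuming an illustrative linear beta schedule; all variable names and the toy input are our own, not from any specific paper in this list.

```python
import numpy as np

# Hypothetical linear noise schedule (assumption for illustration).
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # per-step noise variances beta_t
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # cumulative product: alpha_bar_t

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = np.ones((8, 8))                    # toy "image"
x_mid = q_sample(x0, t=500)             # partially noised sample
x_T = q_sample(x0, t=T - 1)             # near-pure Gaussian noise
```

The reverse stage is the part a neural network learns: predicting the noise `eps` (or the clean `x0`) at each step and inverting the chain, which is why sampling requires many sequential steps.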
Summary
- Unconditional Generation
- Conditional Generation
- Text-to-Image Generation
- Super-Resolution
- Image Editing
- Region Image Editing
- Inpainting
- Image-to-Image Translation
- Image Segmentation
- Multi-Task
- Medical Image-to-Image Translation
- Medical Image Generation
- Medical Image Segmentation
- Medical Image Anomaly Detection
- Video Generation
- Few-Shot Image Generation
- Counterfactual Explanations and Estimations
- Image Restoration
- Image Registration
- Adversarial Purification
- Semantic Image Generation
- 3D Generation
- Classification
- Point Cloud Generation
- Theoretical
- Graphs
- Deblurring
- Face Morphing Attack Detection
- Trajectory/Motion Prediction
- Attacks
- Study on Data Memorization
- Out-of-Distribution Detection
- Image-to-Text Generation
- Quantization
- Image/Video Anomaly Detection
- Video-to-Speech
- Pose Estimation
- Graphic Layout Generation
- Image Watermarking
- Video Editing
- Information Retrieval from Video
- Object Detection
Contents
Unconditional Generation <a name="1"></a>
- Deep unsupervised learning using non-equilibrium thermodynamics
- Denoising diffusion probabilistic models
- Improved techniques for training score-based generative models
- Adversarial score matching and improved sampling for image generation
- Maximum likelihood training of score-based diffusion models
- D2C: Diffusion-Decoding Models for Few-Shot Conditional Generation
- Diffusion Normalizing Flow
- Diffusion Schrödinger bridge with applications to score-based generative modeling
- Structured denoising diffusion models in discrete state-spaces
- Score-based generative modeling in latent space
- Improved denoising diffusion probabilistic models
- Denoising Diffusion Implicit Models
- Non-Gaussian denoising diffusion models
- Bilateral denoising diffusion models
- Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes
- Noise estimation for generative diffusion models
- Gotta go fast when generating data with score-based models
- Learning to efficiently sample from diffusion probabilistic models
- Deep generative learning via Schrödinger bridge
- VAEs meet Diffusion Models: Efficient and High-Fidelity Generation
- Variational diffusion models
- Score-based generative modeling with critically-damped Langevin diffusion
- Tackling the generative learning trilemma with Denoising Diffusion GANs
- Heavy-tailed denoising score matching
- Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models
- Learning Fast Samplers for Diffusion Models by Differentiating Through Sample Quality
- Truncated Diffusion Probabilistic Models
- Subspace Diffusion Generative Models
- Maximum Likelihood Training of Implicit Nonlinear Diffusion Models
- On Analyzing Generative and Denoising Capabilities of Diffusion-based Deep Generative Models
- Diffusion-GAN: Training GANs with Diffusion
- Accelerating Score-based Generative Models for High-Resolution Image Synthesis
- Soft Diffusion: Score Matching for General Corruptions
- Post-Training Quantization on Diffusion Models
- Lookahead Diffusion Probabilistic Models for Refining Mean Estimation
- Wavelet Diffusion Models are fast and scalable Image Generators
- All are Worth Words: A ViT Backbone for Diffusion Models
- Diffusion Probabilistic Model Made Slim
- Masked Diffusion Transformer is a Strong Image Synthesizer
- DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-efficient Fine-Tuning
- simple diffusion: End-to-end diffusion for high resolution images
- Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models
Conditional Generation <a name="2"></a>
- Diffusion models beat gans on image synthesis
- Classifier-Free Diffusion Guidance
- On Fast Sampling of Diffusion Probabilistic Models
- DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents
- Pseudo Numerical Methods for Diffusion Models on Manifolds
- Cascaded Diffusion Models for High Fidelity Image Generation
- High Fidelity Visualization of What Your Self-Supervised Representation Knows About
- Itô-Taylor Sampling Scheme for Denoising Diffusion Probabilistic Models using Ideal Derivatives
- Dynamic Dual-Output Diffusion Models
- Generating High Fidelity Data from Low-density Regions using Diffusion Models
- Perception Prioritized Training of Diffusion Models
- Elucidating the Design Space of Diffusion-Based Generative Models
- Progressive distillation for fast sampling of diffusion models
- Denoising Likelihood Score Matching for Conditional Score-based Data Generation
- On Conditioning the Input Noise for Controlled Image Generation with Diffusion Models
- A Continuous Time Framework for Discrete Denoising Models
- DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
- Compositional Visual Generation with Composable Diffusion Models
- TryOnDiffusion: A Tale of Two UNets
- High-Fidelity Guided Image Synthesis with Latent Diffusion Models
- Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
- Towards Practical Plug-and-Play Diffusion Models
- Inversion-based Style Transfer with Diffusion Models
- Conditional Text Image Generation with Diffusion Models
- Generative Diffusion Prior for Unified Image Restoration and Enhancement
- DCFace: Synthetic Face Generation With Dual Condition Diffusion Model
- Controllable Light Diffusion for Portraits
- LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
- Self-Guided Diffusion Models
- AdvDiffuser: Natural Adversarial Example Synthesis with Diffusion Models
- Pluralistic Aging Diffusion Autoencoder
- Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
- Generative Novel View Synthesis with 3D-Aware Diffusion Models
- Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
- DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis
- Scalable Diffusion Models with Transformers
- HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
- Controllable Person Image Synthesis with Pose-Constrained Latent Diffusion
- DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models
- TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition
- Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
Text-to-Image Generation <a name="3"></a>
- Vector quantized diffusion model for text-to-image synthesis
- Hierarchical text-conditional image generation with CLIP latents
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
- Fast Sampling of Diffusion Models with Exponential Integrator
- DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder
- Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models
- Text2Human: Text-Driven Controllable Human Image Generation
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
- SpaText: Spatio-Textual Representation for Controllable Image Generation
- MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
- Person Image Synthesis via Denoising Diffusion Model
- Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
- Multi-Concept Customization of Text-to-Image Diffusion
- ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts
- Shifted Diffusion for Text-to-image Generation
- Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style
- Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
- Zero-shot spatial layout conditioning for text-to-image diffusion models
- Text2Tex: Text-driven Texture Synthesis via Diffusion Models
- Ablating Concepts in Text-to-Image Diffusion Models
- Editing Implicit Assumptions in Text-to-Image Diffusion Models
- Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis
- Localizing Object-Level Shape Variations with Text-to-Image Diffusion Models
- MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models
- BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
- Diffusion in Style
- DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment
- LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts
- Discriminative Class Tokens for Text-to-Image Diffusion Models
- Cones: Concept Neurons in Diffusion Models for Customized Generation
Super-Resolution <a name="4"></a>
- Image super-resolution via iterative refinement
- Score-based Generative Neural Networks for Large-Scale Optimal Transport
- Implicit Diffusion Models for Continuous Super-Resolution
- HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models
Image Editing <a name="5"></a>
- SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
- Blended Latent Diffusion
- SINE: SINgle Image Editing with Text-to-Image Diffusion Models
- Imagic: Text-Based Real Image Editing with Diffusion Models
- Collaborative Diffusion for Multi-Modal Face Generation and Editing
- Null-text Inversion for Editing Real Images using Guided Diffusion Models
- DiffusionRig: Learning Personalized Priors for Facial Appearance Editing
- RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation
- Paint by Example: Exemplar-based Image Editing with Diffusion Models
- Effective Real Image Editing with Accelerated Iterative Diffusion Inversion
- SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
- Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
- Boundary-Aware Divide and Conquer: A Diffusion-Based Solution for Unsupervised Shadow Removal
- Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation
- Prompt Tuning Inversion for Text-driven Image Editing Using Diffusion Models
- DiFaReli: Diffusion Face Relighting
Region Image Editing <a name="6"></a>
Inpainting <a name="7"></a>
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models
- [RGBD2: Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models](https://openaccess.thecvf.com/content/CVPR2023/papers/Lei_RGBD2_Generative_Scene_Synthesis_via_Incremental_View_Inpainting_Using_RGBD_CVPR_2023_paper.pdf)
- SmartBrush: Text and Shape Guided Object Inpainting With Diffusion Model
- DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars
- Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models
Image-to-Image Translation <a name="8"></a>
- Palette: Image-to-Image Diffusion Models
- UNIT-DDPM: UNpaired Image Translation with Denoising Diffusion Probabilistic Models
- EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations
- Pretraining is All You Need for Image-to-Image Translation
- VQBB: Image-to-image Translation with Vector Quantized Brownian Bridge
- The Swiss Army Knife for Image-to-Image Translation: Multi-Task Diffusion Models
- Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance
- BBDM: Image-to-Image Translation with Brownian Bridge Diffusion Models
- Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
- Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
- StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models
- Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation
- Dual Diffusion Implicit Bridges for Image-to-Image Translation
Image Segmentation <a name="9"></a>
- Label-Efficient Semantic Segmentation with Diffusion Models
- SegDiff: Image Segmentation with Diffusion Probabilistic Models
- Multi-Class Segmentation from Aerial Views using Recursive Noise Diffusion
- Ambiguous Medical Image Segmentation using Diffusion Models
- LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation
- Open-vocabulary Object Segmentation with Diffusion Models
Multi-Task <a name="10"></a>
- Generative modeling by estimating gradients of the data distribution
- Score-Based Generative Modeling through Stochastic Differential Equations
- ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
- Learning Energy-Based Models by Diffusion Recovery Likelihood
- Conditional image generation with score-based diffusion models
- More control for free! Image synthesis with semantic diffusion guidance
- ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models
- Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation
- High-Resolution Image Synthesis with Latent Diffusion Models
- Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
- Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction
- DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
- Understanding DDPM Latent Codes Through Optimal Transport
- Conditional Simulation Using Diffusion Schrödinger Bridges
- Retrieval-Augmented Diffusion Models
- Accelerating Diffusion Models via Early Stop of the Diffusion Process
- Diffusion Models as Plug-and-Play Priors
- Non-Uniform Diffusion Models
- Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
- On Distillation of Guided Diffusion Model
- DiffCollage: Parallel Generation of Large Content With Diffusion Models
- EGC: Image Generation and Classification via a Diffusion Energy-Based Model
- Diffusion Models as Masked Autoencoders
- Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
- A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance
- Adding Conditional Control to Text-to-Image Diffusion Models
- FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
- SinDDM: A Single Image Denoising Diffusion Model
Medical Image-to-Image Translation <a name="11"></a>
- Unsupervised Medical Image Translation with Adversarial Diffusion Models
- Unsupervised Denoising of Retinal OCT with Diffusion Probabilistic Model
- Conversion Between CT and MRI Images Using Diffusion and Score-Matching Models
Medical Image Generation <a name="12"></a>
- Solving inverse problems in medical imaging with score-based generative models
- Score-based diffusion models for accelerated MRI
- Diffusion Models For Medical Image Analysis: A Comprehensive Survey
- Low-Dose CT Using Denoising Diffusion Probabilistic Model for 20× Speedup
- Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models
- DOLCE: A Model-Based Probabilistic Diffusion Framework for Limited-Angle CT Reconstruction
Medical Image Segmentation <a name="13"></a>
- Diffusion Models for Implicit Image Segmentation Ensembles
- Accelerating Diffusion Models via Pre-segmentation Diffusion Sampling for Medical Image Segmentation
- Stochastic Segmentation with Conditional Categorical Diffusion Models
Medical Image Anomaly Detection <a name="14"></a>
- Diffusion Models for Medical Anomaly Detection
- Fast Unsupervised Brain Anomaly Detection and Segmentation with Diffusion Models
- AnoDDPM: Anomaly Detection With Denoising Diffusion Probabilistic Models Using Simplex Noise
- What is Healthy? Generative Counterfactual Diffusion for Lesion Localization
Video Generation <a name="15"></a>
- Video Diffusion Models
- Diffusion Probabilistic Modeling for Video Generation
- Flexible Diffusion Modeling of Long Videos
- Diffusion Models for Video Prediction and Infilling
- Dreamix: Video Diffusion Models are General Video Editors
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models
- DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
- Video Probabilistic Diffusion Models in Projected Latent Space
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
- Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
- Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors
- DreamPose: Fashion Video Synthesis with Stable Diffusion
- The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion
- Structure and Content-Guided Video Synthesis with Diffusion Models
- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
- SinFusion: Training Diffusion Models on a Single Image or Video
Few-Shot Image Generation <a name="16"></a>
Counterfactual Explanations and Estimations <a name="17"></a>
- Diffusion Models for Counterfactual Explanations
- Diffusion Causal Models for Counterfactual Estimation
Image Restoration <a name="18"></a>
- Restoring Vision in Adverse Weather Conditions with Patch-Based Denoising Diffusion Models
- Denoising Diffusion Restoration Models
- Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition
- High-resolution image reconstruction with latent diffusion models from human brain activity
- Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding
- Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model
- Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond
- DDS2M: Self-Supervised Denoising Diffusion Spatio-Spectral Model for Hyperspectral Image Restoration
- DiffIR: Efficient Diffusion Model for Image Restoration
- Innovating Real Fisheye Image Correction with Dual Diffusion Architecture
Image Registration <a name="19"></a>
Adversarial Purification <a name="20"></a>
- Diffusion Models for Adversarial Purification
- Robust Evaluation of Diffusion-Based Adversarial Purification
Semantic Image Generation <a name="21"></a>
- Semantic Image Synthesis via Diffusion Models
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
- DDP: Diffusion Model for Dense Visual Prediction
3D Generation <a name="22"></a>
- 3D shape generation and completion through point-voxel diffusion
- RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
- Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
- NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
- Diffusion-SDF: Text-to-Shape via Voxelized Diffusion
- Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
- DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
- HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images
- Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
- Consistent View Synthesis with Pose-Guided Diffusion Models
- Texture Generation on 3D Meshes with Point-UV Diffusion
- DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion
- Chupa: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models
- Guided Motion Diffusion for Controllable Human Motion Synthesis
- Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers
- Make-It-3D: High-fidelity 3D Creation from A Single Image with Diffusion Prior
- TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models
- Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data
- 3D-aware Image Generation using 2D Diffusion Models
- Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction
- Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions
- SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation
- DG3D: Generating High Quality 3D Textured Shapes by Learning to Discriminate Multi-Modal Diffusion-Renderings
- Relightify: Relightable 3D Faces from a Single Image via Diffusion Models
- Distribution-Aligned Diffusion for Human Mesh Recovery
- Diffuse3D: Wide-Angle 3D Photography via Bilateral Diffusion
- PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved Text-to-Image Diffusion
- HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion
- Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models
Classification <a name="23"></a>
- Score-based generative classifiers
- Diffusion-based Data Augmentation for Skin Disease Classification: Impact Across Original Medical Datasets to Fully Synthetic Images
- IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models
- DIRE for Diffusion-Generated Image Detection
- Denoising Diffusion Autoencoders are Unified Self-supervised Learners
- Your Diffusion Model is Secretly a Zero-Shot Classifier
Point Cloud Generation <a name="24"></a>
- Diffusion Probabilistic Models for 3D Point Cloud Generation
- Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation
- GECCO: Geometrically-Conditioned Point Diffusion Models
Theoretical <a name="25"></a>
- A variational perspective on diffusion-based generative models and score matching
- Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions
- Erasing Concepts from Diffusion Models
- A Complete Recipe for Diffusion Generative Models
- Efficient Diffusion Training via Min-SNR Weighting Strategy
- Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption
- AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
- End-to-End Diffusion Latent Optimization Improves Classifier Guidance
- Score-Based Diffusion Models as Principled Priors for Inverse Imaging
- Diffusion Model as Representation Learner
- DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport
- Unleashing Text-to-Image Diffusion Models for Visual Perception
Graphs <a name="26"></a>
Deblurring <a name="27"></a>
- Image Deblurring with Domain Generalizable Diffusion Models
- Multiscale Structure Guided Diffusion for Image Deblurring
Face Morphing Attack Detection <a name="28"></a>
Trajectory/Motion Prediction <a name="29"></a>
- Leapfrog Diffusion Model for Stochastic Trajectory Prediction
- Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
- PhysDiff: Physics-Guided Human Motion Diffusion Model
- Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models
- Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation
- ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
- InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion
- BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction
- Social Diffusion: Long-term Multiple Human Motion Anticipation
Attacks <a name="30"></a>
Study on Data Memorization <a name="31"></a>
- Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
Out-of-Distribution Detection <a name="32"></a>
- DIFFGUARD: Semantic Mismatch-Guided Out-of-Distribution Detection Using Pre-Trained Diffusion Models
- Deep Feature Deblurring Diffusion for Detecting Out-of-Distribution Objects
- Unsupervised Out-of-Distribution Detection with Diffusion Inpainting
Image-to-Text Generation <a name="33"></a>
Quantization <a name="34"></a>
Image/Video Anomaly Detection <a name="35"></a>
- Feature Prediction Diffusion Model for Video Anomaly Detection
- Unsupervised Surface Anomaly Detection with Diffusion Probabilistic Model
- Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
Video-to-Speech <a name="36"></a>
Pose Estimation <a name="37"></a>
- DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion Models
- PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment
- DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation
Graphic Layout Generation <a name="38"></a>
- LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models
- DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer
Image Watermarking <a name="39"></a>
Video Editing <a name="40"></a>
- Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
- Pix2Video: Video Editing using Image Diffusion
- StableVideo: Text-driven Consistency-aware Diffusion Video Editing