# Awesome Video Diffusion <!-- omit in toc -->

A curated list of recent diffusion models for video generation, editing, restoration, understanding, NeRF, etc.

<p align="center"> <img src="https://makeavideo.studio/assets/overview.webp" width="240px"/> <img src="https://makeavideo.studio/assets/A_teddy_bear_painting_a_portrait.webp" width="240px"/> </p> <p align="center"> <img src="https://tuneavideo.github.io/assets/teaser.gif" width="480px"/> </p> <p align="center"> <img src="https://github.com/ChenyangQiQi/FateZero/blob/main/docs/gif_results/17_car_posche_01_concat_result.gif?raw=true" width="240px"/> <img src="https://github.com/ChenyangQiQi/FateZero/blob/main/docs/gif_results/3_sunflower_vangogh_conat_result.gif?raw=true" width="240px"/> </p> <p align="center"> (Source: <a href="https://makeavideo.studio/">Make-A-Video</a>, <a href="https://tuneavideo.github.io/">Tune-A-Video</a>, and <a href="https://fate-zero-edit.github.io/">FateZero</a>.) </p>

## Table of Contents <!-- omit in toc -->

- [Open-source Toolboxes and Foundation Models](#open-source-toolboxes-and-foundation-models)
- [Evaluation Benchmarks and Metrics](#evaluation-benchmarks-and-metrics)
- [Commercial Product](#commercial-product)
- [Video Generation](#video-generation)
- [Efficiency for Video Generation](#efficiency-for-video-generation)
- [Controllable Video Generation](#controllable-video-generation)
- [Motion Customization](#motion-customization)
- [Long Video / Film Generation](#long-video--film-generation)
- [Video Generation with Physical Prior / 3D](#video-generation-with-physical-prior--3d)
- [Video Editing](#video-editing)
- [Long-form Video Generation and Completion](#long-form-video-generation-and-completion)
- [Human or Subject Motion](#human-or-subject-motion)
- [AI Safety for Video Generation](#ai-safety-for-video-generation)
- [Video Enhancement and Restoration](#video-enhancement-and-restoration)
- [Audio Synthesis for Video](#audio-synthesis-for-video)
- [Talking Head Generation](#talking-head-generation)
- [Human Feedback for Video Generation](#human-feedback-for-video-generation)
- [Policy Learning with Video Generation](#policy-learning-with-video-generation)
- [Try On with Video Generation](#try-on-with-video-generation)
- [3D / NeRF](#3d--nerf)
- [4D](#4d)
- [Open-World Model](#open-world-model)
- [Video Understanding](#video-understanding)
- [Healthcare and Biology](#healthcare-and-biology)

## Open-source Toolboxes and Foundation Models

- Pyramidal Flow Matching for Efficient Video Generative Modeling
- VideoCrafter: A Toolkit for Text-to-Video Generation and Editing

## Evaluation Benchmarks and Metrics

- Frechet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos (Jun., 2024) (a minimal Fréchet-distance sketch follows this list)
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation (Jun., 2024)
- ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation (Jun., 2024)
- PEEKABOO: Interactive Video Generation via Masked-Diffusion (CVPR 2024)
- T2VScore: Towards A Better Metric for Text-to-Video Generation (Jan., 2024)
- StoryBench: A Multifaceted Benchmark for Continuous Story Visualization (NeurIPS 2023)
- VBench: Comprehensive Benchmark Suite for Video Generative Models (Nov., 2023)
- FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation (Nov., 2023)
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models (Oct., 2023)
- Evaluation of Text-to-Video Generation Models: A Dynamics Perspective (Jul., 2024)
- VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models (May, 2024)
- Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers (CVPR 2024)
- ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects (CVPR 2023)

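Most of the metrics above reduce to comparing feature statistics of real and generated videos. As a rough, hedged illustration (not any listed paper's released implementation), the sketch below computes a Fréchet-style distance between two Gaussian-fitted feature sets; the video feature extractor and the `feats_real` / `feats_gen` arrays are assumed inputs.

```python
# Minimal sketch of an FVD-style Frechet distance. Assumes features were
# already extracted by some video backbone (e.g. I3D); `feats_real` and
# `feats_gen` are hypothetical (N, D) arrays of per-video embeddings.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)  # matrix square root of the product
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny numerical imaginary parts
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```
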
## Commercial Product

## Video Generation

- Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning (Oct., 2024 | NeurIPS 2024)
- Improved Video VAE for Latent Video Diffusion Model (Oct., 2024)
- Progressive Autoregressive Video Diffusion Models (Oct., 2024)
- Real-Time Video Generation with Pyramid Attention Broadcast (Aug., 2024)
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations (Aug., 2024)
- CogVideoX: Text-to-Video Generation (Aug., 2024)
- FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention (Aug., 2024)
- VEnhancer: Generative Space-Time Enhancement for Video Generation (Jul., 2024)
- Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models (Jul., 2024)
- Video Diffusion Alignment via Reward Gradient (Jul., 2024)
- ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning (Jun., 2024)
- MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance (Jul., 2024)
- Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model (Jun., 2024)
- Video-Infinity: Distributed Long Video Generation (Jun., 2024)
- MotionBooth: Motion-Aware Customized Text-to-Video Generation (Jun., 2024)
- Text-Animator: Controllable Visual Text Video Generation (Jun., 2024)
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation (Jun., 2024)
- T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback (May, 2024)
- T2V-Turbo-v2: Enhancing Video Generation Model Post-Training Through Data, Reward, and Conditional Guidance Design (Oct., 2024)
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control (May, 2024)
- Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer (May, 2024)
- FIFO-Diffusion: Generating Infinite Videos from Text without Training (May, 2024)
- Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models (May, 2024)
- Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers (May, 2024)
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation (May, 2024)
- TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models (CVPR 2024)
- ID-Animator: Zero-Shot Identity-Preserving Human Video Generation (Apr., 2024)
- AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment (Apr., 2024)
- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators (Apr., 2024)
- TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models (CVPR 2024)
- VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis (Mar., 2024)
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text (Mar., 2024)
- Intention-driven Ego-to-Exo Video Generation (Mar., 2024)
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models (Mar., 2024)
- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis (Feb., 2024)
- One-Shot Motion Customization of Text-to-Video Diffusion Models (Feb., 2024)
- Magic-Me: Identity-Specific Video Customized Diffusion (Feb., 2024)
- ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation (Feb., 2024)
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion (Feb., 2024)
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization (Feb., 2024)
- Boximator: Generating Rich and Controllable Motions for Video Synthesis (Feb., 2024)
- Lumiere: A Space-Time Diffusion Model for Video Generation (Jan., 2024)
- ActAnywhere: Subject-Aware Video Background Generation (Jan., 2024)
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens (Jan., 2024)
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects (Jan., 2024)
- UniVG: Towards UNIfied-modal Video Generation (Jan., 2024)
- VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models (Jan., 2024)
- 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model (Jan., 2024)
- RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks (Jan., 2024)
- Latte: Latent Diffusion Transformer for Video Generation (Jan., 2024)
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation (Jan., 2024)
- VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM (Jan., 2024)
- FlashVideo: A Framework for Swift Inference in Text-to-Video Generation (Dec., 2023)
- I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models (Dec., 2023)
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos (Dec., 2023)
- PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models (Dec., 2023)
- VideoPoet: A Large Language Model for Zero-Shot Video Generation (Dec., 2023)
- InstructVideo: Instructing Video Diffusion Models with Human Feedback (Dec., 2023)
- VideoLCM: Video Latent Consistency Model (Dec., 2023)
- PEEKABOO: Interactive Video Generation via Masked-Diffusion (Dec., 2023)
- FreeInit: Bridging Initialization Gap in Video Diffusion Models (Dec., 2023)
- Photorealistic Video Generation with Diffusion Models (Dec., 2023)
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution (Dec., 2023)
- DreaMoving: A Human Video Generation Framework based on Diffusion Models (Dec., 2023)
- MotionCrafter: One-Shot Motion Customization of Diffusion Models (Dec., 2023)
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators (Dec., 2023)
- AVID: Any-Length Video Inpainting with Diffusion Model (Dec., 2023)
- MTVG: Multi-text Video Generation with Text-to-Video Models (Dec., 2023)
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion (Dec., 2023)
- Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation (Dec., 2023)
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation (CVPR 2024)
- GenDeF: Learning Generative Deformation Field for Video Generation (Dec., 2023)
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis (Dec., 2023)
- DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance (Dec., 2023)
- LivePhoto: Real Image Animation with Text-guided Motion Control (Dec., 2023)
- Fine-grained Controllable Video Generation via Object Appearance and Context (Dec., 2023)
- VideoBooth: Diffusion-based Video Generation with Image Prompts (Dec., 2023)
- StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter (Dec., 2023)
- MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation (Nov., 2023)
- ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models (Nov., 2023)
- Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning (Nov., 2023)
- VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model (Nov., 2023)
- MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation (Nov., 2023)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model (Nov., 2023)
- FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax (Nov., 2023)
- Sketch Video Synthesis (Nov., 2023)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (Nov., 2023)
- Decouple Content and Motion for Conditional Image-to-Video Generation (Nov., 2023)
- FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline (Nov., 2023)
- Fine-Grained Open Domain Image Animation with Motion Guidance (Nov., 2023)
- GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning (Nov., 2023)
- MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer (Nov., 2023)
- MoVideo: Motion-Aware Video Generation with Diffusion Models (Nov., 2023)
- Make Pixels Dance: High-Dynamic Video Generation (Nov., 2023)
- Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning (Nov., 2023)
- Optimal Noise Pursuit for Augmenting Text-to-Video Generation (Nov., 2023)
- VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning (Nov., 2023)
- SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction (Oct., 2023)
- FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling (Oct., 2023)
- DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors (Oct., 2023)
- LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation (Oct., 2023)
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation (Sep., 2023)
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models (Sep., 2023)
- LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models (Sep., 2023)
- Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator (Sep., 2023)
- Hierarchical Masked 3D Diffusion Model for Video Outpainting (Sep., 2023)
- Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation (Sep., 2023)
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation (Sep., 2023)
- MagicAvatar: Multimodal Avatar Generation and Animation (Aug., 2023)
- Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models (Aug., 2023)
- SimDA: Simple Diffusion Adapter for Efficient Video Generation (Aug., 2023)
- ModelScope Text-to-Video Technical Report (Aug., 2023)
- Dual-Stream Diffusion Net for Text-to-Video Generation (Aug., 2023)
- InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation (Jul., 2023)
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation (Jul., 2023)
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning (Jul., 2023)
- DisCo: Disentangled Control for Referring Human Dance Generation in Real World (Jul., 2023)
- Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation (Jun., 2023)
- VideoComposer: Compositional Video Synthesis with Motion Controllability (Jun., 2023)
- Probabilistic Adaptation of Text-to-Video Models (Jun., 2023)
- Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance (Jun., 2023)
- Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising (May, 2023)
- Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity (May, 2023)
- Any-to-Any Generation via Composable Diffusion (May, 2023)
- VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation (May, 2023)
- Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models (May, 2023)
- LaMD: Latent Motion Diffusion for Video Generation (Apr., 2023)
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (CVPR 2023)
- Text2Performer: Text-Driven Human Video Generation (Apr., 2023)
- Generative Disco: Text-to-Video Generation for Music Visualization (Apr., 2023)
- Latent-Shift: Latent Diffusion with Temporal Shift (Apr., 2023)
- DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion (Apr., 2023)
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos (Apr., 2023)
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos (CVPR 2023)
- Seer: Language Instructed Video Prediction with Latent Diffusion Models (Mar., 2023)
- Text2Video-Zero: Text-to-Image Diffusion Models Are Zero-Shot Video Generators (Mar., 2023)
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models (CVPR 2023)
- Decomposed Diffusion Models for High-Quality Video Generation (CVPR 2023)
- Video Probabilistic Diffusion Models in Projected Latent Space (CVPR 2023)
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images (Feb., 2023)
- Structure and Content-Guided Video Synthesis With Diffusion Models (Feb., 2023)
- Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation (ICCV 2023)
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation (CVPR 2023)
- MAGVIT: Masked Generative Video Transformer (Dec., 2022)
- VIDM: Video Implicit Diffusion Models (AAAI 2023)
- Efficient Video Prediction via Sparsely Conditioned Flow Matching (Nov., 2022)
- Latent Video Diffusion Models for High-Fidelity Video Generation With Arbitrary Lengths (Nov., 2022)
- SinFusion: Training Diffusion Models on a Single Image or Video (Nov., 2022)
- MagicVideo: Efficient Video Generation With Latent Diffusion Models (Nov., 2022)
- Imagen Video: High Definition Video Generation With Diffusion Models (Oct., 2022)
- Make-A-Video: Text-to-Video Generation without Text-Video Data (ICLR 2023)
- Diffusion Models for Video Prediction and Infilling (TMLR 2022)
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (NeurIPS 2022)
- Video Diffusion Models (Apr., 2022) (a minimal sampling sketch follows this list)
- Diffusion Probabilistic Modeling for Video Generation (Mar., 2022)

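For orientation, most diffusion-based generators in this section share the same DDPM-style ancestral sampling loop at inference time. The sketch below is a generic, hedged illustration (not any listed paper's code): `denoiser` stands for an assumed noise-prediction network, the linear beta schedule is a common but hypothetical choice, and the latents have shape (batch, frames, channels, height, width).

```python
# Generic DDPM ancestral sampling for a latent video diffusion model.
# `denoiser` (an epsilon-prediction network) and the beta schedule are
# illustrative assumptions, not any specific paper's released model.
import torch

@torch.no_grad()
def sample_video(denoiser, shape=(1, 16, 4, 32, 32), steps=50, device="cpu"):
    betas = torch.linspace(1e-4, 0.02, steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = denoiser(x, t)  # predicted noise at timestep t
        # DDPM posterior mean for x_{t-1} given x_t
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)  # sigma_t^2 = beta_t
    return x  # final latents; a separate (not shown) VAE decoder yields frames
```
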
## Efficiency for Video Generation

- Adaptive Caching for Faster Video Generation with Diffusion Transformers (Nov., 2024)
- Fast and Memory-Efficient Video Diffusion Using Streamlined Inference (Nov., 2024)

## Controllable Video Generation

- MVideo: Motion Control for Enhanced Complex Action Video Generation (Nov., 2024)
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning (Nov., 2024)
- SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation (Nov., 2024)
- X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention (Nov., 2024)
- LumiSculpt: A Consistency Lighting Control Network for Video Generation (Nov., 2024)
- Framer: Interactive Frame Interpolation (Oct., 2024)
- CamI2V: Camera-Controlled Image-to-Video Diffusion Model (Oct., 2024)
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention (Oct., 2024)
- Animate Your Motion: Turning Still Images into Dynamic Videos (Mar., 2024 | ECCV 2024)
- EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation (Aug., 2024)
- ControlNeXt: Powerful and Efficient Control for Image and Video Generation (Aug., 2024)
- TrackGo: A Flexible and Efficient Method for Controllable Video Generation (Aug., 2024)
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics (Aug., 2024)
- Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches (Aug., 2024)
- Expressive Whole-Body 3D Gaussian Avatar (Aug., 2024)
- Tora: Trajectory-oriented Diffusion Transformer for Video Generation (Jul., 2024)
- HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation (Jul., 2024)
- Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models (Jul., 2024)
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control (Jul., 2024)
- Still-Moving: Customized Video Generation without Customized Video Data (Jul., 2024)
- LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control (Jul., 2024)
- Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model (Jun., 2024 | NeurIPS 2024)
- Image Conductor: Precision Control for Interactive Video Synthesis (Jun., 2024)
- MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance (Jun., 2024)
- FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models (Jun., 2024)
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model (Jun., 2024)
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance (Mar., 2024)
- TrailBlazer: Trajectory Control for Diffusion-Based Video Generation (Jan., 2024)
- Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation (Jan., 2024)
- Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions (Jan., 2024)
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation (Dec., 2023)
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation (Nov., 2023)
- SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models (Nov., 2023)
- Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models (May, 2023)
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis (Apr., 2023)
- ControlVideo: Training-free Controllable Text-to-Video Generation (May, 2023)
- DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory (Aug., 2023)
- DragAnything: Motion Control for Anything using Entity Representation (ECCV 2024)
- CameraCtrl: Enabling Camera Control for Video Diffusion Models (Apr., 2024)
- Training-free Camera Control for Video Generation (Jun., 2024)
- Customizing Motion in Text-to-Video Diffusion Models (Dec., 2023)
- MotionClone: Training-Free Motion Cloning for Controllable Video Generation (Jun., 2024)

## Motion Customization

- I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength (Nov., 2024)
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models (Sep., 2023 | ECCV 2024)
- LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation (Oct., 2023 | CVPR 2024)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (Dec., 2023 | CVPR 2024)
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion (Dec., 2023 | CVPR 2024)
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation (Dec., 2023 | SIGGRAPH 2024)
- Customizing Motion in Text-to-Video Diffusion Models (Dec., 2023)
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion (Feb., 2024)
- Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models (Feb., 2024)
- DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing (Mar., 2024 | ECCV 2024)
- DragAnything: Motion Control for Anything using Entity Representation (Mar., 2024 | ECCV 2024)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models (Mar., 2024)
- Motion Inversion for Video Customization (Mar., 2024)
- Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing (May, 2024)
- Video Diffusion Models are Training-free Motion Interpreter and Controller (May, 2024)
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control (May, 2024)
- MotionClone: Training-Free Motion Cloning for Controllable Video Generation (Jun., 2024)
- FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models (Jun., 2024)
- Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition (Jul., 2024 | ACM MM 2024)
- Tora: Trajectory-oriented Diffusion Transformer for Video Generation (Jul., 2024)
- Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion (Aug., 2024)

## Long Video / Film Generation

- StoryMaker: Towards Consistent Characters in Text-to-Image Generation (Nov., 2024)
- Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection (Nov., 2024)
- ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction (Nov., 2024)
- Story-Adapter: A Training-free Iterative Framework for Long Story Visualization (Nov., 2024)
- In-Context LoRA for Diffusion Transformers (Aug., 2024)
- SEED-Story: Multimodal Long Story Generation with Large Language Model (Jul., 2024)
- StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration (Nov., 2024)
- ARLON: Boosting Diffusion Transformers With Autoregressive Models for Long Video Generation (Oct., 2024)
- Unbounded: A Generative Infinite Game of Character Life Simulation (Oct., 2024)
- Loong: Generating Minute-level Long Videos with Autoregressive Language Models (Oct., 2024)
- CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion (Aug., 2024)
- DreamCinema: Cinematic Transfer with Free Camera and 3D Character (Aug., 2024)
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama (Aug., 2024)
- Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation (Aug., 2024)
- Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation (Aug., 2024)
- DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework (Jul., 2024)
- MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence (Jul., 2024)
- AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description (Jul., 2024)
- AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production (Jul., 2024)
- TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation (Jul., 2024)
- AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation (Jul., 2024)
- DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion (Jul., 2024)
- VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (Jul., 2024)

## Video Generation with Physical Prior / 3D

- AutoVFX: Physically Realistic Video Editing from Natural Language Instructions (Nov., 2024)
- How Far is Video Generation from World Model: A Physical Law Perspective (Oct., 2024)
- Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models (Oct., 2024)
- PhyGenBench: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation (Oct., 2024)
- PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation (Oct., 2024)
- StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos (Oct., 2024)
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis (Sep., 2024)
- Compositional 3D-aware Video Generation with LLM Director (Aug., 2024)
- IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation (Jul., 2024)
- PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation (ECCV 2024)

## Video Editing

- Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection (May, 2024)
- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models (May, 2024)
- Looking Backward: Streaming Video-to-Video Translation with Feature Banks (May, 2024)
- ReVideo: Remake a Video with Motion and Content Control (May, 2024)
- Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices (May, 2024)
- ViViD: Video Virtual Try-on using Diffusion Models (May, 2024)
- Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing (May, 2024)
- GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models (Apr., 2024)
- EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing (Mar., 2024)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models (Mar., 2024)
- AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks (Mar., 2024)
- CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility (Mar., 2024)
- DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing (Mar., 2024)
- Video Editing via Factorized Diffusion Distillation (Mar., 2024)
- FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing (Mar., 2024)
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing (Feb., 2024)
- Object-Centric Diffusion for Efficient Video Editing (Jan., 2024)
- VASE: Object-Centric Shape and Appearance Manipulation of Real Videos (Jan., 2024)
- FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis (Dec., 2023)
- Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis (Dec., 2023)
- RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing (Dec., 2023)
- MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers (Dec., 2023)
- VidToMe: Video Token Merging for Zero-Shot Video Editing (Dec., 2023)
- A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing (Dec., 2023)
- Neutral Editing Framework for Diffusion-based Video Editing (Dec., 2023)
- DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing (Dec., 2023)
- RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models (Dec., 2023)
- SAVE: Protagonist Diversification with Structure Agnostic Video Editing (Dec., 2023)
- MagicStick: Controllable Video Editing via Control Handle Transformations (Dec., 2023)
- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence (CVPR 2024)
- DragVideo: Interactive Drag-style Video Editing (Dec., 2023)
- Drag-A-Video: Non-rigid Video Editing with Point-based Interaction (Dec., 2023)
- BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models (Dec., 2023)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (CVPR 2024)
- FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing (ICLR 2024)
- MotionEditor: Editing Video Motion via Content-Aware Diffusion (Nov., 2023)
- Motion-Conditioned Image Animation for Video Editing (Nov., 2023)
- Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer (CVPR 2024)
- Cut-and-Paste: Subject-Driven Video Editing with Attention Control (Nov., 2023)
- LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation (Nov., 2023)
- Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models (Oct., 2023)
- DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing (Oct., 2023)
- Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models (ICLR 2024)
- CCEdit: Creative and Controllable Video Editing via Diffusion Models (Sep., 2023)
- MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation (Sep., 2023)
- MagicEdit: High-Fidelity and Temporally Coherent Video Editing (Aug., 2023)
- StableVideo: Text-driven Consistency-aware Diffusion Video Editing (ICCV 2023)
- CoDeF: Content Deformation Fields for Temporally Consistent Video Processing (CVPR 2024)
- TokenFlow: Consistent Diffusion Features for Consistent Video Editing (ICLR 2024)
- INVE: Interactive Neural Video Editing (Jul., 2023)
- VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing (Jun., 2023)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation (SIGGRAPH Asia 2023)
- ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing (May, 2023)
- Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts (May, 2023)
- Soundini: Sound-Guided Diffusion for Natural Video Editing (Apr., 2023)
- Zero-Shot Video Editing Using Off-the-Shelf Image Diffusion Models (Mar., 2023)
- Edit-A-Video: Single Video Editing with Object-Aware Consistency (Mar., 2023)
- FateZero: Fusing Attentions for Zero-shot Text-based Video Editing (Mar., 2023)
- Pix2Video: Video Editing Using Image Diffusion (Mar., 2023)
- Video-P2P: Video Editing with Cross-attention Control (Mar., 2023)
- Dreamix: Video Diffusion Models Are General Video Editors (Feb., 2023)
- Shape-Aware Text-Driven Layered Video Editing (Jan., 2023)
- Speech Driven Video Editing via an Audio-Conditioned Diffusion Model (Jan., 2023)
- Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding (CVPR 2023)

## Long-form Video Generation and Completion

- Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach (Oct., 2024)
- Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion (Jul., 2024)
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (NeurIPS 2022)
- NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation (Mar., 2023)
- Flexible Diffusion Modeling of Long Videos (May, 2022)

## Human or Subject Motion

- KMM: Key Frame Mask Mamba for Extended Motion Generation (Nov., 2024)
- DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction (Nov., 2024)
- Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning (Nov., 2024)
- A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights (Jul., 2024)
- OccFusion: Rendering Occluded Humans with Generative Diffusion Priors (Jul., 2024)
- EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions (Jul., 2024)
- DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation (CVPR 2024)
- Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model (CVPR 2023)
- InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions (Apr., 2023)
- ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model (Apr., 2023)
- Human Motion Diffusion as a Generative Prior (Mar., 2023)
- Can We Use Diffusion Probabilistic Models for 3D Motion Prediction? (Feb., 2023)
- Single Motion Diffusion (Feb., 2023)
- HumanMAC: Masked Motion Completion for Human Motion Prediction (Feb., 2023)
- DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model (Jan., 2023)
- Modiff: Action-Conditioned 3D Motion Generation With Denoising Diffusion Probabilistic Models (Jan., 2023)
- Unifying Human Motion Synthesis and Style Transfer With Denoising Diffusion Probabilistic Models (GRAPP 2023)
- Executing Your Commands via Motion Diffusion in Latent Space (CVPR 2023)
- Pretrained Diffusion Models for Unified Human Motion Synthesis (Dec., 2022)
- PhysDiff: Physics-Guided Human Motion Diffusion Model (Dec., 2022)
- BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction (Dec., 2022)
- Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model (ICASSP 2023)
- Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction (Oct., 2022)
- Human Motion Diffusion Model (ICLR 2023)
- FLAME: Free-form Language-based Motion Synthesis & Editing (AAAI 2023)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model (Aug., 2022)
- Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion (CVPR 2022)

## AI Safety for Video Generation

## Video Enhancement and Restoration

- Disentangled Motion Modeling for Video Frame Interpolation (Jun., 2024)
- DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models (Jul., 2024)
- LDMVFI: Video Frame Interpolation with Latent Diffusion Models (Mar., 2023)
- CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming (Nov., 2022)

## Audio Synthesis for Video

- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization (Oct., 2024)
- Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation (Sep., 2023)
- VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos (Oct., 2024)
- STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment (Oct., 2024)
- Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis (Sep., 2024)
- Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming (Jul., 2024)
- Speech To Speech: an effort for an open-sourced and modular GPT4-o (Aug., 2024)
- Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity (Jul., 2024)
- Video-to-Audio Generation with Hidden Alignment (Jul., 2024)
- Read, Watch and Scream! Sound Generation from Text and Video (Jul., 2024)
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds (Jul., 2024)
- Network Bending of Diffusion Models for Audio-Visual Generation (CVPR 2024)

## Talking Head Generation

- Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency (Nov., 2024)
- HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models (Oct., 2024)
- PersonaTalk: Bring Attention to Your Persona in Visual Dubbing (Oct., 2024)
- Talking With Hands 16.2M: A Large-Scale Dataset of Synchronized Body-Finger Motion and Audio for Conversational Motion Analysis and Synthesis (Oct., 2024)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion (Oct., 2024)
- GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents (Oct., 2024)
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations (Oct., 2024)
- Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization (Oct., 2024)
- DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation (Oct., 2024)
- MimicTalk: Mimicking a personalized and expressive 3D talking face in few minutes (Oct., 2024)
- Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation (Oct., 2024)
- Listen, Denoise, Action! Audio-Driven Motion Synthesis With Diffusion Models (Nov., 2022)
- TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation (Oct., 2024)
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation (Jun., 2024)

## Human Feedback for Video Generation

- VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation (Jul., 2024)

## Policy Learning with Video Generation

- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation (Nov., 2024)
- GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy (Jul., 2024)
- Any-point Trajectory Modeling for Policy Learning (Jul., 2024)
- This&That: Language-Gesture Controlled Video Generation for Robot Planning (Jun., 2024)
- Dreamitate: Real-World Visuomotor Policy Learning via Video Generation (Jun., 2024)

## Try On with Video Generation

## 3D / NeRF

- GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation (Jan., 2024)
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion (Oct., 2024)
- L3DG: Latent 3D Gaussian Diffusion (Oct., 2024)
- Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models (Oct., 2024)
- ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model (Aug., 2024)
- SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency (Jul., 2024)
- Shape of Motion: 4D Reconstruction from a Single Video (Jul., 2024)
- WonderWorld: Interactive 3D Scene Generation from a Single Image (Jun., 2024)
- WonderJourney: Going from Anywhere to Everywhere (CVPR 2024)
- MultiDiff: Consistent Novel View Synthesis from a Single Image (CVPR 2024)
- Vivid-ZOO: Multi-View Video Generation with Diffusion Model (Jun., 2024)
- Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text (Jun., 2024)
- YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals (Jun., 2024)
- Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields (May, 2023)
- RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture (May, 2023)
- NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models (CVPR 2023)
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction (Apr., 2023)
- Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions (Mar., 2023)
- DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models (Feb., 2023)
- NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion (Feb., 2023)
- DiffRF: Rendering-guided 3D Radiance Field Diffusion (CVPR 2023)

## 4D

- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion (Nov., 2024)

## Open-World Model

- AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents (Nov., 2024)
- Oasis: A Universe in a Transformer (Nov., 2024)
- Digital Life Project: Autonomous 3D Characters with Social Intelligence (CVPR 2024)
- 3D-VLA: A 3D Vision-Language-Action Generative World Model (ICML 2024)

## Video Understanding

- VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding (Oct., 2024)
- Exploring Diffusion Models for Unsupervised Video Anomaly Detection (Apr., 2023)
- PDPP: Projected Diffusion for Procedure Planning in Instructional Videos (CVPR 2023)
- DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion (Mar., 2023)
- Diffusion Action Segmentation (ICCV 2023)
- DiffusionRet: Generative Text-Video Retrieval with Diffusion Model (ICCV 2023)
- Refined Semantic Enhancement Towards Frequency Diffusion for Video Captioning (Nov., 2022)
- A Generalist Framework for Panoptic Segmentation of Images and Videos (Oct., 2022)

## Healthcare and Biology

- Artificial Intelligence for Biomedical Video Generation (Nov., 2024)
- Exploring Variational Autoencoders for Medical Image Generation: A Comprehensive Study (Nov., 2024)
- MedSora: Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation (Nov., 2024)
- Annealed Score-Based Diffusion Model for MR Motion Artifact Reduction (Jan., 2023)
- Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis (Mar., 2023)
- Neural Cell Video Synthesis via Optical-Flow Diffusion (Dec., 2022)