<h1 align="center">Diffusion Model-Based Video Editing: A Survey</h1> <p align="center"> <a href="https://github.com/wenhao728/awesome-diffusion-v2v"><img src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg" ></a> <a href="https://arxiv.org/abs/2407.07111"><img src="https://img.shields.io/badge/arXiv-Paper-B31B1B.svg"></a> <a href="./doc/README.md"><img src="https://img.shields.io/badge/benchmark-Dataset-blue?style=flat"></a> <a href="./doc/leaderboard.md"><img src="https://img.shields.io/badge/benchmark-Leaderboard-1abc9c?style=flat"></a> <!-- <a href="https://opensource.org/license/mit/"><img src="https://img.shields.io/badge/license-MIT-blue"></a> --> <img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/wenhao728/awesome-diffusion-v2v?style=social"></a> <!-- <img alt="GitHub watchers" src="https://img.shields.io/github/watchers/wenhao728/awesome-diffusion-v2v?style=social"> --> <!-- <img alt="GitHub stars" src="https://img.shields.io/github/stars/wenhao728/awesome-diffusion-v2v?style=social"></a> --> </p> <p align="center"> <a href="https://github.com/wenhao728">Wenhao Sun</a>, <a href=https://github.com/rongchengtu1>Rong-Cheng Tu</a>, <a>Jingyi Liao</a>, <a>Dacheng Tao</a> <br> <em>Nanyang Technological University</em> </p> <!-- <p align="center"> <img src="asset/teaser.gif" width="1024px"/> </p> -->

https://github.com/wenhao728/awesome-diffusion-v2v/assets/65353366/fd42e40f-265d-4d72-8dc1-bf74d00fe87b

## 🍻 Citation

If you find this repository helpful, please consider citing our paper:

```bibtex
@article{sun2024v2vsurvey,
    author  = {Wenhao Sun and Rong-Cheng Tu and Jingyi Liao and Dacheng Tao},
    title   = {Diffusion Model-Based Video Editing: A Survey},
    journal = {CoRR},
    volume  = {abs/2407.07111},
    year    = {2024}
}
```

## 📌 Introduction

<p align="center"> <img src="asset/taxonomy-repo.png" width="85%"> <br><em>Overview of diffusion-based video editing model components.</em> </p> <!-- The diffusion process defines a Markov chain that progressively adds random noise to data and learns to reverse this process to generate desired data samples from noise. Deep neural networks facilitate the transitions between latent states. -->

> [!TIP]
> The papers are listed in reverse chronological order, formatted as: `(Conference/Journal Year) Title, Authors`

## Network and Training Paradigm

### Temporal Adaption

<p align="right">(<a href="#top">back to top</a>)</p>

### Structure Conditioning

<p align="right">(<a href="#top">back to top</a>)</p>

### Training Modification

<p align="right">(<a href="#top">back to top</a>)</p>

## Attention Feature Injection

### Inversion-Based Feature Injection

<p align="right">(<a href="#top">back to top</a>)</p>

### Motion-Based Feature Injection

<p align="right">(<a href="#top">back to top</a>)</p>

## Diffusion Latents Manipulation

### Latent Initialization

<p align="right">(<a href="#top">back to top</a>)</p>

### Latent Transition

<p align="right">(<a href="#top">back to top</a>)</p>

## Canonical Representation

<p align="right">(<a href="#top">back to top</a>)</p>

## Novel Conditioning

### Point-Based Editing

<p align="right">(<a href="#top">back to top</a>)</p>

### Pose-Guided Human Action Editing

<p align="right">(<a href="#top">back to top</a>)</p>

## 📜 Change Log