UniCtrl

This repository is the implementation of

UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control

Project page | Paper | Demo

<table>
  <tr>
    <td><img src="./assets/girl/orig_sample.gif" alt="Original" style="width:100%"></td>
    <td><img src="./assets/girl/ctrl_sample.gif" alt="UniCtrl" style="width:100%"></td>
  </tr>
  <tr>
    <td align="center">Original</td>
    <td align="center">UniCtrl</td>
  </tr>
</table>
<table>
  <tr>
    <td><img src="./assets/mt/orig_sample.gif" alt="Original" style="width:100%"></td>
    <td><img src="./assets/mt/ctrl_sample.gif" alt="UniCtrl" style="width:100%"></td>
  </tr>
  <tr>
    <td align="center">Original</td>
    <td align="center">UniCtrl</td>
  </tr>
</table>

Updates🔥

Overview 📖

(Figure: overall structure of UniCtrl)

We introduce UniCtrl, a novel, plug-and-play method that universally improves the spatiotemporal consistency and motion diversity of videos generated by text-to-video diffusion models, without any additional training. UniCtrl enforces semantic consistency across frames through cross-frame self-attention control, while enhancing motion quality and spatiotemporal consistency through motion injection and spatiotemporal synchronization.
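
To make "cross-frame self-attention control" concrete, the sketch below shows the general idea in PyTorch: every frame's self-attention reads keys and values from a shared anchor frame, which keeps appearance consistent across the clip. This is an illustration of the technique under our own assumptions (tensor layout, first frame as anchor), not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F

def cross_frame_self_attention(q, k, v, num_frames):
    """Illustrative cross-frame self-attention (not the repo's code).

    q, k, v: (batch * num_frames, seq_len, dim) projections from a
    self-attention layer, with frames stacked along the batch axis.
    Every frame attends to the first (anchor) frame's keys/values.
    """
    b = q.shape[0] // num_frames
    seq_len, dim = k.shape[1], k.shape[2]

    # Make the frame axis explicit: (batch, frames, seq_len, dim).
    k = k.reshape(b, num_frames, seq_len, dim)
    v = v.reshape(b, num_frames, seq_len, dim)

    # Broadcast the anchor frame's K/V to all frames, then flatten back.
    k = k[:, :1].expand(b, num_frames, seq_len, dim).reshape(-1, seq_len, dim)
    v = v[:, :1].expand(b, num_frames, seq_len, dim).reshape(-1, seq_len, dim)

    # Standard scaled dot-product attention with the shared K/V.
    return F.scaled_dot_product_attention(q, k, v)

# Example: 2 videos of 16 frames, 1024 tokens per frame, dim 64.
q = k = v = torch.randn(2 * 16, 1024, 64)
out = cross_frame_self_attention(q, k, v, num_frames=16)  # same shape as q
```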

Quick Start🔨

1. Clone Repo

```bash
git clone https://github.com/XuweiyiChen/UniCtrl.git
cd UniCtrl
cd examples/AnimateDiff
```

2. Prepare Environment

```bash
conda env create -f environment.yaml
conda activate animatediff_pt2
```
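
After activating the environment, an optional sanity check (assuming the environment installs PyTorch with CUDA support, as the `pt2` suffix suggests) can confirm the install:

```python
# Optional sanity check for the conda environment.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```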

3. Download Checkpoints

Please refer to the official AnimateDiff repository for the full setup guide; the essential steps are reproduced below.

Quickstart guide

```bash
# Download the Stable Diffusion v1.5 base weights
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 models/StableDiffusion/

# Download the AnimateDiff motion module and the RealisticVision DreamBooth model
bash download_bashscripts/0-MotionModule.sh
bash download_bashscripts/5-RealisticVision.sh
```
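
Optionally, you can verify that the downloads landed where inference expects them. The directory names below follow AnimateDiff's usual layout and are assumptions; adjust them to whatever the scripts actually fetched:

```python
# Hypothetical check that the checkpoint directories are populated;
# paths follow AnimateDiff's conventions and may differ in your setup.
from pathlib import Path

expected = [
    Path("models/StableDiffusion"),   # SD v1.5 weights (git lfs clone above)
    Path("models/Motion_Module"),     # filled by 0-MotionModule.sh
    Path("models/DreamBooth_LoRA"),   # filled by 5-RealisticVision.sh
]
for p in expected:
    ok = p.is_dir() and any(p.iterdir())
    print(f"{p}: {'ok' if ok else 'missing or empty'}")
```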

🤗 Gradio Demo

We provide a Gradio demo that exposes our method through a web UI. Launch it locally with:

```bash
python app.py
```

Alternatively, you can try the online demo hosted on Hugging Face: [demo link].
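
For reference, a stripped-down Gradio text-to-video app has the shape sketched below. This is illustrative only; `generate_video` is a hypothetical stand-in for the pipeline that the repo's app.py actually wires up:

```python
# Minimal Gradio skeleton for a text-to-video demo (illustrative only).
import gradio as gr

def generate_video(prompt: str) -> str:
    # A real app would run the UniCtrl-augmented pipeline here and
    # return the path of the saved video file.
    raise NotImplementedError("hook up the actual pipeline")

demo = gr.Interface(
    fn=generate_video,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Video(label="Generated video"),
    title="UniCtrl Demo",
)

if __name__ == "__main__":
    demo.launch()
```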

Citation :fountain_pen:

If you find our repo useful for your research, please consider citing our paper:

```bibtex
@misc{chen2024unictrl,
    title={UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control},
    author={Xuweiyi Chen and Tian Xia and Sihan Xu},
    year={2024},
    eprint={2403.02332},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

License

This project is distributed under the MIT License. See LICENSE for more information.

Acknowledgement :white_heart:

The example code is built upon AnimateDiff and FreeInit. Thanks to the teams for their impressive work!