UniCtrl
This repository is the implementation of
[TMLR 2024] UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control
- Authors: Tian Xia<sup>*</sup>, Xuweiyi Chen<sup>*</sup>, Sihan Xu<sup>**</sup>
- Affiliation: University of Michigan, University of Virginia, PixAI.art
- <sup>*</sup>Equal contribution, <sup>**</sup>Correspondence
Project page | Paper | Demo
<table> <tr> <td><img src="./assets/girl/orig_sample.gif" alt="Original" style="width:100%"></td> <td><img src="./assets/girl/ctrl_sample.gif" alt="UniCtrl" style="width:100%"></td> </tr> <tr> <td align="center">Original</td> <td align="center">UniCtrl</td> </tr> </table>
<table> <tr> <td><img src="./assets/mt/orig_sample.gif" alt="Original" style="width:100%"></td> <td><img src="./assets/mt/ctrl_sample.gif" alt="UniCtrl" style="width:100%"></td> </tr> <tr> <td align="center">Original</td> <td align="center">UniCtrl</td> </tr> </table>
Updates 🔥
- The UniCtrl code is released, and you can check out our paper as well!
Overview 📖
We introduce UniCtrl, a novel, plug-and-play method that is universally applicable to text-to-video models and improves the spatiotemporal consistency and motion diversity of the videos they generate, without additional training. UniCtrl enforces semantic consistency across frames through cross-frame self-attention control, and it improves motion quality and spatiotemporal consistency through motion injection and spatiotemporal synchronization.
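To make the idea of cross-frame self-attention control concrete, here is a minimal sketch in plain Python. It assumes that the control amounts to every frame keeping its own queries while reading the keys and values of a single reference frame (taken here to be the first frame). The function names `softmax`, `attention`, and `cross_frame_attention` and the toy tensors are hypothetical illustrations, not the code in this repository, and motion injection and spatiotemporal synchronization are not covered.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for one frame.

    q, k, v are lists of token vectors (lists of floats).
    """
    scale = math.sqrt(len(q[0]))
    out = []
    for q_tok in q:
        scores = [sum(a * b for a, b in zip(q_tok, k_tok)) / scale for k_tok in k]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one channel at a time.
        out.append([sum(w * v_tok[c] for w, v_tok in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def cross_frame_attention(queries, keys, values, ref=0):
    """Assumption: each frame keeps its own queries but attends to the keys
    and values of one reference frame (index `ref`)."""
    return [attention(q, keys[ref], values[ref]) for q in queries]


# Two toy frames with identical queries but different keys and values.
queries = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
keys = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
values = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

per_frame = [attention(q, k, v) for q, k, v in zip(queries, keys, values)]
shared = cross_frame_attention(queries, keys, values)

print(per_frame[0] == per_frame[1])  # ordinary per-frame self-attention
print(shared[0] == shared[1])        # keys and values shared from frame 0
```

In this toy setup, sharing the reference frame's keys and values makes frames with matching queries produce matching outputs, which is the intuition behind using attention control to keep content consistent across frames.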
Quick Start 🔨
1. Clone Repo
git clone https://github.com/XuweiyiChen/UniCtrl.git
cd UniCtrl
cd examples/AnimateDiff
2. Prepare Environment
conda env create -f environment.yaml
conda activate animatediff_pt2
3. Download Checkpoints
Please refer to the official AnimateDiff repo for the full setup guide. The quick-start steps are reproduced here:
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 models/StableDiffusion/
bash download_bashscripts/0-MotionModule.sh
bash download_bashscripts/5-RealisticVision.sh
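After the downloads finish, a quick check can confirm that the checkpoints landed where expected. The sketch below is a hypothetical helper, not part of this repository. The only path it checks by default is `models/StableDiffusion`, which comes from the `git clone` command above. Where the two download scripts write their files is not stated here, so add those directories to `expected` after confirming them in the AnimateDiff setup guide.

```python
from pathlib import Path


def missing_checkpoint_dirs(root, expected=("models/StableDiffusion",)):
    """Return the expected directories under `root` that are absent or empty.

    `missing_checkpoint_dirs` is a hypothetical helper for illustration.
    """
    missing = []
    for rel in expected:
        path = Path(root) / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Assumes the current directory is examples/AnimateDiff.
    missing = missing_checkpoint_dirs(".")
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All expected checkpoint directories are present.")
```

An empty directory is reported as missing because an interrupted `git lfs` clone can leave the folder in place without the model files.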
🤗 Gradio Demo
We provide a Gradio demo with a web UI to demonstrate our method.
python app.py
Alternatively, you can try the online demo hosted on Hugging Face: [demo link].
Citation :fountain_pen:
If you find our repo useful for your research, please consider citing our paper:
@misc{chen2024unictrl,
title={UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control},
author={Xuweiyi Chen and Tian Xia and Sihan Xu},
year={2024},
eprint={2403.02332},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Acknowledgement :white_heart:
This project is distributed under the MIT License. See LICENSE for more information.
The example code is built upon AnimateDiff and FreeInit. Thanks to both teams for their impressive work!