# ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation (AAAI 2024)

Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao

Project Page | Paper

This is the official PyTorch implementation of the paper "ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation".

Our model generates realistic dynamic videos from random noise or from given scene videos, guided by the provided conditions. Currently, we support OpenPose keypoint, canny, depth, and segmentation conditions.

| canny | segment | depth |
| :---: | :---: | :---: |
| <img src="videos/0-0-road at night, oil painting style.gif" width="200"><br> A dog, comicbook style | <img src="videos/jellyfish.gif" width="200"><br> A red jellyfish, pastel colours. | <img src="videos/1-0-a horse under a blue sky.gif" width="200"><br> A horse under a blue sky. |

| pose | customized pose |
| :---: | :---: |
| <img src="videos/62-53-The Astronaut, brown background.gif" width="200"><br> The Astronaut, brown background | <img src="videos/1-2-18-ironman in the sea.gif" width="300"><br> Ironman in the sea |

## Setup

To set up the environment, run:

```bash
conda create -n tune-control python=3.10
```

Check your CUDA version, then install the corresponding PyTorch build; note that we need `pytorch==2.0.0`.
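For example, assuming CUDA 11.8 (this exact command is not from the original instructions; swap the wheel index for the one matching your CUDA version):

```bash
# Assumed example: install PyTorch 2.0.0 built against CUDA 11.8.
# Replace cu118 with the index matching your local CUDA toolkit.
pip install torch==2.0.0 --index-url https://download.pytorch.org/whl/cu118
```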

Then install the remaining dependencies:

```bash
pip install -r requirements.txt
conda install xformers -c xformers
```

You may also need to download the model checkpoints manually from Hugging Face.
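For instance, using `git` with Git LFS (the repo IDs below are assumptions based on typical Stable Diffusion + ControlNet setups; check the config for the exact checkpoints the code expects):

```bash
# Assumed example: which checkpoints you need depends on config.yaml,
# and these repo IDs may have moved since the paper's release.
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
git clone https://huggingface.co/lllyasviel/sd-controlnet-openpose
```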

## Usage

To run the code for video generation, use:

```bash
accelerate launch --num_processes 1 conditionvideo.py --config="configs//config.yaml"
```

Change the configuration in `config.yaml` for different generation settings.
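The real keys are defined by the repository's `config.yaml`; the snippet below is only a hypothetical illustration of the kinds of settings involved (every key name here is an assumption, not the actual schema):

```yaml
# Hypothetical illustration only -- consult the repository's config.yaml
# for the real schema; every key below is an assumed name.
pretrained_model_path: "checkpoints/stable-diffusion-v1-5"  # base text-to-image model
condition: "openpose"             # one of: openpose, canny, depth, segment
prompt: "The Astronaut, brown background"
video_path: "videos/source.mp4"   # scene video providing the condition
num_frames: 24
seed: 42
```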

## Citation

```bibtex
@misc{peng2023conditionvideo,
      title={ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation}, 
      author={Bo Peng and Xinyuan Chen and Yaohui Wang and Chaochao Lu and Yu Qiao},
      year={2023},
      eprint={2310.07697},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```