# ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation (AAAI 2024)

Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao

Project Page | Paper
This is the official PyTorch implementation of the paper "ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation".

Our model generates realistic, dynamic videos from random noise or from a given scene video, guided by a conditioning signal. We currently support OpenPose keypoint, Canny edge, depth, and segmentation conditions.
canny | segment | depth |
---|---|---|
<img src="videos/0-0-road at night, oil painting style.gif" width="200"><br> A dog, comic book style | <img src="videos/jellyfish.gif" width="200"><br> A red jellyfish, pastel colours. | <img src="videos/1-0-a horse under a blue sky.gif" width="200"><br> A horse under a blue sky. |
pose | customized pose |
---|---|
<img src="videos/62-53-The Astronaut, brown background.gif" width="200"><br> The Astronaut, brown background | <img src="videos/1-2-18-ironman in the sea.gif" width="300"><br> Ironman in the sea |
## Setup

To set up the environment, run:

```bash
conda create -n tune-control python=3.10
conda activate tune-control

# Check your CUDA version, then install the matching PyTorch build.
# Note: pytorch==2.0.0 is required.
pip install -r requirements.txt
conda install xformers -c xformers
```
You may also need to manually download model checkpoints from Hugging Face.
## Usage

To run video generation, use:

```bash
accelerate launch --num_processes 1 conditionvideo.py --config="configs/config.yaml"
```

Edit `configs/config.yaml` to change the generation settings.
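The config file is where the prompt, conditioning type, and sampling parameters live. As a rough sketch of the kind of fields such a config might contain (the key names below are illustrative assumptions, not the repository's actual schema; consult the shipped `configs/config.yaml` for the real field names):

```yaml
# Hypothetical example -- key names are assumptions; check the real config.
pretrained_model_path: "checkpoints/stable-diffusion-v1-5"  # base text-to-image weights
prompt: "a horse under a blue sky"
condition: "openpose"    # one of: openpose, canny, depth, segment
video_length: 24         # number of frames to generate
width: 512
height: 512
guidance_scale: 7.5
seed: 42
```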
## Citation

```bibtex
@misc{peng2023conditionvideo,
      title={ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation},
      author={Bo Peng and Xinyuan Chen and Yaohui Wang and Chaochao Lu and Yu Qiao},
      year={2023},
      eprint={2310.07697},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```