<div align="center"> <h2><font color="red">Follow-Your-Canvas 🖼️:</font> <br> Higher-Resolution Video Outpainting with Extensive Content Generation</h2>

Qihua Chen*, Yue Ma*, Hongfa Wang*, Junkun Yuan*✉️,

Wenzhe Zhao, Qi Tian, Hongmei Wang, Shaobo Min, Qifeng Chen, and Wei Liu✉️

<a href='https://arxiv.org/abs/2409.01055'><img src='https://img.shields.io/badge/ArXiv-2409.01055-red'></a> <a href='https://follow-your-canvas.github.io/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>

</div>

## 📣 Updates

## 📋 Introduction

Follow-Your-Canvas enables higher-resolution video outpainting with rich content generation, overcoming GPU memory constraints and maintaining spatial-temporal consistency.

<img src='ifc.png' width=800>

## 🛠️ Environment

Before running the code, make sure you have set up the environment and installed the required packages. Since the outpainting window is 512×512×64 each time, you need a GPU with at least 60 GB of memory for both training and inference.

```bash
pip install -r requirements.txt
```
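
You can check the 60 GB requirement against your hardware with a quick PyTorch query. This is a minimal sketch, assuming PyTorch from requirements.txt is already installed:

```python
import torch

# Report the total memory of each visible CUDA device so you can confirm
# it meets the ~60 GB requirement before launching training or inference.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")
```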

Download our checkpoints here.

You also need to download [sam_vit_b_01ec64], [stable-diffusion-2-1], and [Qwen-VL-Chat].

Finally, these pretrained models should be organized as follows:

```
pretrained_models
├── sam
│   └── sam_vit_b_01ec64.pth
├── follow-your-canvas
│   └── checkpoint-40000.ckpt
├── stable-diffusion-2-1
└── Qwen-VL-Chat
```

πŸ† Train

We also provide the training code for Follow-Your-Canvas. In our implementation, eight NVIDIA A800 GPUs are used for training (50K steps). First, download the Panda-70M dataset. Our dataset class (animatediff/dataset.py) requires a CSV file that lists the video file names and their prompts; a sketch of such a file follows.
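
For illustration only, a CSV of that shape could be written as below. The column names `videoid` and `name` are hypothetical; check animatediff/dataset.py for the names it actually expects:

```python
import csv

# Hypothetical layout: one row per clip, with the video file name and its caption.
# The real column names expected by animatediff/dataset.py may differ.
rows = [
    {"videoid": "panda_0001.mp4", "name": "a panda sitting on a grassy area in a lake"},
    {"videoid": "polar_0002.mp4", "name": "a polar bear walking on ice"},
]

with open("panda70m_train.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["videoid", "name"])
    writer.writeheader()
    writer.writerows(rows)
```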

```bash
# configure the csv path and video path in train_outpainting-SAM.yaml
torchrun --nnodes=1 --nproc_per_node=8 --master_port=8888 train.py --config train_outpainting-SAM.yaml
```

## 🚀 Inference

We support outpainting with and without a prompt (in the prompt-free mode, the prompt is generated by Qwen-VL-Chat).

```bash
# outpaint the video in demo_video/panda to 2K with the prompt 'a panda sitting on a grassy area in a lake, with forest mountain in the background'
python3 inference_outpainting-dir.py --config infer-configs/infer-9-16.yaml
# outpaint the video in demo_video/polar to 2K without a prompt
python3 inference_outpainting-dir-with-prompt.py --config infer-configs/prompt-panda.yaml
```

The results will be saved in /infer.
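
In the prompt-free mode, Qwen-VL-Chat captions the input video to produce the prompt. The sketch below shows generic Qwen-VL-Chat captioning of a single frame; the frame path is a placeholder, and this is not the exact code path used by the inference script:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the locally downloaded Qwen-VL-Chat checkpoint (see the Environment section).
tokenizer = AutoTokenizer.from_pretrained("pretrained_models/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "pretrained_models/Qwen-VL-Chat", device_map="cuda", trust_remote_code=True
).eval()

# Ask the model to describe one frame; the returned caption can serve as the outpainting prompt.
query = tokenizer.from_list_format([
    {"image": "demo_video/panda/frame_0000.png"},  # placeholder frame path
    {"text": "Describe the image in one sentence."},
])
caption, _ = model.chat(tokenizer, query=query, history=None)
print(caption)
```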

πŸ† Evaluation

We evaluate Follow-Your-Canvas on the DAVIS 2017 dataset. Here we provide the inputs for each experimental setting, the ground-truth videos, and our outpainting results. The code for the PSNR, SSIM, LPIPS, and FVD metrics is in /video_metics/demo.py and fvd2.py. To compute aesthetic quality (AQ) and imaging quality (IQ) from V-Bench:

```bash
cd video_metics
git clone https://github.com/Vchitect/VBench.git
pip install -r VBench/requirements.txt
pip install VBench
# change the video dir in evaluate-quality.sh
bash evaluate-quality.sh
```
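
For reference, per-frame PSNR and SSIM against the ground truth can be computed with scikit-image. This is a minimal sketch assuming uint8 RGB frames and that scikit-image is installed; it mirrors the metrics but is not the repository's exact implementation:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_psnr_ssim(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Compute PSNR and SSIM for one pair of uint8 RGB frames of the same size."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
    return psnr, ssim

# Example with random frames; replace with decoded video frames in practice.
gt = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
pred = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
print(frame_psnr_ssim(pred, gt))
```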

## 👨‍👩‍👧‍👦 Follow Family

Follow-Your-Pose: Pose-Guided Text-to-Video Generation.

Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts.

Follow-Your-Handle: Controllable Video Editing via Control Handle Transformations.

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation.

Follow-Your-Canvas: High-resolution video outpainting with rich content generation.

## 💗 Acknowledgement

We acknowledge the following open-source projects: AnimateDiff and VBench.

## ✅ Citation

If you find Follow-Your-Canvas useful for your research, please give this repo a 🌟 and cite our work using the following BibTeX: