
<div align="center">

Video-Infinity

<img src='./assets/VideoGen-Main.png' width='80%' /> <br> <a href="https://arxiv.org/abs/2406.16260"><img src="https://img.shields.io/badge/arXiv-2406.16260-A42C25.svg" alt="arXiv"></a> <a href="https://video-infinity.tanzhenxiong.com"><img src="https://img.shields.io/badge/Project%20Page-Video%20Infinity-376ED2.svg" alt="Project Page"></a> </div>

Video-Infinity: Distributed Long Video Generation <br> Zhenxiong Tan, Xingyi Yang, Songhua Liu, and Xinchao Wang <br> Learning and Vision Lab, National University of Singapore <br>

## TL;DR

Video-Infinity generates long videos quickly by distributing generation across multiple GPUs, with no extra training required. Feel free to visit our project page for more information and example videos.

## Features

## Setup

### Install Environment

```bash
conda create -n video_infinity_vc2 python=3.10
conda activate video_infinity_vc2
pip install -r requirements.txt
```

## Usage

### Quick Start

```bash
# Generate a long video with the default example config
python inference.py --config examples/config.json

# Generate with multiple prompts
python inference.py --config examples/multi_prompts.json

# Generate on a single GPU
python inference.py --config examples/single_gpu.json
```

## Config

### Basic Config

| Parameter | Description |
| --- | --- |
| `devices` | The list of GPU devices to use. |
| `base_path` | The path where generated videos are saved. |

### Pipeline Config

| Parameter | Description |
| --- | --- |
| `prompts` | The list of text prompts. Note: the number of prompts should be greater than the number of GPUs. |
| `file_name` | The file name of the generated video. |
| `num_frames` | The number of frames to generate on each GPU. |
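Putting the basic and pipeline parameters together, a config file might look like the sketch below. The key names come from the tables above, but the exact nesting and the values are illustrative assumptions; see `examples/config.json` in the repository for the authoritative schema.

```json
{
  "devices": [0, 1],
  "base_path": "./outputs",
  "prompts": [
    "A timelapse of clouds drifting over a mountain range",
    "The same mountain range at sunset",
    "The mountain range under a starry night sky"
  ],
  "file_name": "mountain_timelapse",
  "num_frames": 16
}
```

With two GPUs and three prompts, this satisfies the note that the number of prompts should exceed the number of GPUs.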

### Video-Infinity Config

| Parameter | Description |
| --- | --- |
| `*.padding` | The number of local context frames. |
| `attn.topk` | The number of global context frames for the attention module. |
| `attn.local_phase` | When the denoising timestep is less than `t`, the attention is biased: a `local_bias` is added to the local context frames and a `global_bias` to the global context frames. |
| `attn.global_phase` | Similar to `local_phase`, but the attention is biased when the denoising timestep is greater than `t`. |
| `attn.token_num_scale` | If `True`, the attention scale factor is rescaled by the number of tokens. Default is `False`. More details can be found in this paper. |
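An `attn` section following the table above might look like the fragment below. This is only a sketch: the values are placeholders, the `local_bias`/`global_bias` keys are an assumption inferred from the phase descriptions (the table does not show where they are set), and `*.padding` suggests `padding` can also appear under other module sections. Consult the example configs for the real layout.

```json
{
  "attn": {
    "padding": 2,
    "topk": 4,
    "local_phase": 700,
    "global_phase": 700,
    "local_bias": 1.0,
    "global_bias": 0.5,
    "token_num_scale": false
  }
}
```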

### How to Set Config

## Citation

```bibtex
@article{tan2024videoinf,
  title={Video-Infinity: Distributed Long Video Generation},
  author={Tan, Zhenxiong and Yang, Xingyi and Liu, Songhua and Wang, Xinchao},
  journal={arXiv preprint arXiv:2406.16260},
  year={2024}
}
```

## Acknowledgements

Our project is based on the VideoCrafter2 model. We would like to thank the authors for their excellent work! ❤️