FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
The LongerCrafter for longer high-quality video generation is now released!
<div align="center"> <p style="font-weight: bold"> ✔ totally <span style="color: red; font-weight: bold">no</span> tuning &nbsp; ✔ less than <span style="color: red; font-weight: bold">20%</span> extra time &nbsp; ✔ support <span style="color: red; font-weight: bold">512</span> frames </p><a href='https://arxiv.org/abs/2310.15169'><img src='https://img.shields.io/badge/arXiv-2310.15169-b31b1b.svg'></a> &nbsp; <a href='http://haonanqiu.com/projects/FreeNoise.html'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
Haonan Qiu, Menghan Xia*, Yong Zhang, Yingqing He, <br> Xintao Wang, Ying Shan, and Ziwei Liu* <br><br> (* corresponding author)
From Tencent AI Lab and Nanyang Technological University.
<img src=assets/t2v/hd01.gif>
<p>Input: "A chihuahua in astronaut suit floating in space, cinematic lighting, glow effect"; <br> Resolution: 1024 x 576; Frames: 64.</p> <img src=assets/t2v/hd02.gif> <p>Input: "Campfire at night in a snowy forest with starry sky in the background"; <br> Resolution: 1024 x 576; Frames: 64.</p> </div>

Introduction
LongerCrafter (FreeNoise) is a tuning-free and time-efficient paradigm for longer video generation based on pretrained video diffusion models.
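The key idea is to reschedule the initial noise so that every frame of a long video draws from one shared pool of noise frames, keeping content correlated over long ranges without any fine-tuning. A minimal sketch of such a rescheduling step (illustrative only; the latent shape, `stride`, and function name are assumptions, not the official implementation):

```python
import torch

def reschedule_noise(base_frames=16, total_frames=64, stride=4, seed=0):
    """Illustrative FreeNoise-style noise rescheduling: sample `base_frames`
    i.i.d. Gaussian noise frames, then extend to `total_frames` by appending
    locally shuffled chunks drawn from that same base window, so every frame
    reuses noise from one shared pool."""
    g = torch.Generator().manual_seed(seed)
    # (T, C, H, W) latent layout and 4x32x32 size are assumptions for the demo
    base = torch.randn(base_frames, 4, 32, 32, generator=g)
    chunks = [base]
    count = base_frames
    while count < total_frames:
        # shuffle the base window's indices and take one `stride`-frame chunk
        idx = torch.randperm(base_frames, generator=g)[:stride]
        chunks.append(base[idx])
        count += stride
    return torch.cat(chunks, dim=0)[:total_frames]
```

Because every extended frame reuses a noise frame from the base window, the diffusion model sees correlated noise across the whole sequence; at inference time the paper pairs this with window-based attention fusion so computation stays close to the base model's cost.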
1. Longer Single-Prompt Text-to-video Generation
<div align="center"> <img src=assets/t2v/sp512.gif> <p>Longer single-prompt results. Resolution: 256 x 256; Frames: 512. (Compressed)</p> </div>

2. Longer Multi-Prompt Text-to-video Generation
<div align="center"> <img src=assets/t2v/mp256.gif> <p>Longer multi-prompt results. Resolution: 256 x 256; Frames: 256. (Compressed)</p> </div>

Changelog
- [2024.01.28]: Support FreeNoise on VideoCrafter2!
- [2024.01.23]: Support FreeNoise on two other video frameworks, AnimateDiff and LaVie!
- [2023.10.25]: Release the 256x256 model and support multi-prompt generation!
- [2023.10.24]: Release LongerCrafter (FreeNoise) for longer video generation!
Models
Model | Resolution | Checkpoint | Description |
---|---|---|---|
VideoCrafter (Text2Video) | 576x1024 | Hugging Face | Supports 64 frames on NVIDIA A100 (40GB) |
VideoCrafter (Text2Video) | 256x256 | Hugging Face | Supports 512 frames on NVIDIA A100 (40GB) |
VideoCrafter2 (Text2Video) | 320x512 | Hugging Face | Supports 128 frames on NVIDIA A100 (40GB) |
(Reduce the number of frames when using a smaller GPU, e.g., 256x256 resolution with 64 frames.)
Setup
Install Environment via Anaconda (Recommended)

```bash
conda create -n freenoise python=3.8.5
conda activate freenoise
pip install -r requirements.txt
```
Inference
1. Longer Text-to-Video
<!-- 1) Download pretrained T2V models via [Hugging Face](https://huggingface.co/VideoCrafter/Text2Video-512-v1/blob/main/model.ckpt), and put the `model.ckpt` in `checkpoints/base_512_v1/model.ckpt`. 2) Input the following commands in terminal. ```bash sh scripts/run_text2video_freenoise_512.sh ``` -->

- Download the pretrained T2V model via Hugging Face, and put `model.ckpt` in `checkpoints/base_1024_v1/model.ckpt`.
- Run the following command in the terminal:

```bash
sh scripts/run_text2video_freenoise_1024.sh
```
2. Longer Multi-Prompt Text-to-Video
- Download the pretrained T2V model via Hugging Face, and put `model.ckpt` in `checkpoints/base_256_v1/model.ckpt`.
- Run the following command in the terminal:

```bash
sh scripts/run_text2video_freenoise_mp_256.sh
```
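In multi-prompt generation, different spans of frames are conditioned on different prompts. One way such a schedule could be laid out is sketched below (the `prompt_plan` format and function name are hypothetical, not the repo's actual configuration):

```python
def prompts_for_frames(prompt_plan, total_frames):
    """Map each frame index to the prompt whose start frame most recently
    precedes it. `prompt_plan` is {start_frame: prompt}; frame 0 must be a key."""
    starts = sorted(prompt_plan)
    assert starts[0] == 0, "the first prompt must start at frame 0"
    schedule = []
    for t in range(total_frames):
        # use the latest prompt whose start frame is at or before t
        active = max(s for s in starts if s <= t)
        schedule.append(prompt_plan[active])
    return schedule
```

For example, `prompts_for_frames({0: "a campfire at night", 128: "a campfire at sunrise"}, 256)` would condition the first 128 frames on the first prompt and the remaining frames on the second.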
Support for Other Models
FreeNoise is expected to work with other similar frameworks. An easy way to test compatibility is to shuffle the initial noise and check whether a new, similar video is generated (set eta to 0). If you have any questions about applying FreeNoise to other frameworks, feel free to contact Haonan Qiu.
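The shuffle test described above can be sketched as follows (assuming a latent of shape `(B, C, T, H, W)`; the function name is hypothetical). With a deterministic sampler (eta = 0), sampling from a temporally shuffled copy of the initial noise should still produce a coherent, similar video if the framework is FreeNoise-compatible:

```python
import torch

def shuffle_temporal_noise(latent, seed=0):
    """Permute the initial noise along the temporal axis (dim 2 for a
    (B, C, T, H, W) latent) without changing any individual noise frame."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(latent.shape[2], generator=g)
    return latent[:, :, perm]
```

Sample once from the original latent and once from the shuffled copy; if the two videos show similar content, the model's temporal modules tolerate rescheduled noise.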
Current official implementation: FreeNoise-VideoCrafter, FreeNoise-AnimateDiff, FreeNoise-LaVie
Crafter Family
VideoCrafter: Framework for high-quality video generation.
ScaleCrafter: Tuning-free method for high-resolution image/video generation.
TaleCrafter: An interactive story visualization tool that supports multiple characters.
Citation
```bibtex
@misc{qiu2023freenoise,
      title={FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling},
      author={Haonan Qiu and Menghan Xia and Yong Zhang and Yingqing He and Xintao Wang and Ying Shan and Ziwei Liu},
      year={2023},
      eprint={2310.15169},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
Disclaimer
We developed this repository for RESEARCH purposes only, so it may be used solely for personal, research, and other non-commercial purposes.