<div align="center"> <h1> Upscale-A-Video:<br> Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution </h1> <div> <a href='https://shangchenzhou.com/' target='_blank'>Shangchen Zhou<sup>∗</sup></a>&emsp; <a href='https://pq-yang.github.io/' target='_blank'>Peiqing Yang<sup>∗</sup></a>&emsp; <a href='https://iceclear.github.io/' target='_blank'>Jianyi Wang</a>&emsp; <a href='https://github.com/Luo-Yihang' target='_blank'>Yihang Luo</a>&emsp; <a href='https://www.mmlab-ntu.com/person/ccloy/' target='_blank'>Chen Change Loy</a> </div> <div> S-Lab, Nanyang Technological University </div> <div> <strong>CVPR 2024 (Highlight)</strong> </div> <div> <h4 align="center"> <a href="https://shangchenzhou.com/projects/upscale-a-video/" target='_blank'> <img src="https://img.shields.io/badge/🐳-Project%20Page-blue"> </a> <a href="https://arxiv.org/abs/2312.06640" target='_blank'> <img src="https://img.shields.io/badge/arXiv-2312.06640-b31b1b.svg"> </a> <a href="https://www.youtube.com/watch?v=b9J3lqiKnLM" target='_blank'> <img src="https://img.shields.io/badge/Demo%20Video-%23FF0000.svg?logo=YouTube&logoColor=white"> </a> <img src="https://api.infinitescript.com/badgen/count?name=sczhou/Upscale-A-Video"> </h4> </div>

<strong>Upscale-A-Video is a diffusion-based model that upscales videos by taking the low-resolution video and text prompts as inputs.</strong>

<div style="width: 100%; text-align: center; margin:auto;"> <img style="width:100%" src="assets/teaser.png"> </div>

:open_book: For more visual results, check out our <a href="https://shangchenzhou.com/projects/upscale-a-video/" target="_blank">project page</a>.


</div>

🔥 Update

🎬 Overview

overall_structure

🔧 Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/sczhou/Upscale-A-Video.git
    cd Upscale-A-Video
    
  2. Create Conda Environment and Install Dependencies

    # create new conda env
    conda create -n UAV python=3.9 -y
    conda activate UAV
    
    # install python dependencies
    pip install -r requirements.txt
    
  3. Download Models

    (a) Download pretrained models and configs from Google Drive and put them under the pretrained_models/upscale_a_video folder.

    The pretrained_models directory structure should be arranged as:

    ├── pretrained_models
    │   ├── upscale_a_video
    │   │   ├── low_res_scheduler
    │   │       ├── ...
    │   │   ├── propagator
    │   │       ├── ...
    │   │   ├── scheduler
    │   │       ├── ...
    │   │   ├── text_encoder
    │   │       ├── ...
    │   │   ├── tokenizer
    │   │       ├── ...
    │   │   ├── unet
    │   │       ├── ...
    │   │   ├── vae
    │   │       ├── ...
    

    (b) (Optional) LLaVA can be downloaded automatically when --use_llava is set to True, provided you have access to Hugging Face.
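Before running inference, it can help to confirm that the download landed in the expected place. The following is a minimal sketch, not part of the repository: the helper name `missing_model_dirs` is hypothetical, and it only checks for the subfolders named in the directory structure above, not for the files inside them.

```python
from pathlib import Path

# Subfolders listed in the directory structure above.
EXPECTED_SUBDIRS = [
    "low_res_scheduler",
    "propagator",
    "scheduler",
    "text_encoder",
    "tokenizer",
    "unet",
    "vae",
]


def missing_model_dirs(root="pretrained_models/upscale_a_video"):
    """Return the expected subfolders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_model_dirs()
    if missing:
        print("Missing model folders:", ", ".join(missing))
    else:
        print("All expected model folders are present.")
```

Run it from the repository root; an empty result means every listed folder exists.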

ā˜•ļø Quick Inference

The --input_path can be either the path to a single video or a folder containing multiple videos.
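To illustrate what "a single video or a folder" means in practice, here is a rough sketch of how such a path could be expanded into a list of files. This is an assumption for illustration only: `collect_videos` and the extension set are hypothetical, and the inference script may resolve inputs differently.

```python
from pathlib import Path

# Assumed set of video extensions; the actual script may accept others.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def collect_videos(input_path):
    """Return a sorted list of video files for a file path or a folder path."""
    path = Path(input_path)
    if path.is_file():
        # A single video: process just this file.
        return [path]
    if path.is_dir():
        # A folder: pick up every file with a known video extension.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
        )
    raise FileNotFoundError(f"No such file or folder: {input_path}")
```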

We provide several examples in the inputs folder. Run the following commands to try it out:

    ## AIGC videos
    python inference_upscale_a_video.py \
    -i ./inputs/aigc_1.mp4 -o ./results -n 150 -g 6 -s 30 -p 24,26,28

    python inference_upscale_a_video.py \
    -i ./inputs/aigc_2.mp4 -o ./results -n 150 -g 6 -s 30 -p 24,26,28

    python inference_upscale_a_video.py \
    -i ./inputs/aigc_3.mp4 -o ./results -n 150 -g 6 -s 30 -p 20,22,24

    ## old videos/movies/animations
    python inference_upscale_a_video.py \
    -i ./inputs/old_video_1.mp4 -o ./results -n 150 -g 9 -s 30

    python inference_upscale_a_video.py \
    -i ./inputs/old_movie_1.mp4 -o ./results -n 100 -g 5 -s 20 -p 17,18,19

    python inference_upscale_a_video.py \
    -i ./inputs/old_movie_2.mp4 -o ./results -n 120 -g 6 -s 30 -p 8,10,12

    python inference_upscale_a_video.py \
    -i ./inputs/old_animation_1.mp4 -o ./results -n 120 -g 6 -s 20 --use_video_vae
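If you want to launch several of these runs from one place, the commands above can be assembled programmatically. The sketch below is not part of the repository: `build_command` and `EXAMPLES` are hypothetical names, and the flag values are copied from three of the example commands. It only prints the commands and does not run them.

```python
import shlex

SCRIPT = "inference_upscale_a_video.py"

# Settings copied from three of the example commands above.
EXAMPLES = [
    dict(input_path="./inputs/aigc_1.mp4", n=150, g=6, s=30, p="24,26,28"),
    dict(input_path="./inputs/old_video_1.mp4", n=150, g=9, s=30),
    dict(input_path="./inputs/old_animation_1.mp4", n=120, g=6, s=20,
         extra=("--use_video_vae",)),
]


def build_command(input_path, n, g, s, p=None, output_dir="./results", extra=()):
    """Build the argument list for one inference run."""
    cmd = ["python", SCRIPT, "-i", input_path, "-o", output_dir,
           "-n", str(n), "-g", str(g), "-s", str(s)]
    if p:
        # -p is only passed when a value is given, as in the examples.
        cmd += ["-p", p]
    cmd += list(extra)
    return cmd


if __name__ == "__main__":
    for example in EXAMPLES:
        print(shlex.join(build_command(**example)))
```

To execute a command instead of printing it, the returned list can be passed to `subprocess.run(cmd, check=True)`.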

If you notice any color discrepancies between the output and the input, you can set --color_fix to "AdaIn" or "Wavelet". By default, it is set to "None".
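For intuition, the "AdaIn" option name suggests adaptive instance normalization, which is commonly understood as shifting and scaling one signal so that its mean and standard deviation match those of a reference. The sketch below shows that idea on a single color channel stored as a flat list. It is an illustration under that assumption, not the repository's implementation, and `match_channel_stats` is a hypothetical name.

```python
from statistics import fmean, pstdev


def match_channel_stats(output_channel, input_channel, eps=1e-8):
    """Shift and scale `output_channel` so its mean and standard deviation
    match those of `input_channel`."""
    out_mean, out_std = fmean(output_channel), pstdev(output_channel)
    ref_mean, ref_std = fmean(input_channel), pstdev(input_channel)
    # eps guards against division by zero for a constant channel.
    scale = ref_std / (out_std + eps)
    return [(v - out_mean) * scale + ref_mean for v in output_channel]


if __name__ == "__main__":
    upscaled = [0.2, 0.4, 0.6, 0.8]   # channel values from the output frame
    reference = [0.1, 0.2, 0.3, 0.4]  # channel values from the input frame
    print(match_channel_stats(upscaled, reference))
```

Applied per channel, this pulls the overall color statistics of the output toward those of the input while keeping the relative variation within the output.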

šŸŽžļø YouHQ Dataset

The datasets are hosted on Google Drive.

| Dataset | Link | Description |
| :--- | :---: | :--- |
| YouHQ-Train | Google Drive | 38,576 videos for training, each of which has around 32 frames. |
| YouHQ40-Test | Google Drive | 40 video clips for evaluation, each of which has around 32 frames. |

📑 Citation

If you find our repo useful for your research, please consider citing our paper:

    @inproceedings{zhou2024upscaleavideo,
       title={{Upscale-A-Video}: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution},
       author={Zhou, Shangchen and Yang, Peiqing and Wang, Jianyi and Luo, Yihang and Loy, Chen Change},
       booktitle={CVPR},
       year={2024}
    }

šŸ“ License

This project is licensed under <a rel="license" href="./LICENSE">NTU S-Lab License 1.0</a>. Redistribution and use should follow this license.

📧 Contact

If you have any questions, please feel free to reach us at shangchenzhou@gmail.com or peiqingyang99@outlook.com.