Awesome
Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors (ECCV2024)
[arXiv]
This repository is the official PyTorch implementation of ST-AVSR: Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors.
Introduction
Our method can achieve video SR with an arbitrary scale. If global SR is required, simply input the size of the target, which corresponds to hr_coord
in the code. And the SR scale is related to cell
in the code. If local super-resolution is required, hr_coord
needs to be cropped.
Examples of the Demo
https://github.com/shangwei5/ST-AVSR/assets/43960503/3a8dd3c0-21fd-499c-8ccb-4362c6c5dcb0
https://github.com/shangwei5/ST-AVSR/assets/43960503/42babacd-1b23-480b-9984-c205c62f2b6d
https://github.com/shangwei5/ST-AVSR/assets/43960503/d18fd854-fee3-41f7-9c0a-de416ee49c8b
Prerequisites
- Python >= 3.8, PyTorch == 1.13.0 (Perhaps >=1.8.0 is also OK), CUDA 11.7
- Requirements: opencv-python, numpy, matplotlib, imageio, scikit-image, thop, tqdm, cupy(cupy-cuda117), mmcv-full=1.6.2
Datasets
Please download the RS-GOPRO datasets from REDS (Type: Sharp) and Vid4.
Dataset Organization Form
|--REDS
|--train
|--train_sharp
|--video 1
|--frame 1
|--frame 2
:
|--video 2
:
|--video n
|--val
|--val_sharp
|--video 1
|--frame 1
|--frame 2
:
|--video 2
:
|--video n
|--Vid4
|--video 1
|--frame 1
|--frame 2
:
|--video 2
:
|--video n
Download Pre-trained Model
Please download the pre-trained model from BaiduDisk(password:47q3) or GoogleDrive. Please put the models to ./
.
Our results on REDS and Vid4 can also be downloaded from BaiduDisk(password:rkf7) and BaiduDisk(password:6gv9).
Getting Started
1) Testing
- Processing the entire video frames:
bash test_sequence.sh
Please change --data_path
according to yours.
- Processing frame by frame :
bash test.sh
Please change --data_path
according to yours.
- Processing other datasets with no GT:
python test_seq_yours.py --data_path /your/data/path/ --model_path /your/model/path/ --result_path /your/result/path/ --space_scale "4,4" --max_patch 256
space_scale
currently only supports integers, and there may be some issues with non-integers. max_patch
represents the size of the crop patch, which can be reduced if GPU memory is still insufficient.
2) Training
- Training ST-AVSR from scratch.
We use an NVIDIA RTX A6000 (48GB) for training. Please adjust the batch_size
and test{'n_seq'}
in options based on your GPU memory.
python -m torch.distributed.launch --nproc_per_node=1 --master_port=1234 train.py --opt options/train_refsrrnn_cuf_siren_adists_only_future_t2.json --dist True
Please change gpu_ids
, path{'root', 'images'}
, and data_root
in options according to yours.
- Train B-AVSR first, then fine-tune ST-AVSR based on B-AVSR.
We use an NVIDIA RTX A6000 (48GB) for training. Please adjust the batch_size
and test{'n_seq'}
in options based on your GPU memory.
python -m torch.distributed.launch --nproc_per_node=1 --master_port=1234 train.py --opt options/train_refsrrnn_cuf_siren.json --dist True
Please change gpu_ids
, path{'root', 'images'}
, and data_root
in options according to yours.
python -m torch.distributed.launch --nproc_per_node=1 --master_port=1234 train.py --opt options/train_refsrrnn_cuf_siren_adists_only_future_t2.json --dist True
Please change gpu_ids
, path{'root', 'images', 'pretrained_netG'}
, and data_root
in options according to yours.
Note: If ‘out of memory’ occurs during the validation, please adjust the appropriate sequence length test{'n_seq'}
. The code of validation is implemented by processing the entire video sequence.
Cite
If you use any part of our code, or ST-AVSR is useful for your research, please consider citing:
@article{shang2024arbitrary,
title={Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors},
author={Shang, Wei and Ren, Dongwei and Zhang, Wanying and Fang, Yuming and Zuo, Wangmeng and Ma, Kede},
journal={arXiv preprint arXiv:2407.09919},
year={2024}
}
Contact
If you have any questions, please contact csweishang@gmail.com.