SVCNet
<br>Official PyTorch Implementation of the SVCNet Paper<br>
Project | arXiv | IEEE Xplore
1 Introduction
SVCNet is an architecture for scribble-based video colorization consisting of two sub-networks: CPNet and SSNet. This repo contains the training and evaluation code for the following paper:
SVCNet: Scribble-based Video Colorization Network with Temporal Aggregation<br> Yuzhi Zhao<sup>1</sup>, Lai-Man Po<sup>1</sup>, Kangcheng Liu<sup>2</sup>, Xuehui Wang<sup>3</sup>, Wing-Yin Yu<sup>1</sup>, Pengfei Xian<sup>1</sup>, Yujia Zhang<sup>4</sup>, Mengyang Liu<sup>4</sup><br> <sup>1</sup>City University of Hong Kong, <sup>2</sup>Nanyang Technological University, <sup>3</sup>Shanghai Jiao Tong University, <sup>4</sup>Tencent Video<br> IEEE Transactions on Image Processing (TIP), 2023<br>
2 Preparation
2.1 Environment
We tested the code with CUDA 10.0 (higher versions are also compatible). The basic requirements are as follows:
- pytorch==1.2.0
- torchvision==0.4.0
- cupy-cuda100
- opencv-python
- scipy
- scikit-image
If you use conda, the following commands are helpful:
```bash
conda env create -f environment.yaml
conda activate svcnet
```
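To quickly verify that the environment is set up correctly, you can run a small sanity check like the one below (this snippet is not part of the repo, just a convenience sketch); it imports the required packages and confirms that a CUDA device is visible:

```python
# Environment sanity check: import the packages listed above and report versions.
import torch
import torchvision
import cupy
import cv2
import scipy
import skimage

print('pytorch:     ', torch.__version__)
print('torchvision: ', torchvision.__version__)
print('cupy:        ', cupy.__version__)
print('opencv:      ', cv2.__version__)
print('scipy:       ', scipy.__version__)
print('scikit-image:', skimage.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))
```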
2.2 Pre-trained models
We provide the pre-trained SVCNet modules (CPNet and SSNet) and the other public pre-trained models used in this project (PWCNet and VGG-16). By default, all these files are placed under a trained_models root.
All the pre-trained model files can be downloaded at this link.
Alternatively, you can download only the following files if you just want to run inference:
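Assuming the checkpoints are standard PyTorch state dicts (typical for .pth files), they can be loaded roughly as sketched below; the file name and the network constructor are placeholders, so use the actual names from the downloaded trained_models folder and the CPNet/SSNet sub-folders:

```python
# A rough sketch of loading one of the pre-trained checkpoints.
# 'trained_models/CPNet.pth' is a placeholder name, not the actual file name.
import torch

state_dict = torch.load('trained_models/CPNet.pth', map_location='cpu')

# Build the network as defined in the CPNet sub-folder, then load the weights:
# model = CPNet(...)               # constructor and arguments come from the repo
# model.load_state_dict(state_dict)
# model.eval()
```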
2.3 Dataset
We use the ImageNet, DAVIS, and Videvo datasets as our training sets. Please cite the original papers if you use these datasets. We release zip files containing these images. By default, all these files are placed under a data root.
We generate saliency maps as pseudo segmentation labels for images in the ImageNet and Videvo datasets. Note that images in the DAVIS dataset already have segmentation labels. The saliency detection method is the Pyramid Feature Attention Network for Saliency Detection. The generated saliency maps are also released.
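For reference, a minimal sketch of turning a saliency map into a binary pseudo segmentation label is shown below; the file names and the 0.5 threshold are illustrative assumptions, not the exact procedure used in the paper:

```python
# Convert a saliency map into a binary pseudo segmentation mask by thresholding.
# File names and the 0.5 threshold are illustrative assumptions.
import cv2
import numpy as np

saliency = cv2.imread('saliency_map.png', cv2.IMREAD_GRAYSCALE)
saliency = saliency.astype(np.float32) / 255.0
pseudo_mask = (saliency > 0.5).astype(np.uint8) * 255
cv2.imwrite('pseudo_segmentation_label.png', pseudo_mask)
```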
All the ImageNet files can be downloaded at this link. All the DAVIS-Videvo files can be downloaded at this link. Alternatively, you can find each separate file below:
2.3.1 Training set of ImageNet (256x256 resolution, 1281167 files)
2.3.2 Validation set of ImageNet (256x256 resolution, 50000 files)
2.3.3 Training set of DAVIS-Videvo dataset (156 video clips)
2.3.4 Validation set of DAVIS-Videvo dataset (50 video clips)
3 Arrangement
- CPNet: includes scripts and codes for training and validating CPNet
- SSNet: includes scripts and codes for training SSNet and validating SVCNet
- Evaluation: includes codes for evaluation (e.g., Tables II, IV, and V in the paper); a rough metric sketch is given after this list
- GCS: includes codes for generating validation color scribbles
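For orientation, the snippet below sketches a typical frame-wise PSNR/SSIM computation. It only illustrates the kind of metrics involved; which metrics the Evaluation folder actually reports, and the file names used here, are assumptions:

```python
# Frame-wise PSNR/SSIM sketch; file names are placeholders.
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

pred = cv2.imread('colorized_frame.png')       # predicted colorized frame
gt = cv2.imread('ground_truth_frame.png')      # ground-truth color frame

psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
ssim = structural_similarity(cv2.cvtColor(gt, cv2.COLOR_BGR2GRAY),
                             cv2.cvtColor(pred, cv2.COLOR_BGR2GRAY),
                             data_range=255)
print(f'PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}')
```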
4 Fast inference
4.1 Demo
We include a legacy video segment along with its corresponding color scribble frames in 4 different styles. The input grayscale frames and color scribbles are also included. You can find the code for generating these color scribbles in the GCS sub-folder. Users can easily reproduce the following results by running:
```bash
cd SSNet
python test.py
```
4.2 Test on user data
- Create your own scribbles (see the GCS sub-folder). You first need to provide a color scribble for the first frame; then, you can use the generate_color_scribbles_video.py script to obtain the subsequent scribbles based on the optical flow of your own grayscale video (a rough sketch of this flow-based propagation is given after this list).
- Run inference with your generated scribbles (see the SSNet sub-folder). Please follow the guide in the README file, e.g., running test.py.
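As a rough illustration of the flow-based scribble propagation mentioned above: the sketch below is not the GCS implementation (which uses PWCNet); OpenCV's Farneback flow is used as a stand-in, and the file names are placeholders.

```python
# Propagate a color scribble map from frame t to frame t+1 by warping it
# with a dense optical flow field. Farneback flow is a stand-in for PWCNet.
import cv2
import numpy as np

gray_t = cv2.imread('gray_0000.png', cv2.IMREAD_GRAYSCALE)      # frame t
gray_t1 = cv2.imread('gray_0001.png', cv2.IMREAD_GRAYSCALE)     # frame t+1
scribble_t = cv2.imread('scribble_0000.png', cv2.IMREAD_COLOR)  # scribbles at frame t

# Dense flow from frame t+1 back to frame t, so each target pixel knows where to sample.
flow = cv2.calcOpticalFlowFarneback(gray_t1, gray_t, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

h, w = gray_t.shape
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x + flow[..., 0]).astype(np.float32)
map_y = (grid_y + flow[..., 1]).astype(np.float32)

# Warp the scribble image; nearest-neighbour sampling keeps scribble colors sharp.
scribble_t1 = cv2.remap(scribble_t, map_x, map_y, interpolation=cv2.INTER_NEAREST)
cv2.imwrite('scribble_0001.png', scribble_t1)
```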
5 Visualization
A few video samples from the validation dataset are illustrated below:
6 Acknowledgement
Some code is borrowed from the PyTorch-PFAN, SCGAN, VCGAN, PyTorch-PWC, and DEVC projects. Thanks for their awesome work.
7 Citation
If you think this work is helpful, please consider citing:
```bibtex
@article{zhao2023svcnet,
  title={SVCNet: Scribble-based Video Colorization Network with Temporal Aggregation},
  author={Zhao, Yuzhi and Po, Lai-Man and Liu, Kangcheng and Wang, Xuehui and Yu, Wing-Yin and Xian, Pengfei and Zhang, Yujia and Liu, Mengyang},
  journal={IEEE Transactions on Image Processing},
  volume={32},
  pages={4443--4458},
  year={2023}
}
```