SVCNet

<br>Official PyTorch Implementation of the SVCNet Paper<br>

Project | arXiv | IEEE Xplore

1 Introduction

SVCNet is an architecture for scribble-based video colorization, which includes two sub-networks: CPNet and SSNet. This repo contains training and evaluation code for the following paper:

SVCNet: Scribble-based Video Colorization Network with Temporal Aggregation<br> Yuzhi Zhao<sup>1</sup>, Lai-Man Po<sup>1</sup>, Kangcheng Liu<sup>2</sup>, Xuehui Wang<sup>3</sup>, Wing-Yin Yu<sup>1</sup>, Pengfei Xian<sup>1</sup>, Yujia Zhang<sup>4</sup>, Mengyang Liu<sup>4</sup><br> <sup>1</sup>City University of Hong Kong, <sup>2</sup>Nanyang Technological University, <sup>3</sup>Shanghai Jiao Tong University, <sup>4</sup>Tencent Video<br> IEEE Transactions on Image Processing (TIP), 2023<br>

(Figure: SVCNet pipeline overview)

2 Preparation

2.1 Environment

We tested the code on CUDA 10.0 (higher versions are also compatible). The basic requirements are as follows:

If you use conda, the following command is helpful:

conda env create -f environment.yaml
conda activate svcnet
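After activating the environment, a quick sanity check can report any packages that are not importable. This is a minimal sketch; the package names below are an assumption based on a typical PyTorch setup, not the official requirements list (consult `environment.yaml` for that):

```python
import importlib.util

def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Assumed package list for illustration; see environment.yaml for the real one.
required = ["torch", "torchvision", "numpy", "cv2"]
print("missing:", missing_packages(required))
```

An empty `missing:` list suggests the environment is ready for inference.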

2.2 Pre-trained models

We provide the pre-trained SVCNet modules (CPNet and SSNet) and other public pre-trained models (PWCNet and VGG-16). By default, all these files are placed under a `trained_models` root folder.

All the pre-trained model files can be downloaded at this link.

Alternatively, you can download only the following files if you just want to run inference:

2.3 Dataset

We use the ImageNet, DAVIS, and Videvo datasets as our training set. Please cite the original papers if you use these datasets. We release zip files containing those images. By default, all these files are placed under a `data` root folder.

We generate saliency maps as pseudo segmentation labels for images in the ImageNet and Videvo datasets. Note that images in the DAVIS dataset already have segmentation labels. The saliency detection method is Pyramid Feature Attention Network for Saliency Detection. The generated saliency maps are also released.
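Turning a saliency map into a pseudo segmentation label can be as simple as thresholding. The sketch below is illustrative only; the helper name and the 0.5 threshold are assumptions, not the repo's actual preprocessing:

```python
def saliency_to_pseudo_label(saliency, threshold=0.5):
    """Binarize a saliency map (values in [0, 1]) into a 0/1 pseudo segmentation mask.

    Hypothetical helper for illustration; the repo's actual
    label-generation pipeline may differ.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in saliency]

# Example: a 2x3 saliency map becomes a binary mask.
mask = saliency_to_pseudo_label([[0.1, 0.6, 0.9], [0.4, 0.5, 0.2]])
print(mask)  # [[0, 1, 1], [0, 1, 0]]
```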

All the ImageNet files can be downloaded at this link. All the DAVIS-Videvo files can be downloaded at this link. Alternatively, you can find each separate file below:

2.3.1 Training set of ImageNet (256x256 resolution, 1281167 files)

2.3.2 Validation set of ImageNet (256x256 resolution, 50000 files)

2.3.3 Training set of DAVIS-Videvo dataset (156 video clips)

2.3.4 Validation set of DAVIS-Videvo dataset (50 video clips)

3 Arrangement

4 Fast inference

4.1 Demo

We include a legacy video segment along with its corresponding color scribble frames in 4 different styles. The input grayscale frames and color scribbles are also included. You can find the code for generating these color scribbles in the GCS sub-folder. Users can easily reproduce the following results by running:

cd SSNet
python test.py

(Animated GIF results for the four color scribble styles)

4.2 Test on user data

5 Visualization

A few video samples on the validation dataset are illustrated below:

(Animated GIF samples from the validation set)

6 Acknowledgement

Some code is borrowed from the PyTorch-PFAN, SCGAN, VCGAN, PyTorch-PWC, and DEVC projects. Thanks for their awesome work.

7 Citation

If you find this work helpful, please consider citing:

@article{zhao2023svcnet,
  title={SVCNet: Scribble-based Video Colorization Network with Temporal Aggregation},
  author={Zhao, Yuzhi and Po, Lai-Man and Liu, Kangcheng and Wang, Xuehui and Yu, Wing-Yin and Xian, Pengfei and Zhang, Yujia and Liu, Mengyang},
  journal={IEEE Transactions on Image Processing},
  volume={32},
  pages={4443--4458},
  year={2023}
}