CDFI (Compression-Driven-Frame-Interpolation)

[Paper] [arXiv]

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

News

An expanded version of the conference paper has been released on arXiv. It gives technical details on the model compression, which is based on layer-wise sparsity information obtained via optimization. The corresponding code is available in the sub-directory extended_version.

Introduction

We propose a Compression-Driven network design for Frame Interpolation (CDFI) that leverages model compression to significantly reduce the model size, which allows a better understanding of the current architecture, while making room for further improvements and ultimately achieving superior performance. Concretely, we first compress AdaCoF and show that a 10X compressed AdaCoF performs similarly to its original counterpart; we then improve upon this compressed model with simple modifications. Note that it is typically prohibitive to implement the same improvements on the original heavy model.

<p align="center"> <img src="imgs/cdfi_fps_160.gif" /> </p>

The above GIF is a demo of using our method to generate slow-motion video, increasing the FPS from 5 to 160. We also provide a long video demonstration here (redirects to YouTube).

Environment

Installation

conda create -n cdfi python==3.8.3
conda activate cdfi
conda install -c conda-forge cupy==7.7.0
pip install torch==1.8.1+cu111 torchvision -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
pip install opencv-python lpips
conda install matplotlib scikit-image

Test Pre-trained Models

Download repository:

$ git clone https://github.com/tding1/CDFI.git
$ cd CDFI/

Testing data

For convenience, the Middlebury and UCF101-DVF test datasets are included in the repository under the test_data/ directory.

Evaluation metrics

We use the built-in functions in skimage.metrics to compute PSNR and SSIM (higher is better). We also use LPIPS to measure perceptual similarity (lower is better).
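As a rough illustration of what the PSNR number means, the sketch below computes it directly from its standard definition, 10 * log10(data_range^2 / MSE), using NumPy. This is not the repository's evaluation code, which relies on skimage.metrics; the function name `psnr` and the toy frames are assumptions made for this example only.

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    ref = reference.astype(np.float64)
    est = estimate.astype(np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


# Toy 8-bit frames (hypothetical data): the estimate is off by 5 at every pixel.
ground_truth = np.full((4, 4, 3), 100, dtype=np.uint8)
interpolated = np.full((4, 4, 3), 105, dtype=np.uint8)

score = psnr(ground_truth, interpolated)
print(f"PSNR: {score:.2f} dB")  # prints "PSNR: 34.15 dB"
```

A smaller pixel error gives a smaller MSE and therefore a higher PSNR, which is why higher values are better.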

Note: We use SqueezeNet when calculating LPIPS, while other works (Softsplat, EDSC, etc.) might use a different network in their original implementations, e.g., AlexNet. Although we manually tested AdaCoF, EDSC, and CAIN under the same setting and report the results in the paper, there may be discrepancies from their originally reported results; see also the discussion here.

Test our pre-trained CDFI model

$ python mytest.py --gpu_id 0

By default, it will load our pre-trained model checkpoints/CDFI_adacof.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/cdfi_adacof/.

Test the compressed AdaCoF

$ python mytest.py --gpu_id 0 --model compressed_adacof --kernel_size 5 --dilation 1

By default, it will load the compressed AdaCoF model checkpoints/compressed_adacof_F_5_D_1.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/compressed_adacof_F_5_D_1/.

Test the compressed AdaCoF+

$ python mytest.py --gpu_id 0 --model compressed_adacof --kernel_size 11 --dilation 2

By default, it will load the compressed AdaCoF+ model checkpoints/compressed_adacof_F_11_D_2.pth. It will print the quantitative results on both Middlebury and UCF101-DVF, and the interpolated images will be saved under test_output/compressed_adacof_F_11_D_2/.
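The two compressed-model commands above suggest a naming pattern that links the --kernel_size and --dilation flags to the checkpoint and output paths. The sketch below only mirrors that pattern as it appears in this README; the helper `compressed_adacof_paths` is hypothetical and is not part of the repository, so mytest.py may build these paths differently.

```python
from typing import Tuple


def compressed_adacof_paths(kernel_size: int, dilation: int) -> Tuple[str, str]:
    """Return (checkpoint path, output directory) following the README's naming pattern.

    Hypothetical helper: the pattern is inferred from the two examples in the text.
    """
    tag = f"compressed_adacof_F_{kernel_size}_D_{dilation}"
    return f"checkpoints/{tag}.pth", f"test_output/{tag}/"


# Compressed AdaCoF (F=5, D=1) and compressed AdaCoF+ (F=11, D=2).
for kernel_size, dilation in [(5, 1), (11, 2)]:
    checkpoint, output_dir = compressed_adacof_paths(kernel_size, dilation)
    print(checkpoint, output_dir)
```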

Interpolate two frames

$ python interpolate_twoframe.py --gpu_id 0 --first_frame imgs/0.png --second_frame imgs/1.png --output_frame ./output.png

By default, it will load our pre-trained model checkpoints/CDFI_adacof.pth, and generate the intermediate frame output.png given two consecutive frames in a sequence.

Interpolate video

$ python interpolate_video.py --gpu_id 0 --input_video imgs/img_seq/ --output_video ./interpolated_video

This script will interpolate a video sequence using our pre-trained model checkpoints/CDFI_adacof.pth, thus increasing the FPS by a factor of 2. You may want to repeat the procedure on the interpolated video if a higher FPS is desired.
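Since each run doubles the FPS, the number of repeated runs needed for a target FPS can be worked out with a short calculation. The sketch below is a standalone illustration; the function name `passes_needed` is an assumption and nothing here calls interpolate_video.py.

```python
def passes_needed(source_fps: int, target_fps: int) -> int:
    """Count how many 2x interpolation passes reach at least target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("FPS values must be positive")
    passes = 0
    fps = source_fps
    while fps < target_fps:
        fps *= 2  # one pass doubles the frame rate
        passes += 1
    return passes


# Going from 5 FPS to 160 FPS: 5 -> 10 -> 20 -> 40 -> 80 -> 160.
print(passes_needed(5, 160))  # prints 5
```

Under this doubling assumption, the 5 to 160 FPS demo in the introduction corresponds to five consecutive passes.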

Train Our Model

Training data

We use the Vimeo-90K triplet dataset for the video frame interpolation task. The dataset is relatively large (>32 GB).

$ wget http://data.csail.mit.edu/tofu/dataset/vimeo_triplet.zip
$ unzip vimeo_triplet.zip
$ rm vimeo_triplet.zip

Start training

$ python train.py --gpu_id 0 --data_dir path/to/vimeo_triplet/ --batch_size 8

It will generate a unique ID for each training run, and all intermediate results/records will be saved under model_weights/<training id>/ (or you can specify your own experiment ID using --uid). On a GPU with around 10 GB of memory, --batch_size can be at most 3; larger values may cause CUDA to run out of memory. Many other training options, e.g., --lr, --epochs, and --loss, can be found in train.py.

Apply CDFI to New Models

One nice thing about CDFI is that the framework can be easily applied to other (heavy) DNN models and potentially boost their performance. The key to CDFI is the optimization-based compression that compresses a model via fine-grained pruning. In particular, we use the efficient and easy-to-use sparsity-inducing optimizer OBPROXSG (see also the paper), and summarize the compression procedure for any other model in the following. For details, we recommend checking the long version of the paper on arXiv and the additional code in extended_version.

The compressed model is now ready for further improvements/modifications, based on an understanding of its flaws/drawbacks.
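To make the idea of layer-wise sparsity concrete, the sketch below applies a generic L1 soft-thresholding step to toy weights and then reports the fraction of non-zero entries per layer. This is not OBPROXSG and not the code in extended_version; soft-thresholding is swapped in as a simple, well-known sparsity-inducing operator, and the layer names, weights, and threshold are all hypothetical.

```python
from typing import Dict

import numpy as np


def soft_threshold(weights: np.ndarray, threshold: float) -> np.ndarray:
    """Proximal operator of threshold * ||w||_1: shrink towards zero, zeroing small entries."""
    return np.sign(weights) * np.maximum(np.abs(weights) - threshold, 0.0)


def layer_density(layers: Dict[str, np.ndarray]) -> Dict[str, float]:
    """Fraction of non-zero weights in each layer."""
    return {name: np.count_nonzero(w) / w.size for name, w in layers.items()}


# Hypothetical toy "layers"; a real model would have many more weights.
toy_layers = {
    "conv1": np.array([0.5, -0.02, 0.03, -0.8]),
    "conv2": np.array([0.01, -0.01, 0.02, 0.9]),
}

sparse_layers = {name: soft_threshold(w, 0.05) for name, w in toy_layers.items()}
density = layer_density(sparse_layers)
print(density)  # prints {'conv1': 0.5, 'conv2': 0.25}
```

In this toy case, conv2 keeps a smaller share of its weights than conv1, which is the kind of per-layer signal that could guide how aggressively each layer is shrunk.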

Citation

@article{ding2021cdfi,
  title={CDFI: Compression-Driven Network Design for Frame Interpolation},
  author={Ding, Tianyu and Liang, Luming and Zhu, Zhihui and Zharkov, Ilya},
  journal={arXiv preprint arXiv:2103.10559},
  year={2021}
}

or

@inproceedings{ding2021cdfi,
  title={CDFI: Compression-Driven Network Design for Frame Interpolation},
  author={Ding, Tianyu and Liang, Luming and Zhu, Zhihui and Zharkov, Ilya},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8001--8011},
  year={2021}
}

Acknowledgements

The code is largely based on HyeongminLEE/AdaCoF-pytorch and baowenbo/DAIN.