Video Object Segmentation-aware Video Frame Interpolation (ICCV 2023)

This is the official PyTorch implementation of the paper "Video Object Segmentation-aware Video Frame Interpolation" (ICCV 2023).

Abstract

Video frame interpolation (VFI) is a very active research topic due to its broad applicability to many applications, including video enhancement, video encoding, and slow-motion effects. VFI methods have been advanced by improving the overall image quality for challenging sequences containing occlusions, large motion, and dynamic texture. This mainstream research direction neglects that foreground and background regions have different importance in perceptual image quality. Moreover, accurate synthesis of moving objects can be of utmost importance in computer vision applications. In this paper, we propose a video object segmentation (VOS)-aware training framework called VOS-VFI that allows VFI models to interpolate frames with more precise object boundaries. Specifically, we exploit VOS as an auxiliary task to help train VFI models by providing additional loss functions, including segmentation loss and bi-directional consistency loss. From extensive experiments, we demonstrate that VOS-VFI can boost the performance of existing VFI models by rendering clear object boundaries. Moreover, VOS-VFI displays its effectiveness on multiple benchmarks for different applications, including video object segmentation, object pose estimation, and visual tracking.

Environments

PyTorch
CUDA 11+
cupy-cuda
python 3.8
torchvision
pyiqa (for metrics)

A more organized version will be released later.
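
Until then, a quick way to see which of the packages listed above are visible to your interpreter is a small check like the one below. This is only a sketch and not part of the repository: the import names are assumptions (PyTorch imports as `torch`, and the CuPy wheels import as `cupy` whatever CUDA suffix the wheel carries), and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above.
REQUIRED_MODULES = ["torch", "torchvision", "cupy", "pyiqa"]


def check_environment(modules=REQUIRED_MODULES):
    """Map each module name to True if the interpreter can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name, found in check_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it does not confirm that CUDA itself is usable.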

Train

Prepare training data

Download the Vimeo90k training data from the vimeo triplet dataset.
Generate the object segmentation masks for the Vimeo90k dataset with precompute_mask.py.
Download the STCN pretrained model into saves, and download our pretrained AdaCoF-VOS model.
Download test_video_davis, the DAVIS 2016 dataset for VOS (we used the 480p JPEG images).

VOS-VFI

cupy_module
losses
model
models
saves (pre-trained weights of STCN, the VOS model)
test_input (for VFI test)
test_video_davis (for VOS test)
util
ada-VOS_pretrained.pth
dataloader_seg.py
evaluation.py
...etc .py
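
Before training, it can help to confirm that the checkout matches the layout above. The following is a minimal sketch, not a script shipped with this repository: the directory and file names are copied from the listing, and `missing_entries` is a hypothetical helper name.

```python
from pathlib import Path

# Names copied from the layout listed above; adjust if your checkout differs.
EXPECTED_DIRS = ["cupy_module", "losses", "model", "models", "saves",
                 "test_input", "test_video_davis", "util"]
EXPECTED_FILES = ["ada-VOS_pretrained.pth", "dataloader_seg.py", "evaluation.py"]


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Check the current working directory, assumed to be the repository root.
    for entry in missing_entries("."):
        print("missing:", entry)
```

An empty result means every listed entry was found; it does not validate the contents of the downloaded models or datasets.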

Begin to train

Run train.py with the following command.

   python train.py --train [dir_to_vimeo_triplet] --out_dir [dir_to_output_folder]

You may also need to adjust other options (epochs, learning rate, hyperparameters, etc.).

Test

Evaluation

Evaluation works the same way as for existing VFI models.
You need a checkpoint file to run it.

Run evaluation.py with the following command.

   python evaluation.py --out_dir [output_dir] --checkpoint [checkpoint_dir] --config [configuration_dir]

Video Interpolation

To interpolate and evaluate video datasets, run interpolate_video_folder+accuracy_example.py.
This example is the test code for x2 interpolation of the odd frames on a video object segmentation dataset (DAVIS 2016).
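
One common reading of "x2 odd frames" is that the even-indexed frames are the inputs and each odd-indexed frame is the target to reconstruct and score. The sketch below illustrates that pairing under this assumption; it is not taken from the example script, the 0-based indexing is assumed, and `odd_frame_triplets` is a hypothetical helper name.

```python
def odd_frame_triplets(frames):
    """Group a sorted frame list into (previous, target, next) triplets.

    Assumes 0-based indexing: even-indexed frames are the inputs and each
    odd-indexed frame between two inputs is the ground-truth target.
    A trailing odd frame with no successor is skipped.
    """
    return [(frames[t - 1], frames[t], frames[t + 1])
            for t in range(1, len(frames) - 1, 2)]


if __name__ == "__main__":
    # Illustrative file names only.
    names = ["00000.jpg", "00001.jpg", "00002.jpg", "00003.jpg", "00004.jpg"]
    for prev_name, target_name, next_name in odd_frame_triplets(names):
        print(prev_name, next_name, "->", target_name)
```

Each triplet gives the two frames fed to the interpolation model and the held-out frame against which the output would be compared.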

Two-frame interpolation

To interpolate a frame between any two frames you have, run interpolate_twoframe.py with the following command.

   python interpolate_twoframe.py --first_frame [first_frame] --second_frame [second_frame] --output_frame [output_frame] --checkpoint [checkpoint_dir] --config [configuration_dir]

Citation

@InProceedings{Yoo_2023_ICCV,
author    = {Yoo, Jun-Sang and Lee, Hongjae and Jung, Seung-Won},
title     = {Video Object Segmentation-aware Video Frame Interpolation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month     = {October},
year      = {2023},
pages     = {12322-12333}
}

Acknowledgements

This code is based on HyeongminLee/AdaCoF-pytorch. Thanks also to the authors of STCN, the off-the-shelf VOS model we use.