

Sparse Global Matching for Video Frame Interpolation with Large Motion

<p align="center"> <a href="https://scholar.google.com.hk/citations?hl=zh-CN&view_op=list_works&gmla=AKKJWFe0ZBvfA_4yxMRe8BW79xNafjCwXtxN10finOaqV1EREnZGxSX6DbpZelBUJD0GZmp5S7unCf76xrgOfnS6SVA&user=dvUKnKEAAAAJ" target='_blank'>Chunxu Liu*</a>,&nbsp; <a href="https://scholar.google.com.hk/citations?user=48vfuRAAAAAJ&hl=en" target='_blank'>Guozhen Zhang*</a>,&nbsp; <a href="https://scholar.google.com/citations?user=1c9oQNMAAAAJ&hl=en" target='_blank'>Rui Zhao</a>,&nbsp; <a href="https://scholar.google.com.hk/citations?user=HEuN8PcAAAAJ&hl=en" target='_blank'>Limin Wang</a>,&nbsp; <br> Nanjing University, &nbsp; SenseTime Research </p>
<p align="center"> <a href="http://arxiv.org/abs/2404.06913" target='_blank'> <img src="https://img.shields.io/badge/Paper-📕-red"> </a> <a href="https://sgm-vfi.github.io/" target='_blank'> <img src="https://img.shields.io/badge/Project Page-🔗-blue"> </a> </p>
<p style="font-size:30px;"> <b>TL;DR: </b>We introduce a <b>Sparse Global Matching Pipeline</b> for the Video Frame Interpolation task: </p>
<p style="font-size:25px;"> 0. Estimate intermediate initial flows with local information. <br> 1. Identify flaws in the initial flows.<br> 2. Estimate flow compensation by <b>Sparse Global Matching</b>. <br> 3. Merge the flow compensation with the initial flows. <br> 4. Compute the intermediate frame using the flows from step 3 and keep refining. </p>
<div align="center"> <img src="figs/pipeline.png" width="1200"/> </div>
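
The merge in steps 1 to 3 can be illustrated with a toy NumPy sketch. This is not the repository's implementation: the function name `merge_sparse_compensation`, the `flaw_map` input, the `keep_ratio` parameter, and the top-k selection rule are all assumptions made for illustration, and the actual model operates on learned features rather than on ready-made arrays.

```python
import numpy as np

def merge_sparse_compensation(initial_flow, flaw_map, compensation, keep_ratio=0.5):
    """Add flow compensation only at the most flawed pixels (toy illustration).

    initial_flow: (H, W, 2) flow estimated from local information.
    flaw_map:     (H, W) score; higher means the initial flow is less reliable.
    compensation: (H, W, 2) correction; only the selected pixels are used.
    keep_ratio:   fraction of pixels to correct (hypothetical parameter).
    """
    h, w = flaw_map.shape
    k = max(1, int(round(keep_ratio * h * w)))
    # Indices of the k highest flaw scores in the flattened map.
    top_idx = np.argpartition(flaw_map.ravel(), -k)[-k:]
    mask = np.zeros(h * w, dtype=bool)
    mask[top_idx] = True
    mask = mask.reshape(h, w)
    # Leave reliable pixels untouched; correct only the selected sparse set.
    merged = initial_flow.copy()
    merged[mask] += compensation[mask]
    return merged, mask

rng = np.random.default_rng(0)
flow = rng.normal(size=(4, 6, 2))   # stand-in for the initial flow
flaws = rng.random((4, 6))          # stand-in for the flaw scores
comp = np.ones((4, 6, 2))           # stand-in for the flow compensation
merged, mask = merge_sparse_compensation(flow, flaws, comp, keep_ratio=0.25)
print(mask.sum(), "of", mask.size, "pixels corrected")
```

The point of the sketch is only that the correction is sparse: pixels outside the selected set keep their initial flow unchanged.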

To evaluate how well our method handles large motion, we carefully curate a more challenging subset from commonly used benchmarks. Experiments show that our method brings improvements on these challenging large-motion benchmarks.

<div align="center"> <img src="figs/demo.gif" width="500"/> </div>

Training Dataset Preparation

Fine-tuning the sparse global matching branch requires X4K1000FPS, and training the local branch requires Vimeo90K. After downloading and processing the datasets, place them in the following folder structure:

├── ...
└── datasets
    ├── X4K1000FPS
    │   ├── train
    │   ├── val
    │   └── test
    └── vimeo_triplet (needed only if training the local branch)
        ├── ...
        ├── tri_trainlist.txt
        └── sequences
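
Before launching a run, it can help to confirm that the layout matches the tree above. The following is a minimal sketch; the helper name `missing_dataset_items` and the assumption that `datasets` sits in the current working directory are illustrative and not part of the repository.

```python
from pathlib import Path

# Paths taken from the folder tree above, relative to the datasets folder.
X4K_DIRS = ["X4K1000FPS/train", "X4K1000FPS/val", "X4K1000FPS/test"]
VIMEO_ITEMS = ["vimeo_triplet/tri_trainlist.txt", "vimeo_triplet/sequences"]

def missing_dataset_items(datasets_root, need_vimeo=False):
    """Return the expected dataset paths that do not exist under datasets_root."""
    root = Path(datasets_root)
    expected = X4K_DIRS + (VIMEO_ITEMS if need_vimeo else [])
    return [item for item in expected if not (root / item).exists()]

if __name__ == "__main__":
    # need_vimeo=True only matters when training the local branch.
    missing = missing_dataset_items("datasets", need_vimeo=True)
    print("missing:", missing if missing else "nothing")
```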

Environment Setup

conda create -n sgm-vfi python=3.8
conda activate sgm-vfi
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
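
To confirm that the active environment actually picked up the pinned packages, a small standard-library check can be run afterwards. This is a sketch: the helper name `compare_pins` is hypothetical, and it assumes the installed version string includes the `+cu117` suffix, which may vary with how the wheel was obtained.

```python
from importlib import metadata

# Pins copied from the install commands above.
PINS = {"torch": "1.13.1+cu117", "torchvision": "0.14.1+cu117"}

def compare_pins(pins):
    """Map each package to (expected, installed); installed is None if absent."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report

for name, (expected, installed) in compare_pins(PINS).items():
    status = "ok" if installed == expected else "mismatch"
    print(f"{name}: expected {expected}, found {installed} ({status})")
```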

Sparse Global Matching Fine-tuning

Pretrained File Preparation

We provide the pretrained local branch model for a quicker launch of sparse global matching. You can download the pretrained model here and place it in [project_folder]/log/ours-local/ckpt/ours-local.pth.

Furthermore, for the global feature extractor GMFlow, you can download the pretrained model here, then unzip it and place gmflow_sintel-0c07dcb3.pth in [project_folder]/pretrained/gmflow_sintel-0c07dcb3.pth.

Finally, for fine-tuning the sparse global matching branch, the project folder should look like the tree under Top Related Files below.


After the preparation, you can check and modify the settings in config.py. The default setting is for ours-small-1/2 fine-tuning.

Finally, you can start the fine-tuning with the following command:

torchrun --nproc_per_node=4 train_x4k.py --batch_size 8 --need_patch --train_data_path path/to/X4K/train --val_data_path path/to/X4K/val

<span id="jump"> Top Related Files </span>

├── train_x4k.py
├── Trainer_x4k.py
├── dataset_x4k.py
├── config.py
├── pretrained
│   └── gmflow_sintel-0c07dcb3.pth
├── log
│   └── ours-local
│       └── ckpt
│           └── ours-local.pth
└── model
    ├── __init__.py
    ├── flow_estimation_global.py
    ├── matching.py
    ├── gmflow.py
    └── ...

Local Branch Training

We also provide scripts for training the local branch. After preparing the Vimeo90K dataset and checking the settings in config_base.py (the default setting is for ours-local-branch model training), you can start training with the following command:

torchrun --nproc_per_node=4 train_base.py --batch_size 8 --data_path ./vimeo_triplet 

Top Related Files

├── train_base.py
├── Trainer_base.py
├── dataset.py
├── config_base.py
└── model
    ├── __init__.py
    ├── flow_estimation_local.py
    └── ...


In our paper, we analyzed the mean motion magnitude and motion sufficiency (the minimum of the top 5% of each pixel's flow magnitude) in the most frequently used large motion benchmarks.
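
As a concrete reading of these two statistics, the sketch below computes them for a single dense flow field with NumPy. It is an interpretation of the definition above rather than the script used for the paper: the function name `motion_stats`, the rounding rule for the top 5%, and the per-frame-pair scope are assumptions, and the flow itself would come from an optical flow estimator.

```python
import numpy as np

def motion_stats(flow, top_fraction=0.05):
    """Return (mean magnitude, motion sufficiency) for a flow of shape (H, W, 2)."""
    # Per-pixel flow magnitude, flattened to one value per pixel.
    magnitude = np.linalg.norm(flow, axis=-1).ravel()
    k = max(1, int(round(top_fraction * magnitude.size)))
    # The k largest magnitudes; their minimum is the motion sufficiency.
    top_k = np.partition(magnitude, -k)[-k:]
    return float(magnitude.mean()), float(top_k.min())

# Toy flow with 100 pixels: pixel i moves i pixels horizontally.
flow = np.zeros((10, 10, 2))
flow[..., 0] = np.arange(100).reshape(10, 10)
mean_mag, sufficiency = motion_stats(flow)
print(mean_mag, sufficiency)  # 49.5 95.0
```

In the toy case the top 5% are the magnitudes 95 to 99, so the motion sufficiency is 95 while the mean is only 49.5. A high value therefore means that at least 5% of the pixels move by a large amount, even if most of the frame is static.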

<div align="center"> <img src="figs/chart.png" width="600"/> </div>

Evaluation Datasets Preparation

As a result, we curated the most challenging half of Xiph and of the SNU-FILM hard and extreme splits with the help of the raft-sintel.pth checkpoint provided in RAFT.

The resulting benchmark is available here. Put top-half-motion-sufficiency_test-hard.txt and top-half-motion-sufficiency_test-extreme.txt in the SNU-FILM dataset folder, and top-half-motion-sufficiency-gap2.txt in the Xiph dataset folder.

Model Checkpoint Preparation

We provide the checkpoints here for evaluation. Please download and place them in the following folder structure:

├── ...
└── log
    └── ours-1-2-points
        └── ckpt
            └── ours-1-2-points.pth

The evaluation commands for ours-1-2-points are as follows:


python benchmark/XTest_interval.py --path path/to/XTest/test --exp_name ours-1-2-points --num_key_points 0.5


python benchmark/SNU_FILM.py --path ./data/SNU-FILM --exp_name ours-1-2-points --num_key_points 0.5

(Suggestion: you can run ln -s path/to/SNUFILM (project folder)/data/SNU-FILM to avoid having to adjust the input path.)


python benchmark/Xiph.py --path ./xiph --exp_name ours-1-2-points --num_key_points 0.5

(Suggestion: you can run ln -s path/to/Xiph (project folder)/xiph to avoid having to adjust the input path.)
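
The same links can be created from Python, for example inside a setup script. This is a minimal sketch; the helper name `link_dataset` is hypothetical, and unlike plain `ln -s` it creates the parent folder if needed and leaves any existing path untouched.

```python
from pathlib import Path

def link_dataset(source, link):
    """Create `link` as a symlink pointing at the dataset folder `source`."""
    source, link = Path(source).resolve(), Path(link)
    if link.is_symlink() or link.exists():
        return link  # do not overwrite an existing path
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(source, target_is_directory=True)
    return link
```

For example, `link_dataset("path/to/SNUFILM", "data/SNU-FILM")` and `link_dataset("path/to/Xiph", "xiph")`, run from the project folder, would match the default `--path` values in the commands above.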

Simple Inference

You can try out our simple 2x inference demo with the following command:

python demo_2x.py 

(This requires the model checkpoint at log/ours-1-2-points/ckpt/ours-1-2-points.pth and the GMFlow pretrained model at pretrained/gmflow_sintel-0c07dcb3.pth.)


If you find this project helpful for your research or applications, please consider leaving a star ⭐️ and citing our paper:

    author    = {Liu, Chunxu and Zhang, Guozhen and Zhao, Rui and Wang, Limin},
    title     = {Sparse Global Matching for Video Frame Interpolation with Large Motion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {19125-19134}

License and Acknowledgement

This project is released under the Apache 2.0 license. The code is based on GMFlow, RAFT, EMA-VFI, RIFE, and IFRNet. Please also follow their licenses. Thanks for their awesome work!