# PVDNet: Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes<br><sub>Official PyTorch Implementation of the TOG 2021 Paper</sub><br><sub>Paper | arXiv | Supp</sub>
This repo contains training and evaluation code for the following paper:
<p align="left"> <a href="https://youtu.be/2NDAEGAff50"> <img width=85% src="./assets/teaser.gif"/> </a> </p>Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes<br> Hyeongseok Son, Junyong Lee, Jonghyeop Lee, Sunghyun Cho, and Seungyong Lee<br> POSTECH<br> ACM Transactions on Graphics (TOG) 2021 (presented at SIGGRAPH 2021)<br>
## About the Research
<details>
<summary><i>Click here</i></summary>
<h3> Overall Framework </h3>
<p align="center"> <img width=50% src="./assets/framework.jpg" /> </p>
<p align="center"> <img src="./assets/network.jpg" /> </p>
<p> Our video deblurring framework consists of three modules: a blur-invariant motion estimation network (<i>BIMNet</i>), a pixel volume generator, and a pixel volume-based deblurring network (<i>PVDNet</i>). We first train <i>BIMNet</i>; after it has converged, we combine it with the pixel volume generator and <i>PVDNet</i>. We then fix the parameters of <i>BIMNet</i> and train <i>PVDNet</i> by training the combined network end to end. </p>
<h3> Blur-Invariant Motion Estimation Network (<i>BIMNet</i>) </h3>
<p> To estimate motion between frames accurately, we adopt <a href="https://arxiv.org/pdf/1805.07036.pdf">LiteFlowNet</a> and train it with a blur-invariant loss so that the trained network can estimate blur-invariant optical flow between frames. We train <i>BIMNet</i> with a blur-invariant loss <img src="https://latex.codecogs.com/svg.latex?L_{BIM}^{\alpha\beta}" />, defined as follows (Eq. 1 in the main paper): </p>
<p align="center"> <img src="./assets/BIMNet_eq.svg" /> </p>
<p align="center"> <img width=80% src="./assets/BIMNet_figure.jpg" /> </p>
<p> The figure shows a qualitative comparison of different optical flow methods. The results of the other methods contain severely distorted structures due to errors in their optical flow maps. In contrast, the results of <i>BIMNet</i> show far fewer distortions. </p>
<h3> Pixel Volume for Motion Compensation </h3>
<p> We propose a novel pixel volume that provides multiple candidates for matching pixels between images. Moreover, a pixel volume provides an additional cue for motion compensation based on the majority of candidates. </p>
<p align="center"> <img width=60% src="./assets/PV.jpg" /> </p>
<p> The multiple candidates in a pixel volume improve video deblurring performance in two ways: 1) in most cases, the majority cue for the correct match helps, as the statistics in Sec. 4.4 of the main paper show, and 2) in the remaining cases, <i>PVDNet</i> can exploit the multiple candidates to estimate the correct match by referring to nearby pixels with majority cues. </p>
</details>
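To make the blur-invariant training idea concrete, below is a minimal PyTorch sketch, not the repository's actual code: `bimnet`, the `warp` helper, and the L1 photometric penalty are illustrative stand-ins, and the paper's Eq. 1 may differ in details such as the robust penalty and flow direction. The key point is that flow is estimated from every blurry/sharp combination of a frame pair, while the photometric error is always measured on the sharp frames.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp `img` (N,C,H,W) with optical `flow` (N,2,H,W)."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).float().to(img.device)  # (2,H,W), (x,y)
    coords = grid.unsqueeze(0) + flow
    # normalize pixel coordinates to [-1, 1] for grid_sample
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(img, torch.stack((coords_x, coords_y), dim=3),
                         align_corners=True)

def blur_invariant_loss(bimnet, blurry_i, blurry_j, sharp_i, sharp_j):
    """Sketch of the blur-invariant loss: sum the warping error over the
    four blur/sharp input combinations (alpha, beta in {b, s}), but always
    evaluate the photometric error on the *sharp* frames, so the estimated
    flow must be accurate regardless of blur."""
    loss = 0.0
    for src, dst in [(blurry_i, blurry_j), (blurry_i, sharp_j),
                     (sharp_i, blurry_j), (sharp_i, sharp_j)]:
        flow = bimnet(src, dst)  # assumed interface: (src, dst) -> flow i->j
        loss = loss + F.l1_loss(warp(sharp_j, flow), sharp_i)
    return loss
```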
## Getting Started

### Prerequisites
Tested with Python 3.8 and PyTorch 1.8.0 on CUDA 10.2 / 11.1 / 11.3 (matching the setup commands below).
#### 1. Environment setup
```bash
$ git clone https://github.com/codeslake/PVDNet.git
$ cd PVDNet

$ conda create -y --name PVDNet python=3.8 && conda activate PVDNet

# Install PyTorch (1.8.0, for example)
$ conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 -c pytorch

# Install the remaining dependencies (run the script matching your CUDA version)
# for CUDA 10.2
$ sh install_CUDA10.2.sh
# for CUDA 11.1
$ sh install_CUDA11.1.sh
# for CUDA 11.3
$ sh install_CUDA11.3.sh
```
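After installation, a quick way to confirm that PyTorch and the GPU are visible (an optional check, not part of the repository's scripts):

```python
# Optional sanity check: verify the PyTorch install and GPU visibility.
import torch

print(torch.__version__)          # expect 1.8.0 with the commands above
print(torch.cuda.is_available())  # True if the CUDA install matches your driver
```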
#### 2. Datasets
Download and unzip the datasets (Su et al.'s DVD dataset and Nah et al.'s dataset) under `[DATASET_ROOT]`:

```
[DATASET_ROOT]
├── train_DVD
├── test_DVD
├── train_nah
└── test_nah
```

`[DATASET_ROOT]` can be modified with `config.data_offset` in `./configs/config.py`.
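For example, pointing the code at a custom dataset location might look like the following; only `config.data_offset` is documented above, so treat the surrounding code as a sketch of `./configs/config.py` rather than its actual contents:

```python
# ./configs/config.py (sketch) -- point the dataset offset at your own root.
config.data_offset = '/data/datasets'   # i.e., [DATASET_ROOT]
```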
#### 3. Pre-trained models
Download and unzip the pretrained weights (OneDrive | Dropbox) under `./ckpt/`:

```
.
├── ...
├── ./ckpt
│   ├── BIMNet.pytorch
│   ├── PVDNet_DVD.pytorch
│   ├── PVDNet_nah.pytorch
│   └── PVDNet_large_nah.pytorch
└── ...
```
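To verify a download, you can peek at a checkpoint file; this assumes the standard `torch.save` format, and the actual key layout depends on the repository's saving code:

```python
# Inspect a downloaded checkpoint (illustrative; key layout may differ).
import torch

ckpt = torch.load('ckpt/PVDNet_DVD.pytorch', map_location='cpu')
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:5])  # first few top-level keys
```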
## Testing models of TOG 2021
For the PSNRs and SSIMs reported in the paper, we follow Su et al. and use the approach of Koehler et al., which first aligns the two images using a global translation to account for the ambiguity in pixel location caused by blur.<br> Refer here for the evaluation code.
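As a rough illustration of that alignment step (not the actual evaluation code linked above), the sketch below searches over integer global translations and keeps the best PSNR; the search radius `max_shift` is an assumption:

```python
import numpy as np

def aligned_psnr(deblurred, gt, max_shift=10):
    """Translation-aligned PSNR sketch: try every integer global shift of
    `deblurred` within +/- max_shift and return the best PSNR on the
    overlapping region. Inputs are uint8 images of identical shape."""
    best = -np.inf
    h, w = gt.shape[:2]
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # overlapping region in gt after shifting `deblurred` by (dy, dx)
            ys, ye = max(0, dy), min(h, h + dy)
            xs, xe = max(0, dx), min(w, w + dx)
            a = deblurred[ys - dy:ye - dy, xs - dx:xe - dx]
            b = gt[ys:ye, xs:xe]
            mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
            if mse == 0:
                return np.inf
            best = max(best, 10 * np.log10(255.0 ** 2 / mse))
    return best
```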
```bash
## Table 4 in the main paper (evaluation on Su et al.'s dataset)
# Our final model
CUDA_VISIBLE_DEVICES=0 python run.py --mode PVDNet_DVD --config config_PVDNet --data DVD --ckpt_abs_name ckpt/PVDNet_DVD.pytorch

## Table 5 in the main paper (evaluation on Nah et al.'s dataset)
# Our final model
CUDA_VISIBLE_DEVICES=0 python run.py --mode PVDNet_nah --config config_PVDNet --data nah --ckpt_abs_name ckpt/PVDNet_nah.pytorch

# Larger model
CUDA_VISIBLE_DEVICES=0 python run.py --mode PVDNet_large_nah --config config_PVDNet_large --data nah --ckpt_abs_name ckpt/PVDNet_large_nah.pytorch
```
Testing results will be saved in `[LOG_ROOT]/PVDNet_TOG2021/[mode]/result/quanti_quali/[mode]_[epoch]/[data]/`.

`[LOG_ROOT]` can be modified with `config.log_offset` in `./configs/config.py`.
**Options**

* `--data`: The name of the dataset to evaluate: `DVD` | `nah` | `random`. Default: `DVD`
    * The data structure can be modified in the function `set_eval_path(..)` in `./configs/config.py`.
    * `random` is for testing models with arbitrary video frames, which should be placed as `[DATASET_ROOT]/random/[video_name]/*.[jpg|png]`.
## Wiki
## Contact
Open an issue for any inquiries. You may also contact us at sonhs@postech.ac.kr or junyonglee@postech.ac.kr.
## License
This software is being made available under the terms in the LICENSE file. Any exemptions to these terms require a license from the Pohang University of Science and Technology.
## Citation
If you find this code useful, please consider citing:
```bibtex
@Article{Son2021PVDNet,
    author  = {Hyeongseok Son and Junyong Lee and Jonghyeop Lee and Sunghyun Cho and Seungyong Lee},
    title   = {Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes},
    journal = {ACM Transactions on Graphics (TOG)},
    volume  = {40},
    number  = {5},
    year    = {2021}
}
```