# [ICCV2023] Explicit Motion Disentangling for Efficient Optical Flow Estimation
<p align="center"> Changxing Deng<sup>1</sup>, Ao Luo<sup>2</sup>, Haibin Huang<sup>3</sup>, Shaodan Ma<sup>1</sup>, Jiangyu Liu<sup>2</sup>, Shuaicheng Liu<sup>4,2</sup> </p>
<p align="center">1. University of Macau, 2. Megvii Technology, 3. Kuaishou Technology,</p>
<p align="center">4. University of Electronic Science and Technology of China</p>

This repository provides the implementation of *Explicit Motion Disentangling for Efficient Optical Flow Estimation* (ICCV 2023).
## Abstract
In this paper, we propose a novel framework for optical flow estimation that achieves a good balance between performance and efficiency. Our approach disentangles global motion learning from local flow estimation, treating global matching and local refinement as separate stages. We offer two key insights: First, the multi-scale 4D cost-volume based recurrent flow decoder is computationally expensive and unnecessary for handling small displacements. With the separation, we can use lightweight methods for both parts while maintaining comparable performance. Second, dense and robust global matching is essential both for flow initialization and for stable, fast convergence of the refinement stage. To this end, we introduce EMD-Flow, a framework that explicitly separates global motion estimation from the recurrent refinement stage. We propose two novel modules: Multi-scale Motion Aggregation (MMA) and Confidence-induced Flow Propagation (CFP). These modules leverage cross-scale matching priors and self-contained confidence maps to resolve the ambiguities of dense matching in a global manner, producing a dense initial flow. A lightweight decoding module then handles the remaining small displacements, yielding an efficient yet robust flow estimation framework. We conduct comprehensive experiments on standard optical flow benchmarks, and the results demonstrate the framework's superior balance between performance and runtime.
*Comparison with state-of-the-art methods on the Sintel and KITTI datasets.*
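To make the disentangled design above concrete: stage one performs coarse global matching that produces a dense initial flow plus a confidence map, and stage two runs a lightweight refinement for small residual displacements. The sketch below illustrates this two-stage structure in PyTorch; the module names, the toy encoder/refiner, and all shapes are illustrative assumptions, not the actual EMD-Flow implementation (see the code in this repo for the real MMA and CFP modules).

```python
import torch
import torch.nn as nn


class TwoStageFlow(nn.Module):
    """Toy two-stage flow estimator: global matching, then local refinement.

    Everything here is a simplified stand-in; the real EMD-Flow uses the
    MMA and CFP modules to aggregate multi-scale matches and to propagate
    flow into low-confidence regions before refinement.
    """

    def __init__(self, dim=64, iters=4):
        super().__init__()
        # Stand-in feature encoder: one strided conv down to 1/8 resolution.
        self.encoder = nn.Conv2d(3, dim, kernel_size=8, stride=8)
        # Stand-in local decoder predicting a small residual flow update.
        self.refiner = nn.Conv2d(2 * dim + 2, 2, kernel_size=3, padding=1)
        self.iters = iters

    def global_match(self, f1, f2):
        """Stage 1: dense all-pairs matching -> initial flow + confidence."""
        b, c, h, w = f1.shape
        # Correlation between every source and every target feature location.
        corr = torch.einsum('bchw,bcuv->bhwuv', f1, f2).reshape(b, h, w, h * w)
        conf, idx = corr.softmax(dim=-1).max(dim=-1)  # best match per pixel
        gy, gx = torch.meshgrid(
            torch.arange(h, device=f1.device),
            torch.arange(w, device=f1.device), indexing='ij')
        # Flow = matched target coordinate minus source coordinate.
        flow = torch.stack([idx % w - gx, idx // w - gy], dim=1).float()
        return flow, conf.unsqueeze(1)

    def forward(self, img1, img2):
        f1, f2 = self.encoder(img1), self.encoder(img2)
        flow, conf = self.global_match(f1, f2)
        # Stage 2: a few cheap updates for small residual displacements.
        # (A real refiner would warp f2 by the current flow first.)
        for _ in range(self.iters):
            flow = flow + self.refiner(torch.cat([f1, f2, flow], dim=1))
        return flow


# Example: random 64x64 images -> flow at 1/8 resolution, shape (1, 2, 8, 8).
flow = TwoStageFlow()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```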
## Requirements

```
pytorch==1.10.2
torchvision==0.11.3
numpy==1.19.2
timm==0.4.12
tensorboard==2.6.0
scipy==1.5.2
pillow==8.4.0
opencv-python==4.5.5.64
cudatoolkit==11.3.1
```
## Evaluate

- Download the model weights from Google Drive and put the files into the folder `weights`.
- Download the Sintel and KITTI datasets and put them into the folder `data`.
- Evaluate our models by running:

```sh
sh evaluate.sh
```
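The standard metric reported on these benchmarks is the average end-point error (EPE): the mean Euclidean distance between predicted and ground-truth flow vectors. Below is a minimal sketch of the computation; the function name and the validity-mask convention (used because KITTI's ground truth is sparse) are illustrative, not this repo's exact evaluation code.

```python
import torch


def end_point_error(flow_pred, flow_gt, valid=None):
    """Mean L2 distance between predicted and ground-truth flow.

    flow_pred, flow_gt: (B, 2, H, W) tensors; valid: optional (B, H, W)
    mask selecting pixels that have ground truth (sparse for KITTI).
    """
    err = torch.sqrt(torch.sum((flow_pred - flow_gt) ** 2, dim=1))  # (B, H, W)
    if valid is not None:
        return err[valid > 0.5].mean()
    return err.mean()
```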
## Citation

If you find this work helpful, please cite:

```bibtex
@inproceedings{deng2023explicit,
  title={Explicit motion disentangling for efficient optical flow estimation},
  author={Deng, Changxing and Luo, Ao and Huang, Haibin and Ma, Shaodan and Liu, Jiangyu and Liu, Shuaicheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={9521--9530},
  year={2023}
}
```
## Acknowledgement

The main framework is adapted from RAFT, Swin-Transformer, and FlowFormer. We thank the authors for their contributions.