
<p align="center">FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer</p>

<br> <div align="center"> <a href="https://www.researchgate.net/profile/Shi-Hao-10" target="_blank">Hao&nbsp;Shi</a> &emsp; <b>&middot;</b> &emsp; <a href="https://www.researchgate.net/profile/Qi-Jiang-63" target="_blank">Qi&nbsp;Jiang</a> &emsp; <b>&middot;</b> &emsp; <a href="https://www.researchgate.net/profile/Kailun-Yang" target="_blank">Kailun&nbsp;Yang</a> &emsp; <b>&middot;</b> &emsp; <a href="https://www.researchgate.net/profile/Yin-Xiaoting" target="_blank">Xiaoting&nbsp;Yin</a> &emsp; <b>&middot;</b> &emsp; <a href="https://www.researchgate.net/profile/Kaiwei-Wang-4" target="_blank">Kaiwei&nbsp;Wang</a> <br> <br> <a href="https://arxiv.org/pdf/2211.11293.pdf" target="_blank">Paper</a>


</div> <div align="center"><img src="assets/flowlens.png" width="800" height="368" /></div>


Abstract

Limited by hardware cost and system size, a camera's Field-of-View (FoV) is not always satisfactory. However, from a spatio-temporal perspective, information beyond a camera's physical FoV is off-the-shelf and can actually be obtained "for free" from past video streams. In this paper, we propose a novel task termed Beyond-FoV Estimation, aiming to exploit past visual cues and bidirectionally break through the physical FoV of a camera. We put forward a FlowLens architecture that expands the FoV by propagating features explicitly via optical flow and implicitly via a novel clip-recurrent transformer, which has two appealing features: 1) FlowLens comprises a newly proposed Clip-Recurrent Hub with 3D-Decoupled Cross Attention (DDCA) to progressively process global information accumulated in the temporal dimension. 2) A multi-branch Mix Fusion Feed Forward Network (MixF3N) is integrated to enhance the spatially-precise flow of local features. To foster training and evaluation, we establish KITTI360-EX, a dataset for outer- and inner-FoV expansion. Extensive experiments on both video inpainting and beyond-FoV estimation tasks show that FlowLens achieves state-of-the-art performance.

Demos

<p align="center"> (Outer Beyond-FoV) </p> <p align="center"> <img width="750" alt="Animation" src="assets/out_beyond.gif"/> </p> <br><br> <p align="center"> (Inner Beyond-FoV) </p> <p align="center"> <img width="750" alt="Animation" src="assets/in_beyond.gif"/> </p> <br><br> <p align="center"> (Object Removal) </p> <p align="center"> <img width="750" alt="Animation" src="assets/breakdance.gif"/> </p> <br><br>

Dependencies

This repo has been tested in the following environment:

```bash
torch == 1.10.2
cuda == 11.3
mmflow == 0.5.2
```
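
If you need to recreate this environment, something along these lines should work (a sketch, not a verified install script; the exact wheel variants are assumptions, and mmflow additionally requires a matching mmcv-full build, see the MMFlow installation docs):

```bash
# PyTorch 1.10.2 built against CUDA 11.3 (wheel names are assumptions; adjust for your platform)
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 \
    -f https://download.pytorch.org/whl/cu113/torch_stable.html

# MMFlow 0.5.2 (needs a matching mmcv-full; see the MMFlow installation guide)
pip install mmflow==0.5.2
```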

Usage

To train FlowLens(-S), use:

```bash
python train.py --config configs/KITTI360EX-I_FlowLens_small_re.json
```

To evaluate on KITTI360-EX, run:

```bash
python evaluate.py \
    --model flowlens \
    --cfg_path configs/KITTI360EX-I_FlowLens_small_re.json \
    --ckpt release_model/FlowLens-S_re_Out_500000.pth --fov fov5
```

Turn on `--reverse` for test-time augmentation (TTA).

Turn on `--save_results` to save your outputs.
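
For example, a full evaluation call with TTA and result saving enabled might look like this (same paths as in the command above):

```bash
python evaluate.py \
    --model flowlens \
    --cfg_path configs/KITTI360EX-I_FlowLens_small_re.json \
    --ckpt release_model/FlowLens-S_re_Out_500000.pth --fov fov5 \
    --reverse --save_results
```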

Pretrained Models

The pretrained models can be found here:

https://share.weiyun.com/6G6QEdaa
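
The evaluation command above expects the weights under `release_model/`; assuming you keep that layout, for example:

```bash
mkdir -p release_model
# then place the downloaded checkpoint inside, e.g.:
# release_model/FlowLens-S_re_Out_500000.pth
```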

KITTI360-EX for Beyond-FoV Estimation

The preprocessed KITTI360-EX can be downloaded from here:

https://share.weiyun.com/BReRdDiP

Results

KITTI360EX-InnerSphere

| Method | Test Logic | TTA | PSNR (dB) | SSIM | VFID | Runtime (s/frame) |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
| FlowLens-S (Paper) | Beyond-FoV | w/o | 36.17 | 0.9916 | 0.030 | 0.023 |
| FlowLens-S (This Repo) | Beyond-FoV | w/o | 37.31 | 0.9926 | 0.025 | 0.015 |
| FlowLens-S+ (This Repo) | Beyond-FoV | with | 38.36 | 0.9938 | 0.017 | 0.050 |
| FlowLens-S (This Repo) | Video Inpainting | w/o | 38.01 | 0.9938 | 0.022 | 0.042 |
| FlowLens-S+ (This Repo) | Video Inpainting | with | 38.97 | 0.9947 | 0.015 | 0.142 |

| Method | Test Logic | TTA | PSNR (dB) | SSIM | VFID | Runtime (s/frame) |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
| FlowLens (Paper) | Beyond-FoV | w/o | 36.69 | 0.9916 | 0.027 | 0.049 |
| FlowLens (This Repo) | Beyond-FoV | w/o | 37.65 | 0.9927 | 0.024 | 0.033 |
| FlowLens+ (This Repo) | Beyond-FoV | with | 38.74 | 0.9941 | 0.017 | 0.095 |
| FlowLens (This Repo) | Video Inpainting | w/o | 38.38 | 0.9939 | 0.018 | 0.086 |
| FlowLens+ (This Repo) | Video Inpainting | with | 39.40 | 0.9950 | 0.015 | 0.265 |

KITTI360EX-OuterPinhole

| Method | Test Logic | TTA | PSNR (dB) | SSIM | VFID | Runtime (s/frame) |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
| FlowLens-S (Paper) | Beyond-FoV | w/o | 19.68 | 0.9247 | 0.300 | 0.023 |
| FlowLens-S (This Repo) | Beyond-FoV | w/o | 20.41 | 0.9332 | 0.285 | 0.021 |
| FlowLens-S+ (This Repo) | Beyond-FoV | with | 21.30 | 0.9397 | 0.302 | 0.056 |
| FlowLens-S (This Repo) | Video Inpainting | w/o | 21.69 | 0.9453 | 0.245 | 0.048 |
| FlowLens-S+ (This Repo) | Video Inpainting | with | 22.40 | 0.9503 | 0.271 | 0.146 |

| Method | Test Logic | TTA | PSNR (dB) | SSIM | VFID | Runtime (s/frame) |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
| FlowLens (Paper) | Beyond-FoV | w/o | 20.13 | 0.9314 | 0.281 | 0.049 |
| FlowLens (This Repo) | Beyond-FoV | w/o | 20.85 | 0.9381 | 0.259 | 0.035 |
| FlowLens+ (This Repo) | Beyond-FoV | with | 21.65 | 0.9432 | 0.276 | 0.097 |
| FlowLens (This Repo) | Video Inpainting | w/o | 22.23 | 0.9507 | 0.231 | 0.085 |
| FlowLens+ (This Repo) | Video Inpainting | with | 22.86 | 0.9543 | 0.253 | 0.260 |

Note that when using the "Video Inpainting" logic for output, the model is allowed to use more reference frames from the future, and each local frame is estimated at least twice. This yields higher accuracy but slower inference, and is therefore not realistic for real-world deployment.

Citation

If you find our paper or repo useful, please consider citing our paper:

```bibtex
@article{shi2022flowlens,
  title={FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer},
  author={Shi, Hao and Jiang, Qi and Yang, Kailun and Yin, Xiaoting and Wang, Kaiwei},
  journal={arXiv preprint arXiv:2211.11293},
  year={2022}
}
```

Acknowledgement

This project would not have been possible without the following outstanding repositories:

STTN, MMFlow

Devs

Hao Shi

Contact

Feel free to contact me if you have additional questions or are interested in collaboration. Please drop me an email at haoshi@zju.edu.cn. =)