<h1 align="center">FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer</h1>
<div align="center"> <a href="https://www.researchgate.net/profile/Shi-Hao-10" target="_blank">Hao Shi</a>   <b>·</b>   <a href="https://www.researchgate.net/profile/Qi-Jiang-63" target="_blank">Qi Jiang</a>   <b>·</b>   <a href="https://www.researchgate.net/profile/Kailun-Yang" target="_blank">Kailun Yang</a>   <b>·</b>   <a href="https://www.researchgate.net/profile/Yin-Xiaoting" target="_blank">Xiaoting Yin</a>   <b>·</b>   <a href="https://www.researchgate.net/profile/Kaiwei-Wang-4" target="_blank">Kaiwei Wang</a> <br> <br> <a href="https://arxiv.org/pdf/2211.11293.pdf" target="_blank">Paper</a> </div>
<div align="center"><img src="assets/flowlens.png" width="800" height="368" /></div>

## Update
- 2022.11.19 Init repository.
- 2022.11.21 Release the arXiv version with supplementary materials.
- 2023.04.04 :fire: Our code is publicly available.
- 2023.04.04 :fire: Release pretrained models.
- 2023.04.04 :fire: Release KITTI360-EX dataset.
## TODO List

- [x] Code release.
- [x] KITTI360-EX release.
- [ ] Towards higher performance at a small extra cost.
## Abstract
Limited by hardware cost and system size, a camera's Field-of-View (FoV) is not always satisfactory. However, from a spatio-temporal perspective, information beyond a camera's physical FoV is readily available and can in fact be obtained "for free" from past video streams. In this paper, we propose a novel task termed Beyond-FoV Estimation, aiming to exploit past visual cues and bidirectionally break through the physical FoV of a camera. We put forward FlowLens, an architecture that expands the FoV by propagating features explicitly via optical flow and implicitly via a novel clip-recurrent transformer, which has two appealing features: 1) FlowLens comprises a newly proposed Clip-Recurrent Hub with 3D-Decoupled Cross Attention (DDCA) to progressively process global information accumulated in the temporal dimension. 2) A multi-branch Mix Fusion Feed Forward Network (MixF3N) is integrated to enhance the spatially precise flow of local features. To foster training and evaluation, we establish KITTI360-EX, a dataset for outer- and inner-FoV expansion. Extensive experiments on both video inpainting and beyond-FoV estimation tasks show that FlowLens achieves state-of-the-art performance.
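For readers skimming the abstract, below is a minimal PyTorch sketch of the general factorization idea behind "3D-decoupled" attention: full attention over all T×H×W tokens is split into a temporal pass and a spatial pass. This is an assumption-laden toy for intuition only, not the released DDCA module; all class and variable names are hypothetical, and the paper and code are authoritative.

```python
# Hypothetical sketch: factorize spatio-temporal attention into a
# temporal pass and a spatial pass. NOT the repo's DDCA module.
import torch
import torch.nn as nn

class DecoupledAttention3D(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                          # x: (B, T, H, W, C)
        b, t, h, w, c = x.shape
        # Temporal pass: each spatial location attends across frames.
        xt = x.permute(0, 2, 3, 1, 4).reshape(b * h * w, t, c)
        xt, _ = self.temporal(xt, xt, xt)
        xt = xt.reshape(b, h, w, t, c).permute(0, 3, 1, 2, 4)
        # Spatial pass: each frame attends across its own pixels.
        xs = x.reshape(b * t, h * w, c)
        xs, _ = self.spatial(xs, xs, xs)
        xs = xs.reshape(b, t, h, w, c)
        return x + xt + xs                          # residual fusion

x = torch.randn(2, 5, 8, 8, 64)
print(DecoupledAttention3D(64)(x).shape)            # (2, 5, 8, 8, 64)
```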
## Demos
<p align="center"> (Outer Beyond-FoV) </p> <p align="center"> <img width="750" alt="Animation" src="assets/out_beyond.gif"/> </p> <br><br>
<p align="center"> (Inner Beyond-FoV) </p> <p align="center"> <img width="750" alt="Animation" src="assets/in_beyond.gif"/> </p> <br><br>
<p align="center"> (Object Removal) </p> <p align="center"> <img width="750" alt="Animation" src="assets/breakdance.gif"/> </p> <br><br>

## Dependencies
This repo has been tested in the following environment:

```
torch == 1.10.2
cuda == 11.3
mmflow == 0.5.2
```
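A quick way to check that your environment matches the tested versions (a hedged convenience snippet, not part of the repo; it assumes only the packages' standard version attributes):

```python
# Sanity-check the environment against the versions the authors tested.
# Nearby versions may still work but are untested.
import torch
import mmflow

print("torch:", torch.__version__)                  # tested: 1.10.2
print("cuda:", torch.version.cuda)                  # tested: 11.3
print("mmflow:", mmflow.__version__)                # tested: 0.5.2
print("cuda available:", torch.cuda.is_available())
```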
## Usage
To train FlowLens(-S), run:

```shell
python train.py --config configs/KITTI360EX-I_FlowLens_small_re.json
```
To evaluate on KITTI360-EX, run:

```shell
python evaluate.py \
       --model flowlens \
       --cfg_path configs/KITTI360EX-I_FlowLens_small_re.json \
       --ckpt release_model/FlowLens-S_re_Out_500000.pth --fov fov5
```
Turn on `--reverse` for test-time augmentation (TTA), and turn on `--save_results` to save your outputs. A sketch of the TTA idea is shown below.
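The flag name suggests temporal-reversal TTA. Here is a hedged sketch of that general idea, not the repo's actual implementation; `model` is a stand-in callable mapping a clip of frames to per-frame predictions (see `evaluate.py` for the real behavior behind `--reverse`):

```python
# Hypothetical sketch of temporal-reversal TTA: run the model on the
# clip and on its time-reversed copy, undo the reversal, then average.
import torch

def tta_reverse(model, frames):              # frames: (T, C, H, W)
    out_fwd = model(frames)                  # forward-order pass
    out_rev = model(frames.flip(0)).flip(0)  # reversed pass, flipped back
    return 0.5 * (out_fwd + out_rev)         # average the two estimates
```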
## Pretrained Models

The pretrained models can be found here:
https://share.weiyun.com/6G6QEdaa
## KITTI360-EX for Beyond-FoV Estimation
The preprocessed KITTI360-EX can be downloaded from here:
https://share.weiyun.com/BReRdDiP
## Results

### KITTI360EX-InnerSphere
| Method | Test Logic | TTA | PSNR | SSIM | VFID | Runtime (s/frame) |
|---|---|---|---|---|---|---|
| FlowLens-S (Paper) | Beyond-FoV | w/o | 36.17 | 0.9916 | 0.030 | 0.023 |
| FlowLens-S (This Repo) | Beyond-FoV | w/o | 37.31 | 0.9926 | 0.025 | 0.015 |
| FlowLens-S+ (This Repo) | Beyond-FoV | w/ | 38.36 | 0.9938 | 0.017 | 0.050 |
| FlowLens-S (This Repo) | Video Inpainting | w/o | 38.01 | 0.9938 | 0.022 | 0.042 |
| FlowLens-S+ (This Repo) | Video Inpainting | w/ | 38.97 | 0.9947 | 0.015 | 0.142 |

| Method | Test Logic | TTA | PSNR | SSIM | VFID | Runtime (s/frame) |
|---|---|---|---|---|---|---|
| FlowLens (Paper) | Beyond-FoV | w/o | 36.69 | 0.9916 | 0.027 | 0.049 |
| FlowLens (This Repo) | Beyond-FoV | w/o | 37.65 | 0.9927 | 0.024 | 0.033 |
| FlowLens+ (This Repo) | Beyond-FoV | w/ | 38.74 | 0.9941 | 0.017 | 0.095 |
| FlowLens (This Repo) | Video Inpainting | w/o | 38.38 | 0.9939 | 0.018 | 0.086 |
| FlowLens+ (This Repo) | Video Inpainting | w/ | 39.40 | 0.9950 | 0.015 | 0.265 |
### KITTI360EX-OuterPinhole
| Method | Test Logic | TTA | PSNR | SSIM | VFID | Runtime (s/frame) |
|---|---|---|---|---|---|---|
| FlowLens-S (Paper) | Beyond-FoV | w/o | 19.68 | 0.9247 | 0.300 | 0.023 |
| FlowLens-S (This Repo) | Beyond-FoV | w/o | 20.41 | 0.9332 | 0.285 | 0.021 |
| FlowLens-S+ (This Repo) | Beyond-FoV | w/ | 21.30 | 0.9397 | 0.302 | 0.056 |
| FlowLens-S (This Repo) | Video Inpainting | w/o | 21.69 | 0.9453 | 0.245 | 0.048 |
| FlowLens-S+ (This Repo) | Video Inpainting | w/ | 22.40 | 0.9503 | 0.271 | 0.146 |

| Method | Test Logic | TTA | PSNR | SSIM | VFID | Runtime (s/frame) |
|---|---|---|---|---|---|---|
| FlowLens (Paper) | Beyond-FoV | w/o | 20.13 | 0.9314 | 0.281 | 0.049 |
| FlowLens (This Repo) | Beyond-FoV | w/o | 20.85 | 0.9381 | 0.259 | 0.035 |
| FlowLens+ (This Repo) | Beyond-FoV | w/ | 21.65 | 0.9432 | 0.276 | 0.097 |
| FlowLens (This Repo) | Video Inpainting | w/o | 22.23 | 0.9507 | 0.231 | 0.085 |
| FlowLens+ (This Repo) | Video Inpainting | w/ | 22.86 | 0.9543 | 0.253 | 0.260 |
Note that under the "Video Inpainting" test logic the model is allowed to use reference frames from the future, and each local frame is estimated at least twice. This yields higher accuracy but slower inference, and is therefore unrealistic for real-world deployment.
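To make the distinction concrete, here is a hedged sketch of the two test logics as we read the note above; the function names and windowing parameters are illustrative assumptions, not the repo's API. `model` maps a clip of frames to one prediction per frame.

```python
# Beyond-FoV logic: online/causal. Each frame is predicted exactly once,
# using only frames already seen -- matches real-world deployment.
def beyond_fov(model, frames):
    outputs = []
    for t in range(len(frames)):
        outputs.append(model(frames[: t + 1])[-1])  # past frames only
    return outputs

# Video Inpainting logic: offline sliding window with overlap. Future
# frames serve as references and each frame is estimated at least twice
# (window > stride), hence higher accuracy but slower, non-causal output.
def video_inpainting(model, frames, window=10, stride=5):
    estimates = [[] for _ in frames]
    for s in range(0, len(frames), stride):
        clip = frames[s : s + window]
        for i, pred in enumerate(model(clip)):
            estimates[s + i].append(pred)
    # Average the overlapping estimates for each frame.
    return [sum(preds) / len(preds) for preds in estimates]
```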
## Citation

If you find our paper or repo useful, please consider citing our paper:

```bibtex
@article{shi2022flowlens,
  title={FlowLens: Seeing Beyond the FoV via Flow-guided Clip-Recurrent Transformer},
  author={Shi, Hao and Jiang, Qi and Yang, Kailun and Yin, Xiaoting and Wang, Kaiwei},
  journal={arXiv preprint arXiv:2211.11293},
  year={2022}
}
```
## Acknowledgement
This project would not have been possible without the following outstanding repositories:
## Devs
Hao Shi
## Contact

Feel free to contact me if you have additional questions or are interested in collaboration. Please drop me an email at haoshi@zju.edu.cn. =)