# VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
Xiaoyu Shi, Zhaoyang Huang, Weikang Bian, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li
ICCV 2023
https://github.com/XiaoyuShi97/VideoFlow/assets/25840016/8121acc6-b874-411e-86de-df55f7d386a9
## Requirements

```shell
conda create --name videoflow
conda activate videoflow
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy opencv-python -c pytorch
pip install yacs loguru einops timm==0.4.12 imageio
```
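To verify the environment before moving on, a quick optional sanity check (assumes the `videoflow` environment above is active):

```python
# Optional sanity check for the environment created above.
import torch
import torchvision

print(torch.__version__)          # expect 1.6.0
print(torchvision.__version__)    # expect 0.7.0
print(torch.cuda.is_available())  # True if the CUDA 10.1 build matches your driver
```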
## Models

We provide pretrained models. The default path of the models for evaluation is:

```
├── VideoFlow_ckpt
    ├── MOF_sintel.pth
    ├── BOF_sintel.pth
    ├── MOF_things.pth
    ├── BOF_things.pth
    ├── MOF_kitti.pth
    ├── BOF_kitti.pth
```
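To sanity-check a downloaded checkpoint, you can inspect it with plain PyTorch (a minimal sketch, assuming the `.pth` files are standard `torch.save` checkpoints):

```python
# Inspect a downloaded checkpoint (assumes a standard torch.save file).
import torch

ckpt = torch.load("VideoFlow_ckpt/MOF_sintel.pth", map_location="cpu")
# If it is a state dict, print the first few parameter names.
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:5])
```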
## Inference & Visualization

Download `VideoFlow_ckpt` and put it in the root directory. Then run:

```shell
python -u inference.py --mode MOF --seq_dir demo_input_images --vis_dir demo_flow_vis
```

If your input contains only three frames, we recommend using the BOF model:

```shell
python -u inference.py --mode BOF --seq_dir demo_input_images_three_frames --vis_dir demo_flow_vis_three_frames
```
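To browse the results, one option is to stitch the visualizations into a GIF with `imageio` (installed above). This is a sketch that assumes `--vis_dir` contains per-frame `.png` renderings; adjust the glob pattern to match the actual output names:

```python
# Stitch per-frame flow visualizations into a single GIF (file layout assumed).
import glob
import imageio

frames = [imageio.imread(p) for p in sorted(glob.glob("demo_flow_vis/*.png"))]
imageio.mimsave("demo_flow_vis.gif", frames)
```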
## Data Preparation
To evaluate/train VideoFlow, you will need to download the required datasets.
- FlyingChairs
- FlyingThings3D
- Sintel
- KITTI (multi-view extension, 20 frames per scene, 14 GB)
- HD1K
By default, `datasets.py` will search for the datasets in these locations. You can create symbolic links to wherever the datasets were downloaded in the `datasets` folder:
```
├── datasets
    ├── Sintel
        ├── test
        ├── training
    ├── KITTI
        ├── testing
        ├── training
        ├── devkit
    ├── FlyingChairs_release
        ├── data
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── optical_flow
```
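For example, a small script to create the symbolic links (the source paths on the right are placeholders for wherever you downloaded the data):

```python
# Link downloaded datasets into ./datasets (source paths are examples).
import os

links = {
    "Sintel": "/path/to/Sintel",
    "KITTI": "/path/to/KITTI",
    "FlyingChairs_release": "/path/to/FlyingChairs_release",
    "FlyingThings3D": "/path/to/FlyingThings3D",
}
os.makedirs("datasets", exist_ok=True)
for name, src in links.items():
    dst = os.path.join("datasets", name)
    if not os.path.exists(dst):
        os.symlink(src, dst)
```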
## Training

The script will load the config according to the training stage. The trained model will be saved in a directory under `logs` and `checkpoints`. For example, the following script will load the config `configs/***.py` and save the trained model as `logs/xxxx/final`.
```shell
# Train MOF model
python -u train_MOFNet.py --name MOF-things --stage things --validation sintel
python -u train_MOFNet.py --name MOF-sintel --stage sintel --validation sintel
python -u train_MOFNet.py --name MOF-kitti --stage kitti --validation sintel

# Train BOF model
python -u train_BOFNet.py --name BOF-things --stage things --validation sintel
python -u train_BOFNet.py --name BOF-sintel --stage sintel --validation sintel
python -u train_BOFNet.py --name BOF-kitti --stage kitti --validation sintel
```
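Once a stage finishes, you can confirm where the checkpoints landed (a minimal sketch; the exact file names depend on the `--name` you passed):

```python
# List saved checkpoints, e.g. logs/MOF-things/final (names depend on --name).
import glob

print(glob.glob("logs/*/final*"))
print(glob.glob("checkpoints/*"))
```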
## Evaluation

The script will load the config `configs/multiframes_sintel_submission.py` or `configs/sintel_submission.py`. Please change `_CN.model` in the config file to load the corresponding checkpoint.
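For example, to evaluate `MOF_sintel.pth`, the edit would look roughly like this (illustrative; check the actual attribute layout in the config you use):

```python
# In configs/multiframes_sintel_submission.py (illustrative):
_CN.model = "VideoFlow_ckpt/MOF_sintel.pth"
```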
```shell
# Evaluate MOF_things.pth after the C stage
python -u evaluate_MOFNet.py --dataset=sintel
python -u evaluate_MOFNet.py --dataset=things
python -u evaluate_MOFNet.py --dataset=kitti

# To evaluate MOF_sintel.pth, create a submission to the Sintel benchmark after C+S
python -u evaluate_MOFNet.py --dataset=sintel_submission_stride1

# To evaluate MOF_kitti.pth, create a submission to the KITTI benchmark after C+S+K
python -u evaluate_MOFNet.py --dataset=kitti_submission
```
Similarly, to evaluate BOF models:
```shell
# Evaluate BOF_things.pth after the C stage
python -u evaluate_BOFNet.py --dataset=sintel
python -u evaluate_BOFNet.py --dataset=things
python -u evaluate_BOFNet.py --dataset=kitti

# To evaluate BOF_sintel.pth, create a submission to the Sintel benchmark after C+S
python -u evaluate_BOFNet.py --dataset=sintel_submission

# To evaluate BOF_kitti.pth, create a submission to the KITTI benchmark after C+S+K
python -u evaluate_BOFNet.py --dataset=kitti_submission
```
## (Optional & Inference Only) Efficient Implementation

You can optionally use RAFT's alternate (efficient) implementation by compiling the provided CUDA extension and changing the `corr_fn` flag to `efficient` in the config files:
```shell
cd alt_cuda_corr && python setup.py install && cd ..
```
Note that this implementation is somewhat slower than the all-pairs version, but it uses significantly less GPU memory during the forward pass. It does not implement the backward function, so do not use it for training.
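After compiling, the corresponding config change would look roughly like this (illustrative; check the actual flag name in the config you use):

```python
# In the config file (illustrative):
_CN.corr_fn = "efficient"  # the default is the all-pairs correlation
```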
## License

VideoFlow is released under the Apache License.
## Citation

```bibtex
@article{shi2023videoflow,
  title={Videoflow: Exploiting temporal cues for multi-frame optical flow estimation},
  author={Shi, Xiaoyu and Huang, Zhaoyang and Bian, Weikang and Li, Dasong and Zhang, Manyuan and Cheung, Ka Chun and See, Simon and Qin, Hongwei and Dai, Jifeng and Li, Hongsheng},
  journal={arXiv preprint arXiv:2303.08340},
  year={2023}
}
```
## Acknowledgement

In this project, we use parts of the code from: