
NMRF-Stereo

Official PyTorch implementation of paper:

Neural Markov Random Field for Stereo Matching, CVPR 2024
Tongfan Guan, Chen Wang, Yun-Hui Liu

:new: Updates

[2024/07/18]: :rocket: NMRF-Stereo-SwinT ranks first on KITTI 2012 and KITTI 2015-NOC, using an ImageNet-pretrained Swin-T as the backbone.

Introduction

Hand-crafted Markov Random Field (MRF) models for stereo matching lack sufficient modeling accuracy compared to end-to-end deep models. While deep learning representations have greatly improved the unary terms of MRF models, the overall accuracy is still severely limited by the hand-crafted pairwise terms and message passing. To address these issues, we propose a neural MRF model in which both the potential functions and the message passing are designed using data-driven neural networks. Our fully data-driven model is built on variational inference theory to prevent convergence issues and retain the graph inductive bias of stereo MRFs. To make inference tractable and scale well to high-resolution images, we also propose a Disparity Proposal Network (DPN) that adaptively prunes the search space for every pixel.
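To make the idea of pruning the per-pixel search space concrete, here is a minimal sketch in plain Python. It is not the DPN from the paper, which is a learned network; a fixed top-k rule over matching costs is swapped in purely for illustration. The function name `prune_disparities` and the toy cost values are hypothetical.

```python
import heapq


def prune_disparities(costs, k):
    """Keep the k lowest-cost disparity candidates for one pixel.

    costs: list where costs[d] is the matching cost of disparity d.
    Returns the kept disparities in ascending order.
    NOTE: a fixed top-k rule, used here only as a stand-in for a
    learned proposal network.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    best = heapq.nsmallest(k, range(len(costs)), key=costs.__getitem__)
    return sorted(best)


# Toy cost curve for one pixel over 8 disparity hypotheses (made-up numbers).
pixel_costs = [0.9, 0.7, 0.2, 0.3, 0.8, 0.1, 0.6, 0.95]
print(prune_disparities(pixel_costs, k=3))  # [2, 3, 5]
```

With only a few candidates left per pixel, any subsequent inference step has to reason over k labels instead of the full disparity range, which is what makes high-resolution inputs tractable.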


Highlights

High accuracy & efficiency

NMRF-Stereo reports state-of-the-art accuracy on Scene Flow and ranks first on KITTI 2012 and KITTI 2015 leaderboards among all published methods at the time of submission. The model runs in 90 ms on an RTX 3090 for KITTI data (1242x375).

Strong cross-domain generalization

NMRF-Stereo generalizes well to other datasets and scenes when trained only on synthetic Scene Flow data.

Sharp depth boundaries

NMRF-Stereo recovers sharp depth boundaries, which are key to downstream applications such as 3D reconstruction and object detection.

Installation

Our code is developed on Ubuntu 20.04 using Python 3.8 and PyTorch 1.13, and has only been tested with these versions. We recommend using conda to install the dependencies.

Create the NMRF conda environment and install all dependencies:

conda env create -f environment.yml
conda activate NMRF

Build the deformable attention and superpixel-guided disparity downsample operators:

cd ops && sh make.sh && cd ..

Dataset Preparation

To train or evaluate NMRF-Stereo, download the following datasets:

Scene Flow (includes FlyingThings3D, Driving, and Monkaa)
Middlebury
ETH3D
KITTI 2012
KITTI 2015

By default, datasets.py searches for the datasets under $root/datasets, using the folder structure shown in this section. If the datasets were downloaded elsewhere, create a symbolic link named datasets in $root that points to them:

ln -s $YOUR_DATASET_ROOT datasets

Our folder structure is as follows:

├── datasets
    ├── ETH3D
    │   ├── two_view_training
    │   └── two_view_training_gt
    ├── KITTI
    │   ├── KITTI_2012
    │   │   ├── testing
    │   │   └── training
    │   └── KITTI_2015
    │       ├── testing
    │       └── training
    ├── Middlebury
    │   ├── 2014
    │   └── MiddEval3
    └── SceneFlow
        ├── Driving
        │   ├── disparity
        │   └── frames_finalpass
        ├── FlyingThings3D
        │   ├── disparity
        │   └── frames_finalpass
        └── Monkaa
            ├── disparity
            └── frames_finalpass
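Before training or evaluating, it can help to confirm that the layout above is in place. The following is a small stand-alone sketch, not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and the directory list is copied from the tree above. `Path.is_dir()` follows symbolic links, so a linked `datasets` folder is handled as well.

```python
from pathlib import Path

# Leaf directories copied from the folder structure shown above.
EXPECTED_DIRS = [
    "ETH3D/two_view_training",
    "ETH3D/two_view_training_gt",
    "KITTI/KITTI_2012/testing",
    "KITTI/KITTI_2012/training",
    "KITTI/KITTI_2015/testing",
    "KITTI/KITTI_2015/training",
    "Middlebury/2014",
    "Middlebury/MiddEval3",
    "SceneFlow/Driving/disparity",
    "SceneFlow/Driving/frames_finalpass",
    "SceneFlow/FlyingThings3D/disparity",
    "SceneFlow/FlyingThings3D/frames_finalpass",
    "SceneFlow/Monkaa/disparity",
    "SceneFlow/Monkaa/frames_finalpass",
]


def missing_dataset_dirs(root="datasets"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset folders:")
        for d in missing:
            print("  datasets/" + d)
    else:
        print("All expected dataset folders found.")
```

If you only plan to use a subset of the datasets, trim `EXPECTED_DIRS` accordingly.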

(Optional) Occlusion mask

We provide a script to generate occlusion masks for the Scene Flow dataset. Using them may bring a marginal performance improvement.

python tools/generate_occlusion_map.py

Demos

Pretrained models can be downloaded from Google Drive.

We assume the downloaded weights are located under the pretrained directory.

You can demo a trained model on pairs of images. To predict disparities for ETH3D, run:

python inference.py --dataset-name eth3d --output $output_directory SOLVER.RESUME pretrained/sceneflow.pth

Or test on your own stereo pairs:

python inference.py --input $left_directory/*.png $right_directory/*.png --output $output_directory SOLVER.RESUME pretrained/$pretrained_model.pth
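The command above passes two shell globs, one for left images and one for right images. The README does not state how inference.py matches them up, so the sketch below assumes the common convention that files are paired by sorted order, and simply checks that both folders contain the same number of images before you run inference. The helper name `list_stereo_pairs` is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_stereo_pairs(left_dir, right_dir, pattern="*.png"):
    """Pair left and right images by sorted filename order.

    ASSUMPTION: sorted-order pairing; the repository's own pairing
    logic is not described in this README.
    Raises ValueError if the two folders hold different numbers of files.
    """
    left = sorted(Path(left_dir).glob(pattern))
    right = sorted(Path(right_dir).glob(pattern))
    if len(left) != len(right):
        raise ValueError(
            f"{len(left)} left images but {len(right)} right images"
        )
    return list(zip(left, right))


# Example usage (directory names are placeholders):
#   for left, right in list_stereo_pairs("my_data/left", "my_data/right"):
#       print(left.name, "<->", right.name)
```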

Evaluation

To evaluate on the Scene Flow test set, run:

python main.py --num-gpus 4 --eval-only SOLVER.RESUME pretrained/sceneflow.pth

Or for cross-domain generalization:

python main.py --num-gpus 4 --eval-only --config-file configs/zero_shot_evaluation.yaml SOLVER.RESUME pretrained/sceneflow.pth

To generate results for submission to the KITTI 2012 and 2015 online test sets, run:

python inference.py --dataset-name kitti_2015 SOLVER.RESUME pretrained/kitti.pth

and

python inference.py --dataset-name kitti_2012 SOLVER.RESUME pretrained/kitti.pth
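For reference when checking output files: the KITTI stereo benchmarks store disparity maps as 16-bit PNGs in which the pixel value is the disparity multiplied by 256, and 0 marks an invalid pixel. Whether inference.py already writes this format is not stated in this README, so the following is only a sketch of the per-pixel conversion, with hypothetical helper names.

```python
def encode_kitti_disparity(disparity):
    """Map a disparity in pixels to a KITTI 16-bit PNG value.

    0 is reserved for invalid pixels, so valid predictions are
    clamped to the range [1, 65535].
    """
    if disparity <= 0:
        return 0
    value = int(round(disparity * 256.0))
    return max(1, min(value, 65535))


def decode_kitti_disparity(value):
    """Map a KITTI 16-bit PNG value back to pixels (None if invalid)."""
    return value / 256.0 if value > 0 else None


print(encode_kitti_disparity(1.5))   # 384
print(decode_kitti_disparity(384))   # 1.5
```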

Training

To train on Scene Flow, run:

python main.py --checkpoint-dir checkpoints/sceneflow --num-gpus 4

To train on KITTI, starting from the Scene Flow pretrained weights, run:

python main.py --checkpoint-dir checkpoints/kitti --config-file configs/kitti_mix_train.yaml --num-gpus 4 SOLVER.RESUME pretrained/sceneflow.pth

Training can be monitored and visualized with TensorBoard. First start a TensorBoard session with:

tensorboard --logdir checkpoints

and then access http://localhost:6006 in your browser.

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{guan2024neural,
  title={Neural Markov Random Field for Stereo Matching},
  author={Guan, Tongfan and Wang, Chen and Liu, Yun-Hui},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5459--5469},
  year={2024}
}

Acknowledgements

This project would not have been possible without relying on some awesome repos: RAFT-Stereo, Detectron2, and Swin.