
Hierarchical Deep Stereo Matching on High Resolution Images

[project webpage]

<img src="./architecture.png" width="800">

Qualitative results on Middlebury:

<img src="http://www.contrib.andrew.cmu.edu/~gengshay/wordpress/wp-content/uploads/2019/06/cvpr19-middlebury1-small.gif" width="400">

Performance on the Middlebury benchmark (y-axis: error; lower is better):

<img src="./middlebury-benchmark.png" width="400">

Able to handle large view variations in high-resolution images (as a submodule in Open4D, CVPR 2020):

<img src="http://www.contrib.andrew.cmu.edu/~gengshay/wordpress/wp-content/uploads/2020/02/cvpr19-dance.gif" width="800">

Requirements

Weights

Note: the .tar file is a standard PyTorch checkpoint and can be loaded directly with torch.load; there is no need to uncompress it.
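
A minimal sketch of loading the checkpoint directly (the 'state_dict' key is an assumption about the checkpoint layout; print the keys to confirm for your download):

```python
# Minimal sketch: load the .tar checkpoint directly with torch.load.
# The 'state_dict' key is an assumption; inspect checkpoint.keys()
# to confirm the layout of your download.
import torch

checkpoint = torch.load('./weights/final-768px.tar', map_location='cpu')
print(checkpoint.keys())
# model.load_state_dict(checkpoint['state_dict'])
```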

Inference

Test on CrusadeP and dancing stereo pairs:

CUDA_VISIBLE_DEVICES=3 python submission.py --datapath ./data-mbtest/   --outdir ./mboutput --loadmodel ./weights/final-768px.tar  --testres 1 --clean 1.0 --max_disp -1
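
To inspect the outputs programmatically, below is a minimal sketch of a Middlebury-style .pfm reader. It assumes submission.py writes disparities in the .pfm format; the filename in the last line is hypothetical, so check --outdir for the actual names.

```python
# Minimal Middlebury .pfm reader (sketch; no header comments handled).
import re
import numpy as np

def read_pfm(path):
    """Read a Middlebury .pfm file into a numpy array."""
    with open(path, 'rb') as f:
        header = f.readline().decode('ascii').strip()
        if header not in ('PF', 'Pf'):
            raise ValueError('not a PFM file')
        width, height = map(int, re.findall(r'\d+', f.readline().decode('ascii')))
        scale = float(f.readline().decode('ascii').strip())
        # A negative scale means little-endian floats.
        data = np.fromfile(f, '<f4' if scale < 0 else '>f4')
    shape = (height, width, 3) if header == 'PF' else (height, width)
    return np.flipud(data.reshape(shape))  # PFM stores rows bottom-to-top

disp = read_pfm('./mboutput/CrusadeP/disp0HSM.pfm')  # hypothetical filename
print(disp.shape, disp[np.isfinite(disp)].max())
```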

Evaluate on Middlebury additional images:

CUDA_VISIBLE_DEVICES=3 python submission.py --datapath ./path_to_additional_images   --outdir ./output --loadmodel ./weights/final-768px.tar  --testres 0.5
python eval_mb.py --indir ./output --gtdir ./groundtruth_path
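
For reference, the average error this kind of evaluation reports is the mean absolute disparity difference over valid ground-truth pixels. A minimal illustrative sketch (not the repo's eval_mb.py):

```python
# Illustrative re-implementation of the average-error metric, not the
# repo's eval_mb.py. Assumes invalid ground-truth pixels are inf, as in
# Middlebury .pfm ground truth.
import numpy as np

def avg_err(pred, gt):
    valid = np.isfinite(gt) & (gt > 0)
    return np.abs(pred - gt)[valid].mean()
```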

Evaluate on HRRS:

CUDA_VISIBLE_DEVICES=3 python submission.py --datapath ./data-HRRS/   --outdir ./output --loadmodel ./weights/final-768px.tar  --testres 0.5
python eval_disp.py --indir ./output --gtdir ./data-HRRS/

Then use cvkit to visualize the results in 3D.
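
If you prefer a quick script over cvkit, disparity converts to depth via depth = focal * baseline / disparity. A hypothetical sketch is below; the calibration values are placeholders, so read the real ones from the dataset's calibration files.

```python
# Hypothetical disparity-to-depth conversion for 3D inspection.
# focal (pixels) and baseline (meters) are placeholder values; use the
# calibration shipped with the dataset.
import numpy as np

def disp_to_depth(disp, focal=3000.0, baseline=0.2):
    depth = np.full_like(disp, np.inf)
    valid = np.isfinite(disp) & (disp > 0)
    depth[valid] = focal * baseline / disp[valid]
    return depth
```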

Example outputs

<img src="data-mbtest/CrusadeP/im0.png" width="400">

left image

<img src="mboutput/CrusadeP/capture_000.png" width="400">

3D projection

<img src="mboutput/CrusadeP-disp.png" width="400">

disparity map

<img src="mboutput/CrusadeP-ent.png" width="400">

uncertainty map (brighter = higher uncertainty)
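
The uncertainty map can also be used to post-filter the disparity, similar in spirit to what --clean does at inference time. A hypothetical sketch using the PNG visualizations above (the 0.7 threshold is arbitrary, and the PNGs are 8-bit visualizations rather than raw entropy values):

```python
# Hypothetical post-filtering sketch: drop pixels whose visualized
# uncertainty is high. Threshold and filenames are assumptions.
import numpy as np
from PIL import Image

disp = np.asarray(Image.open('./mboutput/CrusadeP-disp.png').convert('L'), dtype=np.float32)
ent = np.asarray(Image.open('./mboutput/CrusadeP-ent.png').convert('L'), dtype=np.float32)
mask = ent / 255.0 > 0.7  # brighter = higher uncertainty
disp[mask] = 0            # invalidate low-confidence pixels
```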

Parameters

Data

train/val

test

High-res-real-stereo (HR-RS): this dataset has been taken down due to licensing issues. Please use the Argoverse dataset instead.

Train

  1. Download and extract the training data to folder /d/. The training data include the Middlebury training set, HR-VS, KITTI-12/15, ETH3D, and SceneFlow.
  2. Run
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --maxdisp 384 --batchsize 28 --database /d/ --logname log1 --savemodel /somewhere/  --epochs 10
  3. Evaluate on the Middlebury additional images and the KITTI validation set. After 40k iterations, the average error on the Middlebury additional images excluding Shopvac (perfect + imperfect, 24 stereo pairs in total) at half resolution should be around 5.7.

Citation

@InProceedings{yang2019hsm,
author = {Yang, Gengshan and Manela, Joshua and Happold, Michael and Ramanan, Deva},
title = {Hierarchical Deep Stereo Matching on High-Resolution Images},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}

Acknowledgement

Part of the code is borrowed from MiddEval-SDK, PSMNet, FlowNetPytorch, and pytorch-semseg. Thanks to SorcererX for fixing version compatibility issues.