## Fork
This is a fork of "Hierarchical Deep Stereo Matching on High Resolution Images" that adds support for newer versions of Python, PyTorch, and torchvision. The original implementation, available at https://github.com/gengshan-y/high-res-stereo, only works correctly with torchvision 0.2.0 and is broken with 0.2.1.
This fork has been tested to work correctly with:
- Python 2.7.x and 3.7.x
- PyTorch 0.4.0, 0.4.1, 1.0.1, and 1.1.0
- torchvision 0.2.0, 0.2.1, and 0.3.0
# Hierarchical Deep Stereo Matching on High Resolution Images
Architecture: <img src="./architecture.png" width="800">
Qualitative results on Middlebury (refer to project webpage for more results) <img src="http://www.contrib.andrew.cmu.edu/~gengshay/wordpress/wp-content/uploads/2019/06/cvpr19-middlebury1-small.gif" width="400">
Performance on Middlebury benchmark (y-axis: the lower the better) <img src="./middlebury-benchmark.png" width="400">
## Weights
## Data

### train/val
- Middlebury (train set and additional images)
- High-res-virtual-stereo (HR-VS)
- KITTI-2012&2015
- SceneFlow
### test
High-res-real-stereo (HR-RS): coming soon
## Train
- Download and extract the training data to folder /d/. The training data includes the Middlebury train set, HR-VS, KITTI-12/15, and SceneFlow.
- Run:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --maxdisp 384 --batchsize 24 --database /d/ --logname log1 --savemodel /somewhere/ --epochs 10
```
- Evaluate on the Middlebury additional images and the KITTI validation set. After 10 epochs, the average error on the Middlebury additional images at half resolution should be around 4.6 (excluding Shopvac).
## Inference
Example:

```shell
CUDA_VISIBLE_DEVICES=3 python submission.py --datapath ./data-mbtest/ --outdir ./mboutput --loadmodel ./weights/final-768px.tar --testres 1 --clean 0.8 --max_disp -1
```
Evaluation:

```shell
CUDA_VISIBLE_DEVICES=3 python submission.py --datapath ./data-HRRS/ --outdir ./output --loadmodel ./weights/final-768px.tar --testres 0.5
python eval_disp.py --indir ./output --gtdir ./data-HRRS/
```
Then use cvkit to visualize the results in 3D.
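As a rough illustration of what such a disparity evaluation computes, below is a minimal sketch of the average end-point error over valid ground-truth pixels. The function name and the inf-for-missing convention are illustrative assumptions, not the actual eval_disp.py code.

```python
import numpy as np

def avg_disparity_error(pred, gt):
    """Mean absolute disparity error over valid ground-truth pixels.

    pred, gt: float arrays of the same shape. Pixels where gt is
    non-finite are excluded, following the common Middlebury convention
    of marking missing ground truth with inf.
    """
    valid = np.isfinite(gt)
    return np.abs(pred[valid] - gt[valid]).mean()

# Toy example: 2x2 maps with one missing ground-truth pixel.
gt = np.array([[10.0, 20.0], [30.0, np.inf]])
pred = np.array([[11.0, 18.0], [30.0, 99.0]])
print(avg_disparity_error(pred, gt))  # (1 + 2 + 0) / 3 = 1.0
```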
## Example outputs

<img src="data-mbtest/CrusadeP/im0.png" width="400"> left image
<img src="mboutput/CrusadeP/capture_000.png" width="400"> 3D projection
<img src="mboutput/CrusadeP-disp.png" width="400"> disparity map
<img src="mboutput/CrusadeP-ent.png" width="400"> uncertainty map (brighter -> higher uncertainty)

## Parameters
- testres: input resolution scale; 1 is full resolution, 0.5 is half resolution, and so on
- max_disp: maximum disparity range to search
- clean: uncertainty threshold for cleaning; pixels above the threshold are removed, so clean=0 removes all pixels
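To make these parameters concrete, here is a hypothetical sketch of how a prediction made at a reduced testres is mapped back to full resolution, and how an uncertainty threshold like clean could prune pixels. The function names and the exact thresholding rule are assumptions for illustration, not the repo's actual code.

```python
import numpy as np

def rescale_disparity(disp_lowres, testres):
    # Disparity values scale with image width, so a prediction made at
    # a downscaled resolution is multiplied by 1/testres when mapped
    # back to full resolution (assumed convention).
    return disp_lowres / testres

def clean_disparity(disp, uncertainty, clean):
    # Invalidate pixels whose (normalized) uncertainty reaches the
    # threshold; with clean=0 every pixel is removed, matching the
    # description above.
    out = disp.copy()
    out[uncertainty >= clean] = np.inf  # inf marks an invalid pixel
    return out

disp = np.array([[5.0, 8.0]])   # toy prediction made at testres = 0.5
unc = np.array([[0.1, 0.9]])    # toy normalized uncertainty in [0, 1]
full = rescale_disparity(disp, 0.5)     # -> [[10., 16.]]
kept = clean_disparity(full, unc, 0.8)  # second pixel invalidated
print(full, kept)
```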
## Citation

```
@InProceedings{yang2019hsm,
  author = {Yang, Gengshan and Manela, Joshua and Happold, Michael and Ramanan, Deva},
  title = {Hierarchical Deep Stereo Matching on High-Resolution Images},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2019}
}
```
## Acknowledgement
Part of the code is borrowed from MiddEval-SDK, PSMNet, FlowNetPytorch and pytorch-semseg.