Home

Awesome

Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency

This repository contains the official implementation of Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency presented as an oral at the 3D Scene Understanding for Vision, Graphics, and Robotics workshop, CVPR 2019.

Project Page | Slides | Poster

teaser Our model consumes a collection of calibrated images of a scene from multiple views and produces depth maps for every such view. We show that this depth prediction model can be trained in an unsupervised manner using our robust photo consistency loss. The predicted depth maps are then fused together into a consistent 3D reconstruction which closely resembles and often improves upon the sensor scanned model. Left to Right: Input images, predicted depth maps, our fused 3D reconstruction, ground truth 3D scan.

Abstract

We present a learning based approach for multi-view stereopsis (MVS). While current deep MVS methods achieve impressive results, they crucially rely on ground-truth 3D training data, and acquisition of such precise 3D geometry for supervision is a major hurdle. Our framework instead leverages photometric consistency between multiple views as supervisory signal for learning depth prediction in a wide baseline MVS setup. However, naively applying photo consistency constraints is undesirable due to occlusion and lighting changes across views. To overcome this, we propose a robust loss formulation that: a) enforces first order consistency and b) for each point, selectively enforces consistency with some views, thus implicitly handling occlusions. We demonstrate our ability to learn MVS without 3D supervision using a real dataset, and show that each component of our proposed robust loss results in a significant improvement. We qualitatively observe that our reconstructions are often more complete than the acquired ground truth, further showing the merits of this approach. Lastly, our learned model generalizes to novel settings, and our approach allows adaptation of existing CNNs to datasets without ground-truth 3D by unsupervised finetuning.

Summary

Intuition

For a set of images of a scene, a given point in a source image may not be visible across all other views.

loss

Implementation

The predicted depth map from the network, along with the reference image are used to warp and calculate a loss map for each of M non-reference neighboring views. These M loss maps are then concatenated into a volume of dimension H × W × M, where H and W are the image dimensions. This volume is used to perform a pixel-wise selection to pick the K “best” (lowest loss) values, along the 3rd dimension of the volume (i.e. over the M loss maps), using which we take the mean to compute our robust photometric loss.

loss

Results

Pre-trained model weights and outputs (depth maps, point clouds, 3D ply files) for the DTU dataset can be downloaded from Google Drive.

results

Installation

To check out the source code:

git clone https://github.com/tejaskhot/unsup_mvs/

Install CUDA 9.0, CUDNN 7.0 and Python 2.7. Please note that this code has not been tested with other versions and may likely need changes for running with different versions of libraries. This code is also not optimized for multi-GPU execution.

Recommended conda environment:

conda create -n mvs python=2.7 pip
conda activate mvs
pip install -r requirements.txt

Parts of the code for this project are borrowed and modified from the excellent MVSNet repository. Please follow instructions detailed there for downloading and processing data, and for installing fusibile which is used for fusion of the depth maps into a point cloud.

Training

python train_dtu.py --dtu_data_root <YOUR_PATH> --log_dir <YOUR_PATH> --save_dir <YOUR_PATH> --save_op_dir <YOUR_PATH>

Note:

Testing

python test_dtu.py --dense_folder <YOUR_PATH> --output_folder <YOUR_PATH> --model_dir <YOUR_PATH> --ckpt_step 45000

Note:

python depthfusion_dtu.py --model_folder <YOUR_PATH> --image_folder <YOUR_PATH> --cam_folder <YOUR_PATH> --fusibile_exe_path <YOUR_PATH>

Note:

Citation

If you find this work useful, please cite the paper.

@article{khot2019learning,
  title={Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency},
  author={Khot, Tejas and Agrawal, Shubham and Tulsiani, Shubham and Mertz, Christoph and Lucey, Simon and Hebert, Martial},
  journal={arXiv preprint arXiv:1905.02706},
  year={2019}
}

Please consider also citing MVSNet if you found the code useful.