# Self-Supervised Monocular Depth Hints

Jamie Watson, Michael Firman, Gabriel J. Brostow and Daniyar Turmukhambetov – ICCV 2019

[Link to paper]

<p align="center"> <img src="assets/kitti.gif" alt="example input output gif" width="700" /> </p>

We introduce Depth Hints, which improve monocular depth estimation algorithms trained from stereo pairs.

We find that photometric reprojection losses used with self-supervised learning typically have multiple local minima.
This can restrict what a regression network learns, for example causing artifacts around thin structures.

Depth Hints are complementary depth suggestions obtained from simple off-the-shelf stereo algorithms, e.g. Semi-Global Matching. These hints are used during training to guide the network to learn better weights. They require no additional data, and are assumed to be right only sometimes.
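The "right only sometimes" idea can be sketched as a per-pixel selection rule: a hint is trusted only at pixels where it achieves a lower photometric reprojection loss than the network's current prediction, and a supervised loss is applied there. This is a hedged illustration, not the repository's implementation; the helper names and the log-L1 loss form are our assumptions.

```python
import numpy as np

def depth_hint_mask(pred_photo_loss, hint_photo_loss):
    """True where the hint's reprojection loss beats the prediction's."""
    return hint_photo_loss < pred_photo_loss

def hint_supervision_loss(pred_depth, hint_depth, mask):
    """Supervised log-depth loss applied only at trusted pixels
    (log-L1 form is illustrative)."""
    per_pixel = np.abs(np.log(pred_depth) - np.log(hint_depth))
    return (per_pixel * mask).sum() / max(mask.sum(), 1)

# Toy 2x2 example: the hint wins at two of the four pixels.
pred_loss = np.array([[0.20, 0.05], [0.30, 0.10]])
hint_loss = np.array([[0.10, 0.08], [0.20, 0.40]])
mask = depth_hint_mask(pred_loss, hint_loss)
```

Pixels where the hint loses simply fall back to the usual self-supervised photometric loss, so bad hints cannot pull the network toward the wrong minimum.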

Combined with other good practices, Depth Hints give state-of-the-art depth predictions on the KITTI benchmark (see images above and results table below). We show additional monocular depth estimation results on the SceneFlow dataset:

<p align="center"> <img src="assets/sceneflow.gif" alt="example input output gif" width="700" /> </p>

## ✏️ 📄 Citation

If you find our work useful or interesting, please consider citing our paper:

```
@inproceedings{watson-2019-depth-hints,
  title     = {Self-Supervised Monocular Depth Hints},
  author    = {Jamie Watson and
               Michael Firman and
               Gabriel J. Brostow and
               Daniyar Turmukhambetov},
  booktitle = {The International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2019}
}
```

## 📈 KITTI Results

| Model name | Training modality | ImageNet pretrained | Resolution | Abs rel | Sq rel | 𝛿 < 1.25 |
|---|---|---|---|---|---|---|
| Ours Resnet50 | Stereo | Yes | 640 x 192 | 0.102 | 0.762 | 0.880 |
| Ours Resnet50 no pt | Stereo | No | 640 x 192 | 0.118 | 0.941 | 0.850 |
| Ours HR Resnet50 | Stereo | Yes | 1024 x 320 | 0.096 | 0.710 | 0.890 |
| Ours HR Resnet50 no pt | Stereo | No | 1024 x 320 | 0.112 | 0.857 | 0.861 |
| Ours HR | Mono + Stereo | Yes | 1024 x 320 | 0.098 | 0.702 | 0.887 |
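For reference, Abs rel, Sq rel, and 𝛿 < 1.25 are the standard Eigen et al. monocular-depth error metrics. A minimal sketch of their definitions (the function name is ours, not from the repository):

```python
import numpy as np

def eigen_metrics(gt, pred):
    """Standard depth-error metrics over valid ground-truth pixels.

    gt, pred: 1-D arrays of positive depths in metres.
    Returns (abs_rel, sq_rel, delta_1), where delta_1 is the fraction
    of pixels with max(gt/pred, pred/gt) < 1.25.
    """
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    ratio = np.maximum(gt / pred, pred / gt)
    delta_1 = np.mean(ratio < 1.25)
    return abs_rel, sq_rel, delta_1

# Toy example with three valid pixels.
gt = np.array([10.0, 20.0, 5.0])
pred = np.array([11.0, 18.0, 5.0])
abs_rel, sq_rel, delta_1 = eigen_metrics(gt, pred)
```

Lower is better for the two relative errors; higher is better for 𝛿 < 1.25.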

Please see the paper for full results. To download the weights and predictions for each model please follow the links below:

| Model name | Training modality | ImageNet pretrained | Resolution | Weights | Eigen Predictions |
|---|---|---|---|---|---|
| Ours Resnet50 | Stereo | Yes | 640 x 192 | Download | Download |
| Ours Resnet50 no pt | Stereo | No | 640 x 192 | Download | Download |
| Ours HR Resnet50 | Stereo | Yes | 1024 x 320 | Download | Download |
| Ours HR Resnet50 no pt | Stereo | No | 1024 x 320 | Download | Download |
| Ours HR | Mono + Stereo | Yes | 1024 x 320 | Download | Download |

βš™οΈ Code

The code for Depth Hints builds upon monodepth2. If you have questions about running the code, please see the issues in that repository first.

To train using depth hints, add the flag `--use_depth_hints` to your training command, and point `--depth_hint_path` at your precomputed depth hints (a full training command is given in the next section).

🎉 And that's it! 🎉

## 👀 Reproducing Paper Results

To recreate the results from our paper, run:

```shell
python train.py \
  --data_path <your_KITTI_path> \
  --log_dir <your_save_path> \
  --model_name stereo_depth_hints \
  --use_depth_hints \
  --depth_hint_path <your_depth_hint_path> \
  --frame_ids 0 --use_stereo \
  --scheduler_step_size 5 \
  --split eigen_full \
  --disparity_smoothness 0
```

Additionally, note that the results above and in the main paper come from evaluation against the KITTI sparse LiDAR point cloud, using the Eigen test split.

To test on KITTI, run:

```shell
python evaluate_depth.py \
  --data_path <your_KITTI_path> \
  --load_weights_folder <your_model_path> \
  --use_stereo
```

Make sure you have run `export_gt_depth.py` to extract ground truth files.

Additionally, if you see `ValueError: Object arrays cannot be loaded when allow_pickle=False`, either downgrade numpy or change line 166 in `evaluate_depth.py` to:

```python
gt_depths = np.load(gt_path, fix_imports=True, encoding='latin1', allow_pickle=True)["data"]
```

## 🖼 Running on your own images

To run on your own images, run:

```shell
python test_simple.py \
  --image_path <your_image_path> \
  --model_path <your_model_path> \
  --num_layers <18 or 50>
```

This will save a numpy array of depths, and a colormapped depth image.
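If you want to build your own visualisation from the saved numpy depths, a minimal sketch is below. The normalisation choice and the example filename are ours, not necessarily what `test_simple.py` uses internally.

```python
import numpy as np

def depth_to_uint8(depth):
    """Normalise a depth map to the 0-255 range for visualisation."""
    d = depth.astype(np.float64)
    d = (d - d.min()) / max(d.max() - d.min(), 1e-8)
    return (255 * d).astype(np.uint8)

# e.g. vis = depth_to_uint8(np.load("my_image_depth.npy"))  # hypothetical filename
demo = depth_to_uint8(np.array([[1.0, 2.0], [3.0, 5.0]]))
```

The resulting uint8 array can be fed to any image library or colourmap of your choice.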

πŸ‘©β€βš–οΈ License

Copyright © Niantic, Inc. 2020. Patent Pending. All rights reserved. Please see the license file for terms.