MaskingDepth (IROS 2024)

[Project Page] [Paper]

This code is the implementation of the paper <a href="https://arxiv.org/abs/2212.10806">MaskingDepth: Masked Consistency Regularization for Semi-supervised Monocular Depth Estimation</a> by Baek et al.

We propose MaskingDepth, a novel semi-supervised learning framework for monocular depth estimation that mitigates the reliance on large quantities of ground-truth depth. MaskingDepth enforces consistency between depths predicted from strongly-augmented unlabeled data and pseudo-labels derived from weakly-augmented unlabeled data, which enables learning depth from unlabeled images without ground-truth supervision. Within this framework, we propose a novel data augmentation that takes advantage of a naive masking strategy while avoiding two of its problems: scale ambiguity between the depths from the weakly- and strongly-augmented branches, and the risk of missing small-scale instances. To retain only high-confidence depth predictions from the weakly-augmented branch as pseudo-labels, we also present an uncertainty estimation technique, which is used to define a robust consistency regularization. Experiments on the KITTI and NYU-Depth-v2 datasets demonstrate the effectiveness of each component, robustness when fewer depth-annotated images are available, and superior performance compared to other state-of-the-art semi-supervised methods for monocular depth estimation.
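The citation below names this masking augmentation *K-way disjoint masking*. This sketch assumes that name means something specific. Under that reading, the image patches are partitioned into K disjoint random subsets, and each subset forms one masked view, so every patch stays visible in exactly one view.

The function names `k_way_disjoint_masks` and `merge_views` are hypothetical and do not come from this repository. The repository's implementation may differ; for example, it may operate on ViT tokens or encoder features rather than flat lists.

```python
import random
from typing import List, Sequence


def k_way_disjoint_masks(num_patches: int, k: int, seed: int = 0) -> List[List[bool]]:
    """Partition patch indices into k disjoint random subsets (hypothetical helper).

    Returns k boolean masks of length num_patches; masks[i][j] is True when
    patch j is visible in view i. Each patch is visible in exactly one view.
    """
    if not 1 <= k <= num_patches:
        raise ValueError("k must satisfy 1 <= k <= num_patches")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # random assignment of patches to views
    masks = [[False] * num_patches for _ in range(k)]
    # Deal the shuffled indices round-robin, so view sizes differ by at most one.
    for position, patch in enumerate(order):
        masks[position % k][patch] = True
    return masks


def merge_views(per_view_values: Sequence[Sequence[float]],
                masks: Sequence[Sequence[bool]]) -> List[float]:
    """Hypothetical merge step: take each patch's value (e.g. a feature or a
    depth) from the single view in which that patch was visible."""
    num_patches = len(masks[0])
    merged = [0.0] * num_patches
    for values, mask in zip(per_view_values, masks):
        for j in range(num_patches):
            if mask[j]:
                merged[j] = values[j]
    return merged


masks = k_way_disjoint_masks(num_patches=8, k=2)
print([sum(m) for m in masks])  # each view keeps 4 of the 8 patches
```

The views are disjoint and jointly cover every patch. As a result, no region of the image is hidden in all views at once. This is one plausible way to read the abstract's concern about missing small-scale instances.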

Environment

Inside a Docker container:

git clone https://github.com/KU-CVLAB/MaskingDepth.git   # Download this project
cd MaskingDepth                                          # Change directory
sh package_install.sh                                    # Install additional packages

We recommend initializing the ViT encoder with ImageNet-pretrained weights. Download the ViT ImageNet-pretrained weights.

Dataset

Training

Edit the conf/base_train.yaml file (see the comments in the file for details on each option), then run:

python train.py

Edit the conf/consistency_train.yaml file (see the comments in the file for details on each option), then run:

python consistency_train.py
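Judging by its name, consistency_train.py runs the consistency-regularized, semi-supervised training described in the abstract. As an illustration only, the sketch below shows an uncertainty-filtered consistency loss over flattened per-pixel values. Pixels whose weak-branch uncertainty exceeds a threshold are discarded, and an L1 penalty is applied to the rest.

Several details are assumptions:

- the L1 form of the penalty,
- the `max_uncertainty` threshold,
- the function name `masked_consistency_loss`.

The actual loss and uncertainty estimate are defined in the repository code and the paper.

```python
from typing import Sequence


def masked_consistency_loss(strong_pred: Sequence[float],
                            weak_pseudo: Sequence[float],
                            weak_uncertainty: Sequence[float],
                            max_uncertainty: float = 0.2) -> float:
    """Hypothetical L1 consistency between strong-branch predictions and
    weak-branch pseudo-labels, averaged over pixels whose uncertainty is at
    most max_uncertainty. Returns 0.0 when no pixel passes the filter."""
    if not (len(strong_pred) == len(weak_pseudo) == len(weak_uncertainty)):
        raise ValueError("all inputs must have the same length")
    total, kept = 0.0, 0
    for s, w, u in zip(strong_pred, weak_pseudo, weak_uncertainty):
        if u <= max_uncertainty:      # keep only confident pseudo-labels
            total += abs(s - w)       # pseudo-label w acts as a fixed target
            kept += 1
    return total / kept if kept else 0.0


# The third pixel is too uncertain, so only pixels 1, 2 and 4 contribute.
loss = masked_consistency_loss([1.0, 2.0, 3.0, 4.0],
                               [1.5, 2.0, 5.0, 4.0],
                               [0.1, 0.1, 0.9, 0.1])
```

In an autodiff framework, the pseudo-labels would typically be detached from the computation graph. That way, gradients flow only through the strongly-augmented branch.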

Evaluation

We evaluate with eval_with_pngs.py from BTS. For evaluation, the test set follows the Eigen split.
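For reference, the tables below report standard monocular depth metrics. The sketch re-implements their usual definitions in pure Python over flattened depth values. It is not the BTS script itself. It rests on these assumptions:

- a depth range of `min_depth=1e-3` and `max_depth=80.0`, a common KITTI setting (NYU-Depth-v2 is usually capped at 10 m);
- δ in the tables meaning the δ < 1.25 accuracy.

The crops applied by the real evaluation are omitted.

```python
import math
from typing import Dict, Sequence


def depth_metrics(gt: Sequence[float], pred: Sequence[float],
                  min_depth: float = 1e-3, max_depth: float = 80.0) -> Dict[str, float]:
    """Standard depth error metrics over pixels with valid ground truth."""
    pairs = []
    for g, p in zip(gt, pred):
        if min_depth < g < max_depth:               # skip invalid / out-of-range GT
            p = min(max(p, min_depth), max_depth)   # clamp prediction to the range
            pairs.append((g, p))
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n)
    log10 = sum(abs(math.log10(g) - math.log10(p)) for g, p in pairs) / n
    d1 = sum(max(g / p, p / g) < 1.25 for g, p in pairs) / n  # threshold accuracy
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse,
            "rmse_log": rmse_log, "log10": log10, "d1": d1}
```

For example, `depth_metrics([2.0, 4.0], [2.0, 5.0])` gives an AbsRel of 0.125. It also gives a δ of 0.5, because the ratio 5/4 is not strictly below 1.25.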

Results

Quantitative results on the KITTI dataset in a sparsely-supervised setting

Labeled frames: full

| Method | AbsRel ↓ | SqRel ↓ | RMSE ↓ | RMSElog ↓ | δ ↑ |
|---|---|---|---|---|---|
| Baseline | 0.076 ± 0.003 | 0.365 ± 0.004 | 3.290 ± 0.015 | 0.118 ± 0.001 | 0.934 ± 0.001 |
| Baseline+Self | 0.076 ± 0.002 | 0.367 ± 0.007 | 3.291 ± 0.020 | 0.117 ± 0.001 | 0.933 ± 0.002 |
| Ours+Self | 0.079 ± 0.001 | 0.379 ± 0.007 | 3.388 ± 0.019 | 0.121 ± 0.009 | 0.929 ± 0.001 |
| Ours | 0.074 ± 0.001 | 0.362 ± 0.001 | 3.253 ± 0.012 | 0.116 ± 0.001 | 0.935 ± 0.001 |

Labeled frames: 10,000

| Method | AbsRel ↓ | SqRel ↓ | RMSE ↓ | RMSElog ↓ | δ ↑ |
|---|---|---|---|---|---|
| Baseline | 0.079 ± 0.001 | 0.379 ± 0.007 | 3.388 ± 0.019 | 0.121 ± 0.009 | 0.929 ± 0.001 |
| Baseline+Self | 0.078 ± 0.001 | 0.376 ± 0.006 | 3.347 ± 0.043 | 0.119 ± 0.002 | 0.931 ± 0.001 |
| Ours+Self | 0.076 ± 0.017 | 0.369 ± 0.004 | 3.311 ± 0.011 | 0.117 ± 0.001 | 0.935 ± 0.002 |
| Ours | 0.075 ± 0.002 | 0.362 ± 0.006 | 3.259 ± 0.020 | 0.116 ± 0.001 | 0.934 ± 0.003 |

Labeled frames: 1,000

| Method | AbsRel ↓ | SqRel ↓ | RMSE ↓ | RMSElog ↓ | δ ↑ |
|---|---|---|---|---|---|
| Baseline | 0.098 ± 0.004 | 0.515 ± 0.030 | 3.785 ± 0.013 | 0.142 ± 0.005 | 0.899 ± 0.005 |
| Baseline+Self | 0.096 ± 0.002 | 0.523 ± 0.024 | 3.750 ± 0.033 | 0.140 ± 0.002 | 0.900 ± 0.004 |
| Ours+Self | 0.085 ± 0.017 | 0.430 ± 0.011 | 3.521 ± 0.012 | 0.129 ± 0.012 | 0.918 ± 0.010 |
| Ours | 0.088 ± 0.003 | 0.419 ± 0.007 | 3.490 ± 0.020 | 0.129 ± 0.003 | 0.917 ± 0.002 |

Labeled frames: 100

| Method | AbsRel ↓ | SqRel ↓ | RMSE ↓ | RMSElog ↓ | δ ↑ |
|---|---|---|---|---|---|
| Baseline | 0.135 ± 0.005 | 0.728 ± 0.019 | 4.585 ± 0.048 | 0.186 ± 0.011 | 0.831 ± 0.005 |
| Baseline+Self | 0.132 ± 0.004 | 0.759 ± 0.014 | 4.559 ± 0.044 | 0.184 ± 0.003 | 0.834 ± 0.004 |
| Ours+Self | 0.123 ± 0.003 | 0.747 ± 0.018 | 4.497 ± 0.042 | 0.181 ± 0.005 | 0.839 ± 0.005 |
| Ours | 0.128 ± 0.004 | 0.707 ± 0.013 | 4.295 ± 0.037 | 0.173 ± 0.006 | 0.849 ± 0.006 |

Labeled frames: 10

| Method | AbsRel ↓ | SqRel ↓ | RMSE ↓ | RMSElog ↓ | δ ↑ |
|---|---|---|---|---|---|
| Baseline | 0.201 ± 0.023 | 1.508 ± 0.045 | 6.163 ± 0.082 | 0.268 ± 0.029 | 0.701 ± 0.021 |
| Baseline+Self | 0.210 ± 0.020 | 1.322 ± 0.042 | 5.627 ± 0.080 | 0.265 ± 0.027 | 0.711 ± 0.016 |
| Ours+Self | 0.184 ± 0.011 | 1.265 ± 0.064 | 5.747 ± 0.080 | 0.243 ± 0.007 | 0.727 ± 0.018 |
| Ours | 0.197 ± 0.019 | 1.378 ± 0.032 | 5.650 ± 0.091 | 0.261 ± 0.030 | 0.723 ± 0.017 |

Quantitative results on the NYU-Depth-v2 dataset in a sparsely-supervised setting

Labeled frames: full

| Method | AbsRel ↓ | RMSE ↓ | log10 ↓ | δ ↑ |
|---|---|---|---|---|
| Baseline | 0.106 ± 0.002 | 0.380 ± 0.004 | 0.053 ± 0.001 | 0.897 ± 0.001 |
| Ours | 0.105 ± 0.002 | 0.379 ± 0.003 | 0.053 ± 0.001 | 0.899 ± 0.001 |

Labeled frames: 10,000

| Method | AbsRel ↓ | RMSE ↓ | log10 ↓ | δ ↑ |
|---|---|---|---|---|
| Baseline | 0.112 ± 0.004 | 0.389 ± 0.006 | 0.057 ± 0.003 | 0.893 ± 0.003 |
| Ours | 0.107 ± 0.002 | 0.386 ± 0.006 | 0.054 ± 0.002 | 0.896 ± 0.002 |

Labeled frames: 1,000

| Method | AbsRel ↓ | RMSE ↓ | log10 ↓ | δ ↑ |
|---|---|---|---|---|
| Baseline | 0.141 ± 0.008 | 0.447 ± 0.009 | 0.066 ± 0.004 | 0.843 ± 0.006 |
| Ours | 0.135 ± 0.007 | 0.440 ± 0.008 | 0.065 ± 0.004 | 0.853 ± 0.005 |

Labeled frames: 100

| Method | AbsRel ↓ | RMSE ↓ | log10 ↓ | δ ↑ |
|---|---|---|---|---|
| Baseline | 0.199 ± 0.011 | 0.604 ± 0.014 | 0.086 ± 0.005 | 0.694 ± 0.011 |
| Ours | 0.182 ± 0.008 | 0.594 ± 0.012 | 0.083 ± 0.003 | 0.718 ± 0.010 |

Labeled frames: 10

| Method | AbsRel ↓ | RMSE ↓ | log10 ↓ | δ ↑ |
|---|---|---|---|---|
| Baseline | 0.321 ± 0.040 | 0.872 ± 0.042 | 0.124 ± 0.008 | 0.523 ± 0.027 |
| Ours | 0.292 ± 0.031 | 0.814 ± 0.037 | 0.112 ± 0.006 | 0.561 ± 0.021 |

Qualitative results on the KITTI dataset: (a) RGB image; depth maps predicted by (b), (d) the baseline and (c), (e) ours, using 100 and 10,000 labeled frames, respectively.

Qualitative results on the NYU-Depth-v2 dataset: (a) RGB image; (b) ground-truth depth map; depth maps predicted by (c), (e) the baseline and (d), (f) ours, using 100 and 10,000 labeled frames, respectively.

Citation

Please consider citing our paper if you use this code.

@article{baek2022semi,
  title={Semi-Supervised Learning of Monocular Depth Estimation via Consistency Regularization with K-way Disjoint Masking},
  author={Baek, Jongbeom and Kim, Gyeongnyeon and Park, Seonghoon and An, Honggyu and Poggi, Matteo and Kim, Seungryong},
  journal={arXiv preprint arXiv:2212.10806},
  year={2022}
}