Revealing the Dark Secrets of Masked Image Modeling (Depth Estimation) [Paper]

Main results

Results on NYUv2

| Backbone | d1 | d2 | d3 | abs_rel | rmse | rmse_log |
| --- | --- | --- | --- | --- | --- | --- |
| Swin-v2-Base | 0.935 | 0.991 | 0.998 | 0.044 | 0.304 | 0.109 |
| Swin-v2-Large | 0.949 | 0.994 | 0.999 | 0.036 | 0.287 | 0.102 |

Results on KITTI

| Backbone | d1 | d2 | d3 | abs_rel | rmse | rmse_log |
| --- | --- | --- | --- | --- | --- | --- |
| Swin-v2-Base | 0.976 | 0.998 | 0.999 | 0.052 | 2.050 | 0.078 |
| Swin-v2-Large | 0.977 | 0.998 | 1.000 | 0.050 | 1.966 | 0.075 |
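The columns above are the standard monocular depth estimation metrics: d1/d2/d3 are threshold accuracies (fraction of pixels whose prediction-to-ground-truth ratio is below 1.25, 1.25², 1.25³), abs_rel is absolute relative error, and rmse / rmse_log are root-mean-square errors in linear and log depth. A minimal sketch of how these are typically computed (this helper is illustrative, not code from this repository; it assumes depth maps as NumPy arrays with ground-truth zeros marking invalid pixels):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Compute standard depth metrics over valid (gt > 0) pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0                      # ignore pixels without ground truth
    pred, gt = pred[valid], gt[valid]

    ratio = np.maximum(pred / gt, gt / pred)   # symmetric ratio per pixel
    return {
        "d1": float(np.mean(ratio < 1.25)),
        "d2": float(np.mean(ratio < 1.25 ** 2)),
        "d3": float(np.mean(ratio < 1.25 ** 3)),
        "abs_rel": float(np.mean(np.abs(pred - gt) / gt)),
        "rmse": float(np.sqrt(np.mean((pred - gt) ** 2))),
        "rmse_log": float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))),
    }
```

For a perfect prediction every threshold accuracy is 1.0 and every error is 0.0; lower abs_rel/rmse/rmse_log and higher d1/d2/d3 are better, which is how the tables above should be read.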

Preparation

Please refer to [GLPDepth] for configuring the environment and preparing the NYUv2 and KITTI datasets. You can download the pretrained models and our trained models from the model zoo (OneDrive).

Training

Evaluation

Citation

@article{xie2023darkmim,
  title={Revealing the Dark Secrets of Masked Image Modeling},
  author={Xie, Zhenda and Geng, Zigang and Hu, Jingcheng and Zhang, Zheng and Hu, Han and Cao, Yue},
  journal={arXiv preprint arXiv:2205.13543},
  year={2022}
}

Acknowledgements

Our code is mainly based on GLPDepth [1]. The model code is adapted from Swin Transformer [2] and Simple Baseline [3].

[1] Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth. [code]

[2] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. [code]

[3] Simple Baselines for Human Pose Estimation and Tracking. [code]