Self-supervised Monocular Depth Estimation with Pytorch

This repository aims to provide a fair environment in which Self-supervised Monocular Depth Estimation (SMDE) methods can be evaluated and developed.

Welcome to V2.0

In V2.0, you can easily compute FLOPs (supported by thop) and inference speed. We also support more flexible training configurations, such as dividing one training iteration into multiple steps and setting different loss functions for different parameters (e.g., as used in TiO-Depth). We have tried our best to update all the methods from V1.0 to V2.0, and we hope this is helpful. BTW, our new method TiO-Depth was accepted to ICCV 2023 and is included in this repo.

About SMDE-Pytorch

We build this repository with Pytorch for evaluating and developing the Self-supervised Monocular Depth Estimation (SMDE) methods. The main targets of the SMDE-Pytorch are:

If you have any questions or suggestions, please open an issue or contact us at zm_zhou1998@163.com (replies may be delayed due to work). If you like this work, a Star would make us happy~

Setup

We built and tested the repository with Ubuntu 18.04, CUDA 11.0, Python 3.7.9, and Pytorch 1.7.0. We recommend creating a virtual environment with Anaconda. Please open a terminal in the root of the repository folder to run the following commands and scripts.

conda env create -f environment.yml
conda activate pytorch170cu11

Method Zoo

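| Method | Ref. | Test | Train | Paper | Code |
|---|---|---|---|---|---|
| Monodepth2 | 2019 ICCV | | | Link | Link |
| DepthHints | 2019 ICCV | | | Link | Link |
| EdgeOfDepth | 2020 CVPR | | | Link | Link |
| PackNet | 2020 CVPR | | | Link | Link |
| P2Net | 2020 ECCV | | | Link | Link |
| FAL-Net | 2020 NeurIPS | | | Link | Link |
| HRDepth | 2021 AAAI | | | Link | Link |
| DIFFNet | 2021 BMVC | | | Link | Link |
| ManyDepth | 2021 CVPR | | | Link | Link |
| EPCDepth | 2021 ICCV | | | Link | Link |
| FSRE-Depth | 2021 ICCV | | | Link | Link |
| R-MSFM | 2021 ICCV | | | Link | Link |
| OCFD-Net (Ours) | 2022 ACM-MM | | | Link | Link |
| SDFA-Net (Ours) | 2022 ECCV | | | Link | Link |
| TiO-Depth (Ours) | 2023 ICCV | | | Link | Link |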

TODO List

Evaluation Results

We report the performance of the methods on the KITTI raw test set (an outdoor dataset) to help you choose a model. More pretrained models are available on the method pages (click the method names in the table above).

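| Method | Info. | Sup | Trained | Abs Rel. | Sq Rel. | RMSE | RMSE log | A1 |
|---|---|---|---|---|---|---|---|---|
| ManyDepth (Mono) | Res18+192x640 | Mono | Official | 0.118 | 0.891 | 4.763 | 0.192 | 0.871 |
| PackNet | PackV1+192x640 | Mono | Official | 0.110 | 0.836 | 4.655 | 0.187 | 0.881 |
| R-MSFM6 | Res18+192x640 | Mono | Trained | 0.110 | 0.797 | 4.646 | 0.188 | 0.880 |
| Monodepth2 | Res18+320x1024 | Mono | Trained | 0.109 | 0.797 | 4.533 | 0.184 | 0.888 |
| FSRE-Depth | Res18+192x640 | Mono | Trained | 0.107 | 0.751 | 4.525 | 0.182 | 0.886 |
| Monodepth2 | Res18+320x1024 | Stereo | Trained | 0.104 | 0.824 | 4.747 | 0.200 | 0.875 |
| HRDepth | Res18+384x1280 | Mono | Trained | 0.102 | 0.719 | 4.396 | 0.178 | 0.897 |
| FAL-Net | BN=49+375x1242 | Stereo | Trained | 0.099 | 0.625 | 4.197 | 0.182 | 0.885 |
| DIFFNet | HR18+320x1024 | Mono | Trained | 0.099 | 0.688 | 4.345 | 0.176 | 0.901 |
| DepthHints | Res50+320x1024 | Stereo | Trained | 0.094 | 0.680 | 4.333 | 0.181 | 0.894 |
| EdgeOfDepth | Res50+320x1024 | Stereo | Official | 0.092 | 0.647 | 4.247 | 0.177 | 0.897 |
| OCFD-Net | Res50+384x1280 | Stereo | Trained | 0.091 | 0.576 | 4.036 | 0.174 | 0.901 |
| EPCDepth | Res50+320x1024 | Stereo | Trained | 0.090 | 0.682 | 4.282 | 0.178 | 0.903 |
| SDFA-Net | SwinT*+384x1280 | Stereo | Trained | 0.089 | 0.537 | 3.895 | 0.169 | 0.906 |
| TiO-Depth | SwinT*+384x1280 | Stereo | Trained | 0.085 | 0.544 | 3.919 | 0.169 | 0.911 |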

Performance of the methods on the NYU v2 test set (an indoor dataset):

| Method | Info. | Sup | Trained | Abs Rel. | RMSE | log10 | A1 |
|---|---|---|---|---|---|---|---|
| P2Net | Res18+5f+288x384 | Mono | Official | 0.149 | 0.556 | 0.063 | 0.797 |

Predict depth for your image(s) straightforwardly

To predict depth maps for your images, first download the pretrained model you are interested in from the Trained column of the tables above. After unzipping the downloaded model, you can predict depth maps for your images with:

python predict.py\
 --image_path <path to your image or folder name for your images>\
 --exp_opts <path to the method training option>\
 --model_path <path to the downloaded or trained model>

You can also set --input_size to choose the size to which the images are resized before they are fed to the model. To predict on the CPU, set --cpu. The depth results <image name>_pred.npy and the visualization results <image name>_visual.png are saved in the same folder as the input images.
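The saved .npy depth results can be inspected with NumPy. The snippet below is a minimal sketch, assuming each file holds a single depth map, possibly with leading singleton dimensions. The helper name summarize_depth is illustrative and not part of the repository.

```python
import numpy as np


def summarize_depth(npy_path):
    """Load a saved depth prediction and return basic statistics.

    Assumption: the file stores one depth map, possibly with extra
    singleton dimensions (e.g. 1x1xHxW), which are squeezed away here.
    """
    depth = np.squeeze(np.load(npy_path))
    return {
        "shape": depth.shape,
        "min": float(depth.min()),
        "max": float(depth.max()),
        "median": float(np.median(depth)),
    }
```

For instance, calling summarize_depth on a <image name>_pred.npy file written next to your input image gives a quick check that the prediction has the expected resolution and a plausible depth range.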

For example, if you want to predict depths from the images in ./example_images with Monodepth2 (using the model that was saved in pretrained_models/MD2_S_320_bs4/model/best_model.pth), you could use:

python predict.py\
 --image_path example_images\
 --exp_opts options/Monodepth2/train/monodepth2-res18_320_kitti_stereo.yaml\
 --model_path pretrained_models/MD2_S_320_bs4/model/best_model.pth

For the methods that cannot be trained in the repository yet, you can use the options in options/_base/networks for --exp_opts. For example, the following command predicts depth for the images with PackNet and the pretrained model saved in pretrained_models/PackNet_M_192_OI/model/PackNet_M_192.pth.

python predict.py\
 --image_path example_images\
 --exp_opts options/_base/networks/packnet.yaml\
 --model_path pretrained_models/PackNet_M_192_OI/model/PackNet_M_192.pth

Since the default image size in options/_base/networks/packnet.yaml is 192x640, a model trained at 384x1280 needs --input_size to be set explicitly:

python predict.py\
 --image_path example_images\
 --exp_opts options/_base/networks/packnet.yaml\
 --model_path pretrained_models/PackNet_Mv_CS+K_384_OI/model/PackNet_Mv_CS+K_384.pth\
 --input_size 384 1280
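When predicting with several models, it can be convenient to assemble the command in Python. The following is a sketch that uses only the flags described above. The helper build_predict_cmd is hypothetical, and it assumes --input_size takes the height followed by the width, as in the 384 1280 example.

```python
import sys


def build_predict_cmd(image_path, exp_opts, model_path, input_size=None, cpu=False):
    """Assemble the argument list for predict.py from the documented flags."""
    cmd = [
        sys.executable, "predict.py",
        "--image_path", str(image_path),
        "--exp_opts", str(exp_opts),
        "--model_path", str(model_path),
    ]
    if input_size is not None:
        height, width = input_size  # assumed order: height, then width
        cmd += ["--input_size", str(height), str(width)]
    if cpu:
        cmd.append("--cpu")
    return cmd


cmd = build_predict_cmd(
    "example_images",
    "options/_base/networks/packnet.yaml",
    "pretrained_models/PackNet_Mv_CS+K_384_OI/model/PackNet_Mv_CS+K_384.pth",
    input_size=(384, 1280),
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the root of the repository.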

Prepare datasets

Before evaluating or training the methods, download the datasets you need. The following datasets can be used for training or evaluation:

| Dataset | Train | Test |
|---|---|---|
| KITTI | ✔ (175GB) | ✔ (2GB) |
| NYU v2 | | ✔ (2GB) |
| Make3D | | ✔ (200MB) |
| Cityscapes | ✔ (130GB) | ✔ (35GB) |
| KITTI Stereo 2015 | | ✔ (2GB) |

Set data path

We provide an example file, path_example.py, for setting the dataset paths used by the repository. Please create a Python file named path_my.py and copy the code from path_example.py into it. Then replace the paths in path_my.py with the paths to your own folders. The folder for each dataset should be organized as follows:

<root of kitti>
|---2011_09_26
|   |---2011_09_26_drive_0001_sync
|   |   |---image_02
|   |   |---image_03
|   |   |---velodyne_points
|   |   |---...
|   |---2011_09_26_drive_0002_sync
|   |   |---image_02
|   |   |---image_03
|   |   |---velodyne_points
|   |   |---...
|   |---...
|---2011_09_28
|   |--- ...
|---gt_depths_raw.npz (for raw Eigen test set)
|---gt_depths_improved.npz (for improved Eigen test set)
<root of NYU v2 (just test set)>
|---00001.h5
|---00002.h5
|---00003.h5
|---...
<root of Make3D>
|---Gridlaserdata
|   |---depth_sph_corr-10.21op2-p-015t000.mat
|   |---depth_sph_corr-10.21op2-p-139t000.mat
|   |---...
|---Test134
|   |---img-10.21op2-p-015t000.jpg
|   |---img-10.21op2-p-139t000.jpg
|   |---...
<root of cityscapes>
|---leftImg8bit
|   |---train
|   |   |---aachen
|   |   |   |---aachen_000000_000019_leftImg8bit.png
|   |   |   |---aachen_000001_000019_leftImg8bit.png
|   |   |   |---...
|   |   |---bochum
|   |   |---...
|   |---train_extra
|   |   |---augsburg
|   |   |---...
|   |---test
|   |   |---...
|   |---val
|   |   |---...
|---rightImg8bit
|   |--- ...
|---camera
|   |--- ...
|---disparity
|   |--- ...
|---gt_depths (for evaluation)
|   |---000_depth.npy
|   |---001_depth.npy
|   |--- ...
<root of kitti 2015>
|---training
|   |---image_2
|   |   |---000000_10.png
|   |   |---000000_11.png
|   |   |---000001_10.png
|   |   |---...
|   |---image_3
|   |   |---000000_10.png
|   |   |---000000_11.png
|   |   |---000001_10.png
|   |   |---...
|   |---disp_occ_0
|   |   |---000000_10.png
|   |   |---000000_11.png
|   |   |---000001_10.png
|   |   |---...
|---testing
|   |--- ...
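
Before training, the KITTI layout shown above can be sanity-checked with a few lines of Python. This is only a sketch based on the folder tree in this section: the function name find_incomplete_drives is illustrative, and it assumes drive folders end with _sync and sit one level below the date folders.

```python
from pathlib import Path

# Subfolders that the tree above lists inside every drive folder.
EXPECTED_DRIVE_DIRS = ("image_02", "image_03", "velodyne_points")


def find_incomplete_drives(kitti_root):
    """Return {drive folder: [missing subfolders]} for a KITTI raw root."""
    root = Path(kitti_root)
    problems = {}
    # Date folders look like 2011_09_26; drive folders end with _sync.
    for drive in sorted(root.glob("*/*_sync")):
        if not drive.is_dir():
            continue
        missing = [name for name in EXPECTED_DRIVE_DIRS
                   if not (drive / name).is_dir()]
        if missing:
            problems[str(drive.relative_to(root))] = missing
    return problems
```

An empty result means every drive folder found under the root contains the three expected subfolders. It does not verify that the files inside them are complete.
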
KITTI

For training the methods on the KITTI dataset (the Eigen split), you should download the entire KITTI dataset (about 175GB) by:

wget -i ./datasets/kitti_archives_to_download.txt -P <save path>

And you could unzip them with:

cd <save path>
unzip "*.zip"

To evaluate the methods on the KITTI Eigen raw test set, you should additionally generate the ground-truth depth file (as done in Monodepth2):

python datasets/utils/export_kitti_gt_depth.py --data_path <root of KITTI> --split raw

If you want to evaluate the methods on the KITTI improved test set, download the annotated depth maps (about 15GB) Here and unzip them. Then you can generate the improved ground-truth depth file with:

python datasets/utils/export_kitti_gt_depth.py --data_path <root of KITTI> --split improved

As an alternative, we provide the Eigen test subset (with .png images Here or with .jpg images Here, about 2GB) and the generated gt_depth files for those who only want to run the evaluation.

NYUv2

We use the NYUv2 test set as done in P2Net and EPCDepth. It can be downloaded Here.

Make3D

We use the Make3D test set for evaluating some methods. It can be downloaded Here.

Cityscapes

Cityscapes can be used to train the model jointly with KITTI, which helps improve performance. If you want to use Cityscapes, please download the following parts of the dataset Here and unzip them to your <root of cityscapes> (Note: some files require applying for download permission by email):

leftImg8bit_trainvaltest.zip (11GB)  <- For evaluation only, download this
leftImg8bit_trainextra.zip (44GB)
rightImg8bit_trainvaltest.zip (11GB)
rightImg8bit_trainextra.zip (44GB)
disparity_trainvaltest.zip (3.5GB)
disparity_trainextra.zip (15GB)
camera_trainvaltest.zip (2MB)  <- For evaluation only, download this
camera_trainextra.zip (8MB)

Then, please generate the camera parameter matrices by:

python datasets/utils/export_cityscapes_matrix.py

You also need to download the prepared ground-truth depths Here, which are provided by Watson et al. in ManyDepth.

KITTI Stereo 2015

To evaluate the model on the KITTI Stereo 2015 training set, as many stereo matching methods do, download the corresponding dataset Here and unzip it. Note that training the model still requires the entire KITTI dataset.

Evaluate the methods

To evaluate the methods on the prepared dataset, you could simply use

python evaluate.py\
 --exp_opts <path to the method EVALUATION option>\
 --model_path <path to the downloaded or trained model>

We provide the EVALUATION option files in options/<Method Name>/eval/*. Here we introduce some important arguments.

| Argument | Information |
|---|---|
| --metric_name depth_kitti_mono | Enable median scaling for the methods trained with monocular sequences (Sup = Mono) |
| --visual_list | Path to a .txt file listing the samples whose outputs you want to save |
| --save_pred | Save the predicted depths of the samples in --visual_list |
| --save_visual | Save the visualization results of the samples in --visual_list |
| -fpp, -gpp, -mspp | Apply different post-processing steps (choose only one at a time) |

The output files are saved in eval_res/ by default. Please check evaluate.py for more information about the arguments.
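
Methods trained only on monocular sequences predict depth up to an unknown scale, which is why median scaling is applied before computing the metrics. The sketch below shows the common formulation (as used in the Monodepth2 evaluation). It is not taken from this repository, and it assumes that pixels without ground truth are marked with a depth of 0.

```python
import numpy as np


def median_scale(pred_depth, gt_depth):
    """Rescale a prediction so its median matches the ground-truth median.

    Only pixels with valid ground truth (gt > 0, an assumption about how
    missing measurements are marked) are used to compute the ratio.
    """
    pred = np.asarray(pred_depth, dtype=np.float64)
    gt = np.asarray(gt_depth, dtype=np.float64)
    valid = gt > 0
    ratio = np.median(gt[valid]) / np.median(pred[valid])
    return pred * ratio, ratio
```

The ratio is computed per image, so a method evaluated this way is scored on relative depth structure rather than on absolute metric scale.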

For example, to evaluate Monodepth2 on the KITTI Eigen test set with the post-processing proposed by Godard, and to save the visualizations and predicted depths of all the test samples, use:

python evaluate.py\
 --exp_opts options/Monodepth2/eval/monodepth2-res18-stereo_320_kitti.yaml\
 --model_path pretrained_models/MD2_S_320_bs4/model/best_model.pth\
 -gpp\
 --save_visual\
 --save_pred\
 --visual_list data_splits/kitti/test_list.txt

The evaluation output will be like

->Load the test dataset
->Load the pretrained model
->Use the post processing
->Start Evaluation
697/697
    | abs_rel  |  sq_rel  |   rms    | log_rms  |    a1    |    a2    |    a3    |
    |     0.102|     0.795|     4.685|     0.198|     0.876|     0.954|     0.977|

The output predicted depths and visualization results will be saved in eval_res/MD2_S_320_bs4/-gpp/*.
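
The columns in the output above are the standard depth-estimation metrics. As a reference, the sketch below computes them with NumPy using their usual definitions. The function name compute_depth_errors is illustrative, the repository's own implementation may differ in details such as depth capping and cropping, and the inputs are assumed to be strictly positive depths at valid pixels only.

```python
import numpy as np


def compute_depth_errors(gt, pred):
    """Compute abs_rel, sq_rel, rms, log_rms, a1, a2, a3 for positive depths."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)

    # Accuracy under threshold: fraction of pixels whose ratio is below 1.25^k.
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = float((thresh < 1.25).mean())
    a2 = float((thresh < 1.25 ** 2).mean())
    a3 = float((thresh < 1.25 ** 3).mean())

    abs_rel = float(np.mean(np.abs(gt - pred) / gt))
    sq_rel = float(np.mean((gt - pred) ** 2 / gt))
    rms = float(np.sqrt(np.mean((gt - pred) ** 2)))
    log_rms = float(np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)))

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rms": rms,
            "log_rms": log_rms, "a1": a1, "a2": a2, "a3": a3}
```

Lower is better for the four error metrics, and higher is better for a1, a2 and a3.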

Train the methods

To train (reproduce) the methods on the prepared dataset, you could simply use the commands provided in options/<Method Name>/train/train_scripts.sh.

For example, if you want to train Monodepth2 on the KITTI dataset with stereo image pairs, please use:

python\
 train_dist.py\
 --name MD2-Res50_192_B12_S\
 --exp_opts options/Monodepth2/train/monodepth2-res18_192_kitti_stereo.yaml\
 --batch_size 12\
 --beta1 0.9\
 --epoch 20\
 --decay_step 15\
 --decay_rate 0.1\
 --save_freq 10\
 --visual_freq 2000
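
To make the effect of --decay_step and --decay_rate concrete, the sketch below assumes they describe a step schedule in which the learning rate is multiplied by decay_rate every decay_step epochs. This interpretation and the base learning rate of 1e-4 are assumptions for illustration only. Check the option file and train_dist.py for the actual behaviour.

```python
def lr_at_epoch(base_lr, epoch, decay_step=15, decay_rate=0.1):
    """Step decay (assumed): scale base_lr by decay_rate every decay_step epochs."""
    return base_lr * decay_rate ** (epoch // decay_step)


# Hypothetical base learning rate; epochs are counted from 0.
for epoch in (0, 14, 15, 19):
    print(epoch, lr_at_epoch(1e-4, epoch))
```

Under this assumption, the command above would train for 15 epochs at the base rate and for the remaining 5 epochs at one tenth of it.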

Modify the methods

coming soon

References

Mmsegmentation
Mmcv
Mmengine
PaddleSeg
Monodepth2
FAL-Net
DepthHints
DIFFNet
EPCDepth
EdgeOfDepth
PackNet
P2Net
HRDepth
FSRE-Depth
ManyDepth
R-MSFM
ApolloScape Dataset
KITTI Dataset
NYUv2 Dataset
Make3D Dataset
Cityscapes Dataset