<h1 align="center"> Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization (3DV 2024) </h1> <br>

🚨 This repository contains download links to the code and trained deep stereo models of our work "Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization", 3DV 2024.

by Luca Bartolomei<sup>1,2</sup>, Matteo Poggi<sup>1,2</sup>, Andrea Conti<sup>2</sup>, Fabio Tosi<sup>2</sup>, and Stefano Mattoccia<sup>1,2</sup>

Advanced Research Center on Electronic Systems (ARCES)<sup>1</sup>, University of Bologna<sup>2</sup>

<div class="alert alert-info"> <h2 align="center">

Project Page | Paper | Supplementary

</h2>

Note: 🚧 This repository is currently under active development. We are still adding and refining features and documentation; we apologize for any incomplete or missing elements and appreciate your patience.

We would also like to share our previous work, Active Stereo Without Pattern Projector, from which we took inspiration for this work.

</div>

:clapper: Introduction

This paper proposes a new framework for depth completion that is robust to domain-shift issues. It exploits the generalization capability of modern stereo networks to tackle depth completion by processing fictitious stereo pairs obtained through a virtual pattern projection paradigm. Any stereo network or traditional stereo matcher can be seamlessly plugged into our framework, allowing for the deployment of a virtual stereo setup that is future-proof against advancements in the stereo field.

<h4 align="center"> </h4> <img src="./images/framework.png" alt="Alt text" style="width: 800px;" title="architecture">

Contributions:

If you find this code useful in your research, please cite:

@inproceedings{bartolomei2024revisiting,
  title={Revisiting depth completion from a stereo matching perspective for cross-domain generalization},
  author={Bartolomei, Luca and Poggi, Matteo and Conti, Andrea and Tosi, Fabio and Mattoccia, Stefano},
  booktitle={2024 International Conference on 3D Vision (3DV)},
  pages={1360--1370},
  year={2024},
  organization={IEEE}
}

:inbox_tray: Pretrained Models

Here, you can download the weights of the RAFT-Stereo architecture.

To use these weights, please follow these steps:

  1. Install the gdown Python package: pip install gdown
  2. Download all weights from our Google Drive folder: gdown --folder https://drive.google.com/drive/folders/1AZRHzCn7K7HiPQZocfxWplYHo3WhI8lm?usp=sharing
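If you prefer to script the download, gdown also exposes a Python API. A minimal sketch (the "weights" output directory name is an arbitrary choice):

import gdown

# Fetch the whole Google Drive folder containing the pretrained weights.
url = "https://drive.google.com/drive/folders/1AZRHzCn7K7HiPQZocfxWplYHo3WhI8lm?usp=sharing"
gdown.download_folder(url, output="weights", quiet=False)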

:memo: Code

The Test section provides scripts to evaluate depth completion models on the VOID, NYU, DDAD, and KITTIDC datasets. It helps assess the accuracy of the models and saves the predicted depth maps.

Please refer to each section for detailed instructions on setup and execution.

<div class="alert alert-info">

Warning:

</div>

:hammer_and_wrench: Setup Instructions

Ensure that you have installed all the necessary dependencies. The list of dependencies can be found in the ./requirements.txt file.

You can also use the following commands to create a conda environment and install all the dependencies:

$ conda create -n "vppdc" python
$ conda activate vppdc
$ python -m pip install -r requirements.txt
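As an optional sanity check (assuming you plan to run on a CUDA-capable GPU), you can verify that PyTorch is installed and sees the device:

import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))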

:floppy_disk: Datasets

We used the following datasets in our experiments.

NYU Depth V2 (NYUv2)

We used the preprocessed NYUv2 HDF5 dataset provided by Andrea Conti.

$ cd PATH_TO_DOWNLOAD
$ wget https://github.com/andreaconti/sparsity-agnostic-depth-completion/releases/download/v0.1.0/nyu_img_gt.h5
$ wget https://github.com/andreaconti/sparsity-agnostic-depth-completion/releases/download/v0.1.0/nyu_pred_with_500.h5

After that, you will get a data structure as follows:

nyudepthv2
├── nyu_img_gt.h5
└── nyu_pred_with_500.h5

Note that the original full NYUv2 dataset is available at the official website.
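To sanity-check the downloaded files, you can list their internal layout with h5py; the sketch below only prints dataset names, shapes, and dtypes and makes no assumption about the key structure.

import h5py

def print_h5_layout(path):
    """Recursively print every dataset name, shape, and dtype in an HDF5 file."""
    with h5py.File(path, "r") as f:
        f.visititems(
            lambda name, obj: print(name, obj.shape, obj.dtype)
            if isinstance(obj, h5py.Dataset) else None
        )

print_h5_layout("nyudepthv2/nyu_img_gt.h5")
print_h5_layout("nyudepthv2/nyu_pred_with_500.h5")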

VOID

You can download the VOID dataset with different amounts of sparse points (i.e., 150, 500, and 1500) using this script:

$ cd PATH_TO_DOWNLOAD
$ ./download_void.sh

After that, you will get a data structure as follows:

void
├── 150
│    ├── void_150
│    │    └── data
│    │          ├── birthplace_of_internet
│    │          └── ...
│    │
│    ├── test_absolute_pose.txt
│    └── ...
├── 500
│    ├── void_500
│    │    └── data
│    │          ├── birthplace_of_internet
│    │          └── ...
│    │
│    ├── test_absolute_pose.txt
│    └── ...
...

Note that the script deletes the zip files after extraction. The raw VOID dataset is available at the official website.
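As a quick check that a split matches its nominal density, you can count the valid (non-zero) pixels in one of the sparse depth PNGs. The sequence, sub-folder, and file names below are only illustrative; adjust them to your extracted layout.

import numpy as np
from PIL import Image

# Illustrative path inside the 500-point split; pick any existing frame.
sparse_path = "void/500/void_500/data/birthplace_of_internet/sparse_depth/0000000001.png"

sparse = np.array(Image.open(sparse_path))
print("valid sparse points:", int((sparse > 0).sum()))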

KITTIDC

You can download the KITTIDC validation split from the official website, or download it directly:

$ cd PATH_TO_DOWNLOAD
$ mkdir kitti_dc
$ cd kitti_dc
$ wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_depth_selection.zip
$ unzip data_depth_selection.zip
$ rm data_depth_selection.zip
$ ln -s depth_selection data_depth_selection

After that, you will get a data structure as follows:

kitti_dc
├── val_selection_cropped
│    ├── groundtruth_depth
│    ├── image
│    ├── intrinsics
│    └── velodyne_raw
├── test_depth_completion_anonymous
└── test_depth_prediction_anonymous
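The groundtruth_depth and velodyne_raw maps follow the standard KITTI convention: 16-bit PNGs where depth in meters is the pixel value divided by 256, and zero marks missing measurements. A minimal decoding sketch (the file name is illustrative):

import numpy as np
from PIL import Image

def load_kitti_depth(path):
    """Decode a KITTI 16-bit depth PNG into meters (pixel value / 256; 0 = no measurement)."""
    return np.array(Image.open(path), dtype=np.float32) / 256.0

# Illustrative file name; use any PNG from val_selection_cropped.
depth = load_kitti_depth("kitti_dc/val_selection_cropped/velodyne_raw/0000000000.png")
print(depth.shape, float(depth.max()))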

DDAD

First of all, please install the Dataset Governance Policy (DGP) library by following the official guide.

Then, you can download and extract the full dataset:

$ cd PATH_TO_DOWNLOAD
$ wget https://tri-ml-public.s3.amazonaws.com/github/DDAD/datasets/DDAD.tar
$ tar -xvf DDAD.tar

After that, you will get a data structure as follows:

ddad_train_val
├── 000000
├── 000001
│
...
│
├── ddad.json
└── LICENSE.md

Finally, you can use our script to generate the validation data from the front camera:

python convert_ddad.py -i /path/to/ddad_train_val -o /your/output/path/sampled_ddad [--seed]

After that, you will get a data structure as follows:

sampled_ddad
└── val
     ├── gt
     │    ├── 0000000000.png
     │    ...
     ├── hints
     │    ├── 0000000000.png
     │    ...
     ├── intrinsics
     │    ├── 0000000000.txt
     │    ...
     └── rgb
          ├── 0000000000.png
          ...
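Given the layout above, samples can be paired by file stem across the rgb, hints, gt, and intrinsics folders. A minimal iteration sketch, assuming exactly the structure shown:

from pathlib import Path

root = Path("sampled_ddad/val")

for rgb_path in sorted((root / "rgb").glob("*.png")):
    stem = rgb_path.stem
    hints_path = root / "hints" / f"{stem}.png"             # sparse depth hints
    gt_path = root / "gt" / f"{stem}.png"                   # ground-truth depth
    intrinsics_path = root / "intrinsics" / f"{stem}.txt"   # camera intrinsics
    print(stem, hints_path.exists(), gt_path.exists(), intrinsics_path.exists())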

:rocket: Test

The provided script allows you to evaluate depth completion models on various datasets, including KITTIDC, NYU, VOID, and DDAD, and to assess the accuracy of the predicted depth maps.

To run the test.py script with the correct arguments, follow the instructions below:

  1. Prepare the environment:

    • Open a terminal or command prompt.
    • Navigate to the directory containing the test.py script.
  2. Execute the command: Run the following command, replacing the placeholders with the actual values for your dataset and model:

export CUDA_VISIBLE_DEVICES=0 
python test.py  --datapath <path_to_dataset> --dataset <dataset_type> --model <model_name> \
  --loadmodel <path_to_pretrained_model> --maxdisp 192 --outdir <output_directory> \
  --wsize 5 --guideperc 1 --blending 1 --interpolate --filling --leftpadding --filterlidar  \
  --maskocc --iscale <input_image_scale>

Replace the placeholders (<path_to_dataset>, <dataset_type>, <model_name>, <path_to_pretrained_model>, <output_directory>, <input_image_scale>) with the actual values for your setup.

For the full list of available arguments and their descriptions, please refer to the test.py script.
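To evaluate several configurations in a row, you can wrap the command in a small launcher. The sketch below reuses only the flags shown above; the dataset names, paths, model name, and checkpoint file are hypothetical and must be adapted to your setup.

import subprocess

# Hypothetical dataset names and paths; replace them with your own.
runs = [
    {"dataset": "nyudepthv2", "datapath": "/data/nyudepthv2"},
    {"dataset": "void", "datapath": "/data/void/500"},
]

for run in runs:
    cmd = [
        "python", "test.py",
        "--datapath", run["datapath"],
        "--dataset", run["dataset"],
        "--model", "raftstereo",                  # hypothetical model name
        "--loadmodel", "weights/raftstereo.tar",  # hypothetical checkpoint path
        "--maxdisp", "192",
        "--outdir", "output/" + run["dataset"],
        "--iscale", "1",
    ]
    subprocess.run(cmd, check=True)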

:art: Qualitative Results

In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.

<br> <p float="left"> <img src="./images/competitors.png" width="800" /> </p>

Synth-to-real generalization. Given an NYU Depth V2 frame and 500 sparse depth points (a), our framework with RAFT-Stereo trained only on the SceneFlow synthetic dataset (e) outperforms the generalization capability of the state-of-the-art depth completion networks NLSPN (b), SpAgNet (c), and CompletionFormer (d), all trained on the same synthetic dataset.

<br> <p float="left"> <img src="./images/indoor2outdoor.png" width="800" /> </p>

From indoor to outdoor. When a model is pre-trained on SceneFlow, trained on indoor data, and then run outdoors, a significant domain shift occurs. NLSPN and CompletionFormer seem unable to generalize to outdoor data, while SpAgNet can produce somewhat meaningful depth maps, yet far from accurate ones. Finally, VPP4DC improves the results even further thanks to the pre-training process.

<br> <p float="left"> <img src="./images/outdoor2indoor.png" width="800" /> </p>

From outdoor to indoor. We consider the case complementary to the previous one, i.e., models pre-trained on SceneFlow, trained outdoors, and then tested indoors. NLSPN, CompletionFormer, and SpAgNet can predict depth maps that are reasonable to some extent. Our approach instead predicts very accurate results in regions covered by depth hints, yet fails where these are absent.

:envelope: Contacts

For questions, please send an email to luca.bartolomei5@unibo.it

:pray: Acknowledgements

We would like to extend our sincere appreciation to the authors of the following projects for making their code available, which we have utilized in our work:

We also deeply appreciate the authors of the competing works for providing their code and model weights, which greatly aided accurate comparisons.

<h5 align="center">Patent pending - University of Bologna</h5>