<h1 align="center"> Active Stereo Without Pattern Projector (ICCV 2023) </h1> <h1 align="center"> Stereo-Depth Fusion through Virtual Pattern Projection (Journal Extension) </h1> <br>

:rotating_light: This repository contains download links to our code and trained deep stereo models for our works "Active Stereo Without Pattern Projector" (ICCV 2023) and "Stereo-Depth Fusion through Virtual Pattern Projection" (journal extension)

by Luca Bartolomei<sup>1,2</sup>, Matteo Poggi<sup>2</sup>, Fabio Tosi<sup>2</sup>, Andrea Conti<sup>2</sup>, and Stefano Mattoccia<sup>1,2</sup>

Advanced Research Center on Electronic Systems (ARCES)<sup>1</sup>, University of Bologna<sup>2</sup>

<div class="alert alert-info"> <h2 align="center">

Active Stereo Without Pattern Projector (ICCV 2023)<br>

Project Page | Paper | Supplementary | Poster

</h2> <h2 align="center">

Stereo-Depth Fusion through Virtual Pattern Projection (Journal Extension)<br>

Project Page | Paper

</h2>

Note: 🚧 This repository is still under development. We are actively adding and refining features and documentation; some elements may be incomplete or missing, and we appreciate your patience as we work toward completion.

:bookmark_tabs: Table of Contents

</div>

:clapper: Introduction

This paper proposes a novel framework that integrates the principles of active stereo into standard passive camera systems without a physical pattern projector. We virtually project a pattern onto the left and right images according to sparse measurements obtained from a depth sensor.

<img src="./images/framework_new.jpg" alt="Framework overview" style="width: 800px;" title="architecture">
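For intuition, the minimal Python sketch below shows a simplified variant of the idea (an illustrative assumption on our part, not the exact patterning strategy implemented in this repository): given sparse disparity hints, the same random patch is painted at corresponding locations of the left and right images, loosely analogous to what the `--wsize` and `--blending` options of `test.py` control.

```python
import numpy as np

def virtual_pattern_projection(left, right, sparse_disp, wsize=3, blending=1.0, seed=42):
    """Illustrative sketch: for every sparse disparity hint d at pixel (x, y),
    paint the same random wsize x wsize patch at (x, y) on the left image and
    at (x - d, y) on the right image, alpha-blended with the original content.
    left, right: HxWx3 uint8 arrays; sparse_disp: HxW float array (0 = no hint)."""
    rng = np.random.default_rng(seed)
    out_l, out_r = left.astype(np.float32), right.astype(np.float32)
    h, w = sparse_disp.shape
    r = wsize // 2
    for y, x in zip(*np.nonzero(sparse_disp > 0)):
        xr = int(round(x - sparse_disp[y, x]))  # corresponding column on the right image
        if y - r < 0 or y + r >= h or x - r < 0 or x + r >= w or xr - r < 0 or xr + r >= w:
            continue  # patch would fall outside the images
        patch = rng.uniform(0, 255, size=(wsize, wsize, 3))  # random virtual pattern
        out_l[y-r:y+r+1, x-r:x+r+1] = (1 - blending) * out_l[y-r:y+r+1, x-r:x+r+1] + blending * patch
        out_r[y-r:y+r+1, xr-r:xr+r+1] = (1 - blending) * out_r[y-r:y+r+1, xr-r:xr+r+1] + blending * patch
    return out_l.astype(np.uint8), out_r.astype(np.uint8)
```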

Contributions:

Extension Contributions:

:fountain_pen: If you find this code useful in your research, please cite:

@InProceedings{Bartolomei_2023_ICCV,
    author    = {Bartolomei, Luca and Poggi, Matteo and Tosi, Fabio and Conti, Andrea and Mattoccia, Stefano},
    title     = {Active Stereo Without Pattern Projector},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {18470-18482}
}
@misc{bartolomei2024stereodepth,
      title={Stereo-Depth Fusion through Virtual Pattern Projection}, 
      author={Luca Bartolomei and Matteo Poggi and Fabio Tosi and Andrea Conti and Stefano Mattoccia},
      year={2024},
      eprint={2406.04345},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

:movie_camera: Watch Our Research Video!

<a href="https://vppstereo.github.io/#myvideo"> <img src="./images/slide_title.jpg" alt="Watch the video" width="800"> </a>

:inbox_tray: Pretrained Models

Here you can download the weights of the RAFT-Stereo and PSMNet architectures.

To use these weights, please follow these steps:

  1. Install the GDown Python package: pip install gdown
  2. Download all the weights from our drive (a Python alternative is sketched below): gdown --folder https://drive.google.com/drive/folders/1GqcY-Z-gtWHqDVMx-31uxrPzprM38UJl?usp=drive_link
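If you prefer, gdown can also be driven from Python; a minimal sketch, assuming a gdown version that exposes `download_folder` and using an illustrative output directory name:

```python
import gdown

# Download the whole weights folder from Google Drive (same URL as the CLI command above).
# "weights" is just a placeholder output directory.
gdown.download_folder(
    "https://drive.google.com/drive/folders/1GqcY-Z-gtWHqDVMx-31uxrPzprM38UJl?usp=drive_link",
    output="weights",
    quiet=False,
)
```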

:memo: Code

The Test section provides scripts to evaluate disparity estimation models on datasets such as KITTI, Middlebury, and ETH3D, assessing their accuracy and saving the predicted disparity maps.

Please refer to each section for detailed instructions on setup and execution.

<div class="alert alert-info">

Warning:

</div>

:hammer_and_wrench: Setup Instructions

  1. Dependencies: Ensure that all the necessary dependencies are installed. They are listed in the ./requirements.txt file (e.g., pip install -r requirements.txt).
  2. Build rSGM:

:floppy_disk: Datasets

We used seven datasets for training and evaluation.

Middlebury

Midd-14: We used the MiddEval3 training split for evaluation and fine-tuning purposes.

$ cd PATH_TO_DOWNLOAD
$ wget https://vision.middlebury.edu/stereo/submit3/zip/MiddEval3-data-F.zip
$ wget https://vision.middlebury.edu/stereo/submit3/zip/MiddEval3-GT0-F.zip
$ unzip \*.zip

After that, you will get a data structure as follows:

MiddEval3
├── TrainingF
│    ├── Adirondack
│    │    ├── im0.png
│    │    └── ...
│    ...
│    └── Vintage
│         └── ...
└── TestF
     └── ...

Midd-A: We used the Scenes2014 additional split for evaluation and grid-search purposes.

$ cd PATH_TO_DOWNLOAD
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Backpack-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Bicycle1-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Cable-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Classroom1-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Couch-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Flowers-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Mask-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Shopvac-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Sticks-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Storage-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Sword1-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Sword2-perfect.zip
$ wget https://vision.middlebury.edu/stereo/data/scenes2014/zip/Umbrella-perfect.zip
$ unzip \*.zip

After that, you will get a data structure as follows:

middlebury2014
├── Backpack-perfect
│    ├── im0.png
│    └── ...
...
└── Umbrella-perfect
     └── ...

Midd-21: We used the Scenes2021 split for evaluation purposes.

$ cd PATH_TO_DOWNLOAD
$ wget https://vision.middlebury.edu/stereo/data/scenes2021/zip/all.zip
$ unzip all.zip
$ mv data/* .

After that, you will get a data structure as follows:

middlebury2021
├── artroom1
│    ├── im0.png
│    └── ...
...
└── traproom2
     └── ...

Note that additional datasets are available at the official website.

KITTI142

We derived our KITTI142 validation split from KITTI141, adding frame 000124. You can download it from our drive using this script:

$ cd PATH_TO_DOWNLOAD
$ gdown --fuzzy https://drive.google.com/file/d/1A14EMqcGLDhH3nTHTVFpSP2P7We0SY-C/view?usp=drive_link
$ unzip kitti142.zip

After that, you will get a data structure as follows:

kitti142
├── image_2
│    ├── 000002_10.png
│    ...
│    └── 000199_10.png
├── image_3
│    ├── 000002_10.png
│    ...
│    └── 000199_10.png
├── lidar_disp_2
│    ├── 000002_10.png
│    ...
│    └── 000199_10.png
├── disp_occ
│    ├── 000002_10.png
│    ...
│    └── 000199_10.png
...

Note that additional information is available at the official website.
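The lidar_disp_2 and disp_occ folders store disparity maps as PNG files; assuming they follow the standard KITTI 16-bit encoding (disparity * 256, with zero meaning invalid), they can be loaded as in the illustrative sketch below:

```python
import numpy as np
from PIL import Image

def load_disparity_png(path):
    # Assumption: standard KITTI 16-bit PNG encoding (disparity * 256, 0 = invalid).
    disp = np.array(Image.open(path), dtype=np.float32) / 256.0
    disp[disp == 0] = -1.0  # mark invalid pixels
    return disp

# Example: disp = load_disparity_png("kitti142/lidar_disp_2/000002_10.png")
```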

ETH3D

You can download the ETH3D dataset with the following script:

$ cd PATH_TO_DOWNLOAD
$ wget https://www.eth3d.net/data/two_view_training.7z
$ wget https://www.eth3d.net/data/two_view_training_gt.7z
$ p7zip -d *.7z

After that, you will get a data structure as follows:

eth3d
├── delivery_area_1l
│    ├── im0.png
│    └── ...
...
└── terrains_2s
     └── ...

Note that the script deletes the 7z archives after extraction. Further details are available at the official website.

DSEC

We provide preprocessed DSEC testing splits Day, Afternoon and Night:

$ cd PATH_TO_DOWNLOAD
$ gdown --folder https://drive.google.com/drive/folders/1etkvdntDfMdwvx_NP0_QJcUcsogLXYK7?usp=drive_link
$ cd dsec
$ unzip -o \*.zip
$ cd ..
$ mv dsec/* .
$ rmdir dsec

After that, you will get a data structure as follows:

dsec
├── afternoon
│    ├── left
│    │    ├── 000000.png
│    │    ...
│    └── ...
...
└── night
     └── ...

We managed to extract the splits using only data from the official website. We used FasterLIO to de-skew raw LiDAR scans and Open3D to perform ICP registration.
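For reference, the rigid ICP alignment step with Open3D might look as follows; this is illustrative only, with placeholder file names and a placeholder correspondence threshold, not the exact settings we used:

```python
import numpy as np
import open3d as o3d

# Hypothetical file names; the de-skewed scan would come from FasterLIO.
source = o3d.io.read_point_cloud("scan_deskewed.pcd")
target = o3d.io.read_point_cloud("reference_map.pcd")

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.2,  # meters, to be tuned per scene
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

source.transform(result.transformation)  # align the scan to the reference
print("ICP fitness:", result.fitness)
```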

M3ED

We provide preprocessed M3ED testing splits Outdoor Day, Outdoor Night and Indoor:

$ cd PATH_TO_DOWNLOAD
$ gdown --folder https://drive.google.com/drive/folders/1n-7H11ZfbPcR9_F0Ri2CcTJS2WWQlfCo?usp=drive_link
$ cd m3ed
$ unzip -o \*.zip
$ cd ..
$ mv m3ed/* .
$ rmdir m3ed

After that, you will get a data structure as follows:

m3ed
├── indoor
│    ├── left
│    │    ├── 000000.png
│    │    ...
│    └── ...
...
└── night
     └── ...

We managed to extract the splits using only data from the official website.

M3ED Active

We provide preprocessed M3ED Active testing splits Passive and Active:

$ cd PATH_TO_DOWNLOAD
$ gdown --folder https://drive.google.com/drive/folders/1fv6f2mQUPW8MwSsGy1f0dEHOZCS4sk2-?usp=drive_link
$ cd m3ed_active
$ unzip -o \*.zip
$ cd ..
$ mv m3ed_active/* .
$ rmdir m3ed_active

After that, you will get a data structure as follows:

m3ed_active
├── passive
│    ├── left
│    │    ├── 000000.png
│    │    ...
│    └── ...
└── active
     └── ...

We managed to extract the splits using only data from the official website.

SIMSTEREO

You can download the SIMSTEREO dataset here.

After that, you will get a data structure as follows:

simstereo
├── test
│    ├── nirColormanaged
│    │    ├── abstract_bowls_1_left.jpg
│    │    ├── abstract_bowls_1_right.jpg
│    │    ...
│    ├── rgbColormanaged
│    │    ├── abstract_bowls_1_left.jpg
│    │    ├── abstract_bowls_1_right.jpg
│    │    ...
│    └── pfmDisp
│         ├── abstract_bowls_1_left.pfm
│         ├── abstract_bowls_1_right.pfm
│         ...
└── training
     └── ...

:rocket: Test

The test.py script allows you to evaluate disparity estimation models on various datasets, including KITTI (142 split), Middlebury (Training, Additional, 2021), ETH3D, DSEC, M3ED, and SIMSTEREO, assessing their accuracy and saving the predicted disparity maps.

To run the test.py script with the correct arguments, follow the instructions below:

  1. Prepare the environment:

    • Open a terminal or command prompt.
    • Navigate to the directory containing the test.py script.
  2. Execute the command: Run the following command, replacing the placeholders with the actual values for your dataset and model:

    # Parameters to reproduce Active Stereo Without Pattern Projector (ICCV 2023)
    CUDA_VISIBLE_DEVICES=0 python test.py  --datapath <path_to_dataset> --dataset <dataset_type> --stereomodel <model_name> \
     --loadstereomodel <path_to_pretrained_model> --maxdisp 192 \
     --vpp --outdir <save_dmap_dir> --wsize 3 --guideperc 0.05 --blending 0.4 --iscale <input_image_scale> \
     --maskocc
    
    # Parameters to reproduce Stereo-Depth Fusion through Virtual Pattern Projection (Journal Extension)
    CUDA_VISIBLE_DEVICES=0 python test.py  --datapath <path_to_dataset> --dataset <dataset_type> --stereomodel <model_name> \
     --loadstereomodel <path_to_pretrained_model> --maxdisp 192 \
     --vpp --outdir <save_dmap_dir> --wsize 7 --guideperc 0.05 --blending 0.4 --iscale <input_image_scale> \
     --maskocc --bilateralpatch --bilateral_spatial_variance 1 --bilateral_color_variance 2 --bilateral_threshold 0.001 --rsgm_subpixel
    

Replace the placeholders (<path_to_dataset>, <dataset_type>, <model_name>, etc.) with the actual values for your setup.

The available arguments are:

For more details, please refer to the test.py script.
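As a reference for how accuracy is typically measured on these benchmarks, the sketch below computes the standard bad-pixel error rate; it is shown for illustration only, since the exact metrics are computed inside test.py.

```python
import numpy as np

def bad_pixel_rate(pred_disp, gt_disp, tau=2.0):
    """Percentage of valid ground-truth pixels whose absolute disparity error
    exceeds tau pixels (e.g., bad-2 with tau=2)."""
    valid = gt_disp > 0
    err = np.abs(pred_disp[valid] - gt_disp[valid])
    return 100.0 * float(np.mean(err > tau))
```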

:art: Qualitative Results

In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.

<br> <p float="left"> <img src="./images/competitors.png" width="800" /> </p>

Performance against competitors. VPP generally reaches almost optimal performance with a meagre 1% density and, except for a few cases in the -tr configurations at higher densities, achieves much lower error rates.

<br> <p float="left"> <img src="./images/vpp_ots.png" width="800" /> </p>

VPP with off-the-shelf networks. We collect the results yielded by VPP applied to several off-the-shelf stereo models, running the weights provided by the authors. Again, with rare exceptions, VPP noticeably boosts the accuracy of any model, whether trained on synthetic or real data.

<br> <p float="left"> <img src="./images/teaser.png" width="800" /> </p>

Qualitative comparison on KITTI (top) and Middlebury (bottom). From left to right: the vanilla left images and the disparity maps produced by PSMNet; the left images enhanced by our virtual projection with the corresponding disparity maps from the vanilla PSMNet; and (rightmost) those from the VPP fine-tuned PSMNet.

<br> <p float="left"> <img src="./images/fine_details.png" width="800" /> </p>

Fine-detail preservation: our virtual pattern greatly enhances the quality of the disparity maps without introducing noticeable artefacts around thin structures, despite the pattern being applied on patches.

:envelope: Contacts

For questions, please send an email to luca.bartolomei5@unibo.it

:pray: Acknowledgements

We would like to extend our sincere appreciation to the authors of the following projects for making their code available, which we have utilized in our work:

We also deeply appreciate the authors of the competing research papers for providing their code and model weights, which greatly aided accurate comparisons.

<h5 align="center">Patent pending - University of Bologna</h5>