<h1 align="center"> LiDAR-Event Stereo Fusion with Hallucinations (ECCV 2024) </h1> <br>

:rotating_light: This repository contains download links to our code and trained deep stereo models for our work "LiDAR-Event Stereo Fusion with Hallucinations", ECCV 2024

by Luca Bartolomei<sup>1,2</sup>, Matteo Poggi<sup>2</sup>, Andrea Conti<sup>2</sup>, and Stefano Mattoccia<sup>1,2</sup>

Advanced Research Center on Electronic Systems (ARCES)<sup>1</sup>, University of Bologna<sup>2</sup>

<div class="alert alert-info"> <h2 align="center">

LiDAR-Event Stereo Fusion with Hallucinations (ECCV 2024)<br>

Project Page | Paper

</h2>

Note: :construction: This repository is currently under development. We are actively working to add and refine features and documentation; we apologize for any inconvenience caused by incomplete or missing elements and appreciate your patience as we work towards completion.

:bookmark_tabs: Table of Contents

</div>

:clapper: Introduction

This pioneering paper proposes a solution to the issues caused by the absence of motion or the presence of large untextured regions in an event-based stereo matching setup.

Given that event cameras provide rich cues at object boundaries, while active sensors can measure depth where the lack of texture makes event cameras uninformative, we took inspiration from our previous work Active Stereo Without Pattern Projector (ICCV 2023) and integrated a stereo event camera with an active sensor -- e.g., a LiDAR.

By inserting fake events either inside the event stacks of the event-stereo network (VSH) or directly inside the raw event stream (BTH), we manage to alleviate the aforementioned issues.

<img src="./images/framework.jpg" alt="Alt text" style="width: 800px;" title="architecture"> <p style="text-align: justify;"><strong>Overview of a generic event-based stereo network and our hallucination strategies.</strong> State-of-the-art event-stereo frameworks (a) pre-process raw events to obtain event stacks fed to a deep network. In case the stacks are accessible, we define the model as a gray box, otherwise as a black box. In the former case (b), we can hallucinate patterns directly on it (VSH). When dealing with a black box (c), we can hallucinate raw events that will be processed to obtain the stacks (BTH).</p>

Contributions:

:fountain_pen: If you find this code useful in your research, please cite:

@inproceedings{bartolomei2024lidar,
  title={LiDAR-Event Stereo Fusion with Hallucinations},
  author={Bartolomei, Luca and Poggi, Matteo and Conti, Andrea and Mattoccia, Stefano},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024},
}

:inbox_tray: Pretrained Models

Here, you can download the weights of the baseline architecture trained on DSEC with eight different stacking representations.

To use these weights, please follow these steps:

  1. Install the GDown Python package: pip install gdown
  2. Download all weights from our drive: gdown --folder https://drive.google.com/drive/folders/1wh2m2LB9DRmBCJ_scHy5Wbq6nstyxCg1?usp=sharing (a quick way to inspect a downloaded checkpoint is sketched below)
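Once downloaded, a checkpoint can be inspected with torch.load to verify it is readable; the file name below is a placeholder and should be adapted to the stacking representation you downloaded.

```python
import torch

# Placeholder path: adapt it to the checkpoint downloaded from the drive folder
ckpt = torch.load("weights/baseline_dsec_example.pth", map_location="cpu")

# Checkpoints are typically either a plain state_dict or a dict wrapping one
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} tensors, e.g. {next(iter(state_dict))}")
```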

:memo: Code

The Test section provides scripts to evaluate disparity estimation models on DSEC and M3ED datasets. It helps assess the accuracy of the models and saves predicted disparity maps.

Please refer to each section for detailed instructions on setup and execution.

<div class="alert alert-info">

Warning:

</div>

:hammer_and_wrench: Setup Instructions

  1. Dependencies: Ensure that you have installed all the necessary dependencies. The list of dependencies can be found in the ./requirements.txt file.
  2. Build deform_conv:
    • Activate your virtual env
    • cd ./src/components/models/baseline/deform_conv/
    • Run python setup.py build_ext --inplace (a quick environment check is sketched below)
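Before building the extension, it may help to verify that PyTorch sees your CUDA setup; this generic check is not specific to this repository.

```python
import torch

# Quick environment check before building CUDA/C++ extensions
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("CUDA (build):", torch.version.cuda)
```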

:floppy_disk: Datasets

We used two datasets for training and evaluation.

DSEC

Download DSEC (train_events.zip, train_disparity.zip, and train_calibration.zip) and extract them.

Next, download our preprocessed DSEC raw LiDAR scans into the same DSEC folder:

cd PATH_TO_DSEC
gdown https://drive.google.com/file/d/1iYApCcGuk8RIL9aLchDDnvK4fe3qcTRk/view?usp=sharing
unzip dsec_raw.zip

After that, you will get a data structure as follows:

dsec
├── train
│   ├── interlaken_00_c
│   │   ├── calibration
│   │   │   ├── cam_to_cam.yaml
│   │   │   └── cam_to_lidar.yaml
│   │   ├── disparity
│   │   │   ├── event
│   │   │   │   ├── 000000.png
│   │   │   │   ├── ...
│   │   │   │   └── 000536.png
│   │   │   ├── raw
│   │   │   │   ├── 000000.png
│   │   │   │   ├── ...
│   │   │   │   └── 000268.png
│   │   │   ├── raw_mae.txt
│   │   │   ├── raw_mae.png
│   │   │   ├── raw_bad1.txt
│   │   │   └── timestamps.txt
│   │   └── events
│   │       ├── left
│   │       │   ├── events.h5
│   │       │   └── rectify_map.h5
│   │       └── right
│   │           ├── events.h5
│   │           └── rectify_map.h5
│   ├── ...
│   └── zurich_city_11_c                # same structure as train/interlaken_00_c
└── test
    └── ...
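As a quick sanity check of the extracted structure, a raw LiDAR scan can be read like a disparity map; the snippet assumes the raw scans follow DSEC's usual 16-bit PNG encoding (disparity = pixel value / 256, 0 for pixels without a measurement), which is an assumption to verify against the data.

```python
import numpy as np
from PIL import Image

# Assumption: raw LiDAR scans are stored like DSEC GT disparities
# (16-bit PNG, disparity = pixel_value / 256, 0 means no measurement)
raw = np.array(Image.open("dsec/train/interlaken_00_c/disparity/raw/000000.png"))
disp = raw.astype(np.float32) / 256.0
valid = raw > 0
print(f"{valid.sum()} valid LiDAR points, mean disparity {disp[valid].mean():.2f} px")
```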

We managed to extract the raw LiDAR scans using only data from the official website. We used FasterLIO to de-skew raw LiDAR scans and Open3D to perform ICP registration.

The original DSEC license applies to the raw LiDAR files as well.

M3ED

Download the M3ED dataset using our modified script (original script):

python src/download_m3ed_val.py --to_download data depth_gt --output_dir PATH_TO_M3ED_H5_FILES

After that, you will get a data structure as follows:

m3ed_h5
├── car_forest_tree_tunnel
│   ├── car_forest_tree_tunnel_data.h5
│   └── car_forest_tree_tunnel_depth_gt.h5
...

Convert H5 files using our preprocessing script:

python src/m3ed_converter.py -i PATH_TO_M3ED_H5_FILES -o PATH_TO_M3ED --max_offset_us 100000

After the conversion, you will get a data structure as follows:

m3ed
├── car_forest_tree_tunnel
│   ├── calibration
│   ├── disparity
│   └── events
...

The script emulates the DSEC data structure. However, it additionally adds ground truth and raw scans with different time offsets to replicate the experiments of Figures 8 and 9: check the disparity folders and look for event_raw_{i} with i=2899,13321,32504,61207,100000.
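For instance, the offset folders can be enumerated per sequence as follows; the root path is a placeholder, and the folder naming simply mirrors the convention described above.

```python
from pathlib import Path

m3ed_root = Path("PATH_TO_M3ED")  # placeholder: output folder of m3ed_converter.py
offsets_us = [2899, 13321, 32504, 61207, 100000]

for seq in sorted(p for p in m3ed_root.iterdir() if p.is_dir()):
    for off in offsets_us:
        folder = seq / "disparity" / f"event_raw_{off}"
        if folder.exists():
            # Assuming raw scans are stored as PNG files, as in DSEC
            print(seq.name, off, len(list(folder.glob("*.png"))), "raw scans")
```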

We managed to extract the raw LiDAR scans using only data from the official website.

:rocket: Test

This section allows you to evaluate disparity maps on the DSEC and M3ED datasets. By executing the provided scripts, you can assess the accuracy of disparity estimation models on these datasets.
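The metrics reported by the evaluation (e.g., MAE and bad-1, as in the raw_mae.txt / raw_bad1.txt files listed above) boil down to the computation below; this is a minimal sketch, not the repository's inference.py.

```python
import numpy as np

def disparity_metrics(pred, gt, bad_thresh=1.0):
    """Mean absolute error and bad-pixel rate over valid GT pixels."""
    valid = gt > 0
    err = np.abs(pred[valid] - gt[valid])
    mae = err.mean()
    bad = (err > bad_thresh).mean() * 100.0  # % of pixels with error > threshold
    return mae, bad

# Toy usage with synthetic data
gt = np.random.uniform(1, 64, size=(480, 640)).astype(np.float32)
pred = gt + np.random.normal(0, 1, size=gt.shape).astype(np.float32)
mae, bad1 = disparity_metrics(pred, gt)
print(f"MAE: {mae:.2f} px, bad-1: {bad1:.2f}%")
```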

We provide six bash scripts to evaluate the results of Tables 1, 2, 3, and 4 and Figures 8 and 9.

To run an evaluation script, follow the instructions below:

  1. Run the test:

    • Open a terminal or command prompt.
    • Navigate to the directory containing this repository.
    • Enable your virtual env with the required libraries.
  2. Execute the command: each script has a few parameters that you should set: 1) the environment settings, i.e., the path to your virtualenv/conda; 2) the DATA_PATH variable, pointing to the dataset path; 3) the WEIGHTS_PATH variable, pointing to the folder where you downloaded our pretrained checkpoints.

    After that, you can launch an evaluation script (for example, the Tab. 1 evaluation script):

     ./scripts/evaluate_table_1.sh
    

For more details about available arguments, please refer to the inference.py script.

:art: Qualitative Results

In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.

<br> <p float="left"> <img src="./images/qualitative1.jpg" width="800" /> </p>

Performance against competitors -- pre-trained models. On DSEC (top), BTH dramatically improves results over the baseline and Guided, yet cannot fully recover some details in the scene except when retraining the stereo backbone. On M3ED (bottom), both VSH and BTH with pre-trained models reduce the error by 5×.

<br> <p float="left"> <img src="./images/qualitative2.jpg" width="800" /> </p>

Performance against competitors -- refined models (outdoor). Concat and Guided+Concat can reduce the error by about 40%, yet they fall far behind the improvement yielded by BTH (more than 70% error rate reduction).

<br> <p float="left"> <img src="./images/qualitative3.jpg" width="800" /> </p>

Performance against competitors -- refined models (indoor). Our proposal is confirmed once again as the best solution for exploiting raw LiDAR measurements and improving the accuracy of event-based stereo networks.

:envelope: Contacts

For questions, please send an email to luca.bartolomei5@unibo.it

:pray: Acknowledgements

We would like to extend our sincere appreciation to the authors of the following projects for making their code available, which we have utilized in our work:

<h5 align="center">Patent pending - University of Bologna</h5>