Awesome-Deep-Stereo-Matching

Welcome to the "Awesome-Deep-Stereo-Matching" repository, a curated list of state-of-the-art deep stereo matching resources maintained by Fabio Tosi, Matteo Poggi and Luca Bartolomei, from the University of Bologna. This repository, inspired by awesome-computer-vision, aims to provide a comprehensive collection of the latest and most influential papers on deep stereo matching published in top-tier computer vision conferences and prestigious journals.

The methods in this repository are grouped by category to make the range of approaches and techniques used in deep stereo matching research easier to navigate. We also release the reference bib, which contains the bib entries for all the works included on this page.

We use the :triangular_flag_on_post: symbol to highlight the most groundbreaking works.

:rotating_light: :rotating_light: :rotating_light: This repository is closely associated with our surveys on deep stereo matching:

  1. "A Survey on Deep Stereo Matching in the Twenties", Tosi et al., 2024
  2. "On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey", Poggi et al., 2021
  3. "On the confidence of stereo matching in a deep-learning era: a quantitative evaluation", Poggi et al., 2022

These surveys provide an in-depth overview of the field, complementing the curated list of resources in this repository.

Additionally, we presented a tutorial on this topic at CVPR 2024. For more information about the tutorial, including slides and additional resources, please visit our Tutorial Webpage.

If you find this repository valuable, please consider citing it in your work and giving it a star! :star:

Full reference(s):

How to submit a pull request?

:bookmark_tabs: Table of Contents

  1. Survey & Fundamentals
  2. CodeBase
  3. Datasets
  4. Frameworks
  5. Applications
  6. Workshops
  7. Tutorials & Talks
  8. Demos
  9. Citation
<h2 id="fundamentals"> Survey & Fundamentals </h2> <details open><summary style="font-size: larger; font-weight: bold;"> Stereo Matching Basics</summary><ul> </ul> </details> <details open><summary style="font-size: larger; font-weight: bold;"> Deep Stereo Matching</summary><ul> </ul> </details> <details open><summary style="font-size: larger; font-weight: bold;"> Learned Confidence Estimation </summary><ul> </ul> </details>

CodeBase

:file_cabinet: Datasets

<details open id="real-world"><summary style="font-size: larger; font-weight: bold;">Real-World</summary><ul> <details open id="real-world RGB"><summary style="font-size: larger; font-weight: bold;"> RGB </summary> </details> <details open><summary style="font-size: larger; font-weight: bold;"> Multimodal/Beyond-Visible </summary> </details> <details open><summary style="font-size: larger; font-weight: bold;"> Rendered </summary> </details> </ul> </details> <details open id="synthetic"> <summary style="font-size: larger; font-weight: bold;">Synthetic</summary> </details>

Frameworks

Learning for Stereo Pipeline

<details open id="matching-cost"> <summary style="font-size: larger; font-weight: bold;">Matching Cost</summary> </details> <details open id="optimization"> <summary style="font-size: larger; font-weight: bold;">Optimization</summary> </details> <details open id="refinement"> <summary style="font-size: larger; font-weight: bold;">Refinement</summary> </details>

End-to-End Architectures

<details open id="foundational"> <summary style="font-size: larger; font-weight: bold;">Foundational Deep Stereo Architectures</summary><ul> <details open> <summary style="font-size: larger; font-weight: bold;">CNN-based Cost Volume Aggregation</summary><ul> <details open> <summary style="font-size: larger; font-weight: bold;">2D Architectures</summary> </details> <details open class="nested-details"> <summary style="font-size: larger; font-weight: bold;">3D Architectures</summary> </details> </details> <details open class="nested-details"> <summary style="font-size: larger; font-weight: bold;">Neural Architecture Search (NAS)</summary> </details> <details open class="nested-details"> <summary style="font-size: larger; font-weight: bold;">Iterative Optimized-based Architectures</summary> </details> <details open class="nested-details"> <summary style="font-size: larger; font-weight: bold;">Transformer-based Architectures</summary> </details> <details open class="nested-details"> <summary style="font-size: larger; font-weight: bold;">Markov Random Field-based Architectures</summary> </details> </ul> </details> <details open id="efficient-oriented"> <summary style="font-size: larger; font-weight: bold;">Efficient-Oriented Deep Stereo Architectures</summary><ul> <details open> <summary style="font-size: larger; font-weight: bold;">Compact Cost Volume Representation</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Efficient Cost Volume Processing</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Efficient Inference Schemes</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Lightweight Network Architecture Design</summary> </details> </ul> </details> <details open id="multi-task"> <summary style="font-size: larger; font-weight: bold;">Multi-Task Deep Stereo Architectures</summary><ul> <details open> <summary style="font-size: larger; font-weight: 
bold;">Normal-Assisted Stereo Matching</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Joint Stereo Matching and Optical Flow</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Joint Stereo Matching and Semantic Segmentation</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Joint Stereo Matching and Uncertainty</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;"> Scene Flow </summary> </details> </details> <details open id="multi-modal"> <summary style="font-size: larger; font-weight: bold;">Beyond Visual Spectrum Deep Stereo Architectures</summary><ul> <details open> <summary style="font-size: larger; font-weight: bold;">Depth-Guided Sensor Stereo Networks</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Pattern Projection-Based Stereo Networks</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Cross-Spectral Stereo Networks</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Event Stereo Networks</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Gated Stereo Networks</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Stereo Networks with Echoes </summary> </details> </ul> </details>

Architectural Analysis

Challenges & Solutions

<details open id="over-smoothing"> <summary style="font-size: larger; font-weight: bold;">Addressing the Over-Smoothing Issue</summary> </details> <details open id="missing-gt"> <summary style="font-size: larger; font-weight: bold;">Missing Ground Truth Depth</summary><ul> <details open> <summary style="font-size: larger; font-weight: bold;">Self-Supervised</summary> </details> <details open > <summary style="font-size: larger; font-weight: bold;">Cross-Framework/Proxy Supervision</summary> </details> </ul> </details> <details open id="domain-shift"> <summary style="font-size: larger; font-weight: bold;">Domain Shift</summary><ul> <details open> <summary style="font-size: larger; font-weight: bold;">Zero-shot Generalization</summary><ul> <details open> <summary style="font-size: larger; font-weight: bold;">Domain-Agnostic Feature Modeling</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Non-parametric Cost Volumes</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Integration of Additional Geometric Cues</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Real-World Monocular to Synthetic Stereo Data</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Knowledge Transfer</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Data Augmentation Analysis </summary> </details> </ul> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Offline Adaptation</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Online Continual Adaptation</summary> </details> </details> <details open id="adverse-weather"> <summary style="font-size: larger; font-weight: bold;">Adverse Weather</summary> </details> <details open id="tom"> <summary style="font-size: larger; font-weight: bold;">Transparent and Reflective (ToM) Surfaces 
</summary> </details> <details open id="asymmetric"> <summary style="font-size: larger; font-weight: bold;">Asymmetric Stereo </summary> </details> <details open id="temporal-consistency"> <summary style="font-size: larger; font-weight: bold;">Temporal Consistency</summary> </details> <details open id="continuous-estimation"> <summary style="font-size: larger; font-weight: bold;">Continuous Estimation Problem</summary> </details> </ul> </details>

Confidence Estimation

<details open> <summary style="font-size: larger; font-weight: bold;">Machine Learning Approaches</summary><ul> <details open> <summary style="font-size: larger; font-weight: bold;">Disparity-based</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Cost Volume-based</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Model-based</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">SGM-specific</summary> </details> </ul> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Deep Learning Approaches</summary><ul> <details open> <summary style="font-size: larger; font-weight: bold;">Disparity-based</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Cost Volume-based</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Multiple Confidence Fusion</summary> </details> <details open> <summary style="font-size: larger; font-weight: bold;">Sensor-based</summary> </details> </ul> </details>

Applications

(Not an exhaustive list)

Workshops

<h2 id="tutorials-talks"> Tutorials & Talks </h2>

<h2 id="demos"> Demos </h2>

:fountain_pen: Citation

Please consider citing this list if you find this repository useful:

@article{tosi2024survey,
  title={A Survey on Deep Stereo Matching in the Twenties},
  author={Fabio Tosi and Luca Bartolomei and Matteo Poggi},
  journal={arXiv preprint arXiv:2407.07816},
  year={2024},
  url={https://arxiv.org/abs/2407.07816},
  note={Extended version of CVPR 2024 Tutorial "Deep Stereo Matching in the Twenties" (https://sites.google.com/view/stereo-twenties)},
}
@article{poggi2021synergies,
  title={On the synergies between machine learning and binocular stereo for depth estimation from images: a survey},
  author={Poggi, Matteo and Tosi, Fabio and Batsos, Konstantinos and Mordohai, Philippos and Mattoccia, Stefano},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  volume={44},
  number={9},
  pages={5314--5334},
  year={2021},
  publisher={IEEE}
}
@article{poggi2021confidence,
  title={On the confidence of stereo matching in a deep-learning era: a quantitative evaluation},
  author={Poggi, Matteo and Kim, Seungryong and Tosi, Fabio and Kim, Sunok and Aleotti, Filippo and Min, Dongbo and Sohn, Kwanghoon and Mattoccia, Stefano},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  volume={44},
  number={9},
  pages={5293--5313},
  year={2021},
  publisher={IEEE}
}