Awesome
MoCha-Stereo 抹茶算法
[CVPR2024] The official implementation of "MoCha-Stereo: Motif Channel Attention Network for Stereo Matching".
https://github.com/ZYangChen/MoCha-Stereo/assets/108012397/2ed414fe-d182-499b-895c-b5375ef51425
V1 Version
<div align="center"> <a href="https://openaccess.thecvf.com/content/CVPR2024/html/Chen_MoCha-Stereo_Motif_Channel_Attention_Network_for_Stereo_Matching_CVPR_2024_paper.html" target='_blank'><img src="https://img.shields.io/badge/CVPR-2024-9cf?logo="/></a> <a href="https://arxiv.org/pdf/2404.06842.pdf" target='_blank'><img src="https://img.shields.io/badge/Paper-PDF-f5cac3?logo=adobeacrobatreader&logoColor=red"/></a> <a href="https://openaccess.thecvf.com/content/CVPR2024/supplemental/Chen_MoCha-Stereo_Motif_Channel_CVPR_2024_supplemental.pdf" target='_blank'><img src="https://img.shields.io/badge/Supp.-PDF-f5cac3?logo=adobeacrobatreader&logoColor=red"/></a> <a href="https://paperswithcode.com/sota/stereo-disparity-estimation-on-kitti-2015?p=mocha-stereo-motif-channel-attention-network" target='_blank'><img src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mocha-stereo-motif-channel-attention-network/stereo-disparity-estimation-on-kitti-2015" /></a> <!--<a href="https://paperswithcode.com/sota/stereo-depth-estimation-on-kitti-2015?p=mocha-stereo-motif-channel-attention-network" target='_blank'><img src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mocha-stereo-motif-channel-attention-network/stereo-depth-estimation-on-kitti-2015" /></a>--> </div>MoCha-Stereo: Motif Channel Attention Network for Stereo Matching <br> Ziyang Chen†, Wei Long†, He Yao†, Yongjun Zhang✱,Bingshu Wang, Yongbin Qin, Jia Wu <br> CVPR 2024 <br> Correspondence: ziyangchen2000@gmail.com; zyj6667@126.com✱
@inproceedings{chen2024mocha,
title={MoCha-Stereo: Motif Channel Attention Network for Stereo Matching},
author={Chen, Ziyang and Long, Wei and Yao, He and Zhang, Yongjun and Wang, Bingshu and Qin, Yongbin and Wu, Jia},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={27768--27777},
year={2024}
}
Requirements
Python = 3.8
CUDA = 11.3
conda create -n mocha python=3.8
conda activate mocha
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
The following libraries are also required
tqdm
tensorboard
opt_einsum
einops
scipy
imageio
opencv-python-headless
scikit-image
timm
six
Dataset
To evaluate/train RAFT-stereo, you will need to download the required datasets.
- Sceneflow (Includes FlyingThings3D, Driving, Monkaa)
- Middlebury
- ETH3D
- KITTI
By default stereo_datasets.py
will search for the datasets in these locations. You can create symbolic links to wherever the datasets were downloaded in the datasets
folder
├── datasets
├── FlyingThings3D
├── frames_cleanpass
├── frames_finalpass
├── disparity
├── Monkaa
├── frames_cleanpass
├── frames_finalpass
├── disparity
├── Driving
├── frames_cleanpass
├── frames_finalpass
├── disparity
├── KITTI
├── KITTI_2015
├── testing
├── training
├── KITTI_2012
├── testing
├── training
├── Middlebury
├── MiddEval3
├── ETH3D
├── two_view_training
├── two_view_training_gt
├── two_view_testing
Training
python train_stereo.py --batch_size 8 --mixed_precision
Evaluation
To evaluate a trained model on a validation set (e.g. Middlebury full resolution), run
python evaluate_stereo.py --restore_ckpt models/mocha-stereo.pth --dataset middlebury_F
Weight is available here.
FAQ
Q1. Weight for "tf_efficientnetv2_l"?
A1: Please refer to issue #6 "关于tf_efficientnetv2_l检查点的问题", #8 "预训练权重", and #9 "code error".
Todo List
- [CVPR2024] V1 version
- Paper
- Code of MoCha-Stereo
- V2 version
- Preprint manuscript
- Code of MoCha-V2