MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions

Project Page | Paper

<br/> <img src="demo/network.png" alt="drawing" width="800"/>

If you find our work useful for your research, please consider citing our paper:

@article{DBLP:journals/corr/abs-2104-13325,
  author    = {Zhenpei Yang and
               Zhile Ren and
               Qi Shan and
               Qixing Huang},
  title     = {{MVS2D:} Efficient Multi-view Stereo via Attention-Driven 2D Convolutions},
  journal   = {CoRR},
  volume    = {abs/2104.13325},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.13325},
  eprinttype = {arXiv},
  eprint    = {2104.13325},
  timestamp = {Tue, 04 May 2021 15:12:43 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-13325.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

✏️ Changelog

Nov 27 2021

⚙️ Installation

<details> <summary>Click to expand </summary>

The code is tested with CUDA 10.1. Please use the following commands to install dependencies:

conda create --name mvs2d python=3.7
conda activate mvs2d

pip install -r requirements.txt
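
To sanity-check the environment before training (a minimal check, assuming PyTorch is installed via requirements.txt), you can confirm that a CUDA device is visible:

```bash
# Should print the installed PyTorch version and True if a CUDA GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```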

The folder structure should look like the following once you have downloaded all data and pretrained models. Download links are inside each dataset section at the end of this README.

.
├── configs
├── datasets
├── demo
├── networks
├── scripts
├── pretrained_model
│   ├── demon
│   ├── dtu
│   └── scannet
├── data
│   ├── DeMoN
│   ├── DTU_hr
│   ├── SampleSet
│   ├── ScanNet
│   └── ScanNet_3_frame_jitter_pose.npy
├── splits
│   ├── DeMoN_samples_test_2_frame.npy
│   ├── DeMoN_samples_train_2_frame.npy
│   ├── ScanNet_3_frame_test.npy
│   ├── ScanNet_3_frame_train.npy
│   └── ScanNet_3_frame_val.npy
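
If you prefer to create the top-level layout before extracting the downloads, a minimal sketch is shown below; the folder names are taken from the tree above, and the archive name is a placeholder:

```bash
# Create the expected top-level folders, then extract each download into the
# matching subfolder (the archive name below is a placeholder).
mkdir -p data splits pretrained_model
# tar -xzf <downloaded_archive> -C data/
```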
</details>

🎬 Demo

<details> <summary>Click to expand </summary>

After downloading the pretrained models for ScanNet, run the following command to make a prediction on the sample data:

python demo.py --cfg configs/scannet/release.conf

The result is saved as demo.png.

</details>

⏳ Training & Testing

We use 4 Nvidia V100 GPUs for training. You may need to modify CUDA_VISIBLE_DEVICES and the batch size to accommodate your GPU resources.
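
As a minimal sketch, assuming a single-node setup, you can restrict training to specific GPUs by exporting CUDA_VISIBLE_DEVICES before launching a training script; where the batch size is set (inside the script or its config) should be checked in the script itself:

```bash
# Restrict training to GPUs 0 and 1 (adjust the IDs to your machine).
export CUDA_VISIBLE_DEVICES=0,1

# Launch training as usual; if you run out of memory, lower the batch size
# where it is defined inside the training script / config.
bash scripts/scannet/train.sh
```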

ScanNet

<details> <summary>Click to expand </summary>

Download

data 🔗 split 🔗 pretrained models 🔗 noisy pose 🔗

Training

First download and extract the ScanNet training data and split. Then run the following command to train our model:

bash scripts/scannet/train.sh

To train the multi-scale attention model, add --robust 1 to the training command in scripts/scannet/train.sh.

To train our model with noisy input pose, add --perturb_pose 1 to the training command in scripts/scannet/train.sh.
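
For illustration only, a hypothetical edit to the command inside scripts/scannet/train.sh might look like the sketch below; the python entry point and its base arguments are assumptions, and only the --robust 1 and --perturb_pose 1 flags come from the instructions above:

```bash
# Hypothetical entry point -- check scripts/scannet/train.sh for the real one.
python train.py --cfg configs/scannet/release.conf --robust 1        # multi-scale attention
python train.py --cfg configs/scannet/release.conf --perturb_pose 1  # noisy input pose
```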

Testing

First download and extract the data, split, and pretrained models.

Then run:

bash scripts/scannet/test.sh

You should get results similar to these:

| abs_rel | sq_rel | log10 | rmse | rmse_log | a1 | a2 | a3 | abs_diff | abs_diff_median | thre1 | thre3 | thre5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.059 | 0.016 | 0.026 | 0.157 | 0.084 | 0.964 | 0.995 | 0.999 | 0.108 | 0.079 | 0.856 | 0.974 | 0.996 |
</details>

SUN3D/RGBD/Scenes11

<details> <summary>Click to expand </summary>

Download

data 🔗 split 🔗 pretrained models 🔗

Training

First download and extract the DeMoN training data and split. Then run the following command to train our model:

bash scripts/demon/train.sh

Testing

First download and extract the data, split, and pretrained models.

Then run:

bash scripts/demon/test.sh

You should get results similar to these:

dataset rgbd: 160

| abs_rel | sq_rel | log10 | rmse | rmse_log | a1 | a2 | a3 | abs_diff | abs_diff_median | thre1 | thre3 | thre5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.082 | 0.165 | 0.047 | 0.440 | 0.147 | 0.921 | 0.939 | 0.948 | 0.325 | 0.284 | 0.753 | 0.894 | 0.933 |

dataset scenes11: 256

| abs_rel | sq_rel | log10 | rmse | rmse_log | a1 | a2 | a3 | abs_diff | abs_diff_median | thre1 | thre3 | thre5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.046 | 0.080 | 0.018 | 0.439 | 0.107 | 0.976 | 0.989 | 0.993 | 0.155 | 0.058 | 0.822 | 0.945 | 0.979 |

dataset sun3d: 160

| abs_rel | sq_rel | log10 | rmse | rmse_log | a1 | a2 | a3 | abs_diff | abs_diff_median | thre1 | thre3 | thre5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.099 | 0.055 | 0.044 | 0.304 | 0.137 | 0.893 | 0.970 | 0.993 | 0.224 | 0.171 | 0.649 | 0.890 | 0.969 |

-> Done!

depth

| abs_rel | sq_rel | log10 | rmse | rmse_log | a1 | a2 | a3 | abs_diff | abs_diff_median | thre1 | thre3 | thre5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.071 | 0.096 | 0.033 | 0.402 | 0.127 | 0.938 | 0.970 | 0.981 | 0.222 | 0.152 | 0.755 | 0.915 | 0.963 |
</details>

DTU

<details> <summary>Click to expand </summary>

Download

data 🔗 eval data 🔗 eval toolkit 🔗 pretrained models 🔗

Training

First download and extract the DTU training data. Then run the following command to train our model:

bash scripts/dtu/train.sh

Testing

First download and extract the DTU eval data and pretrained models.

The following command performs three steps together: (1) generate depth predictions on the DTU test set, (2) fuse the depth predictions into the final point cloud, and (3) evaluate the predicted point cloud. Note that we re-implement the original MATLAB evaluation of the DTU dataset in Python.

bash scripts/dtu/test.sh

You should get results similar to these:

Acc 0.4051747996189477 <br /> Comp 0.2776021161518006 <br /> F-score 0.34138845788537414

Acknowledgement

The fusion code for the DTU dataset is heavily built upon PatchMatchNet.

</details>