Multiview Stereo with Cascaded Epipolar RAFT (CER-MVS)

This repository contains the source code for our ECCV 2022 paper:

Multiview Stereo with Cascaded Epipolar RAFT

Zeyu Ma, Zachary Teed and Jia Deng

@inproceedings{ma2022multiview,
  title={Multiview Stereo with Cascaded Epipolar RAFT},
  author={Ma, Zeyu and Teed, Zachary and Deng, Jia},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  year={2022}
}

Requirements

The code has been tested with PyTorch 1.7 and CUDA 11.0.

conda env create -f environment.yml
conda activate cer-mvs

# we use gcc9 to compile alt_cuda_corr
export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.5;8.0"
cd alt_cuda_corr && python setup.py install && cd ..
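
The TORCH_CUDA_ARCH_LIST value above is a semicolon-separated list of CUDA compute capabilities. As a minimal sketch, the helper below assembles such a string from (major, minor) pairs, which may be useful if you only want to compile alt_cuda_corr for the GPUs you actually have. The function name format_arch_list is hypothetical and is not part of this repository.

```python
def format_arch_list(capabilities):
    """Join (major, minor) compute capabilities into a TORCH_CUDA_ARCH_LIST string.

    Hypothetical helper for illustration; duplicates are dropped and the
    entries are sorted so the result is stable.
    """
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


if __name__ == "__main__":
    # Example: build for Volta, Turing and Ampere (A100) only.
    print(format_arch_list([(7, 0), (7, 5), (8, 0)]))
```

With PyTorch installed, torch.cuda.get_device_capability() returns such a (major, minor) pair for the current GPU, so its result can be passed to this helper in a one-element list.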

Required Data

To evaluate/train CER-MVS, you will need to download the required datasets.

To download a sample set of DTU and the training set of Tanks and Temples for the demos, run

python download_demo_datasets.py

By default the code will search for the datasets in these locations. You can create symbolic links in the datasets folder that point to wherever the datasets were downloaded.

├── datasets
    ├── DTU
        ├── Cameras
            ├── pair.txt
            ├── *_cam.txt
        ├── Rectified
            ├── scan*
                ├── rect_*.png
        ├── Depths
            ├── scan*
                ├── depth_map_*.pfm
    ├── BlendedMVS
        ├── dataset_full_res_0-29
            ├── 5bfe5ae0fe0ea555e6a969ca/5bfe5ae0fe0ea555e6a969ca/5bfe5ae0fe0ea555e6a969ca (an example)
                ├── blended_images
                    ├── *.jpg
                ├── cams
                    ├── *_cam.txt
                    ├── pair.txt
                ├── rendered_depth_maps
                    ├── *.pfm
        ├── dataset_full_res_30-59
        ├── dataset_full_res_60-89
        ├── dataset_full_res_90-112
    ├── TanksAndTemples
        ├── tankandtemples
            ├── intermediate
                ├── Family (an example)
                    ├── cams
                        ├── *_cam.txt
                    ├── Family.log
                    ├── images
                        ├── *.jpg
                    ├── pair.txt
            ├── advanced
        ├── training_input
            ├── Ignatius (an example)
                ├── cams
                    ├── *_cam.txt
                ├── images
                    ├── *.jpg
                ├── pair.txt
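
If the datasets live elsewhere on disk, the symbolic links mentioned above can be created with a few lines of Python. The sketch below is an illustration only: link_dataset and missing_datasets are hypothetical helpers, not part of this repository, and they only look at the three top-level folder names shown in the tree.

```python
import os
from pathlib import Path

# Top-level folder names taken from the directory tree above.
EXPECTED_SUBDIRS = ("DTU", "BlendedMVS", "TanksAndTemples")


def link_dataset(download_dir, datasets_root, name):
    """Create datasets_root/name as a symlink to download_dir if it is missing."""
    root = Path(datasets_root)
    root.mkdir(parents=True, exist_ok=True)
    link = root / name
    if not link.exists() and not link.is_symlink():
        os.symlink(Path(download_dir).resolve(), link, target_is_directory=True)
    return link


def missing_datasets(datasets_root):
    """Return the expected top-level dataset folders that are not present."""
    root = Path(datasets_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

For example, link_dataset("/path/to/your/DTU", "datasets", "DTU") followed by missing_datasets("datasets") lists the datasets that still need to be downloaded or linked. The download path here is a placeholder.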

Demos

One GPU with at least 24GB of memory is needed (e.g. a 3090).

Pretrained models can be downloaded from Google Drive. Put them in a pretrained folder:

├── pretrained
    ├── train_DTU.pth
    ├── train_BlendedMVS.pth

You can demo our trained models on scan3 of DTU and on Ignatius and Meetingroom of Tanks and Temples by running:

python demo.py

This writes point clouds (*.ply) and visualized depth maps (*.png) to the default results folder (modify demo.py to specify a different output folder).

├── results
    ├── scan3
        ├── depths
            ├── *.png
        ├── result.ply
    ├── Ignatius
        ├── depths
            ├── *.png
        ├── result.ply
    ├── Meetingroom
        ├── depths
            ├── *.png
        ├── result.ply
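
A quick sanity check on a generated point cloud is to read how many vertices its header declares. The sketch below relies only on the PLY format itself, whose header is always plain text and ends with end_header, so it should work whether the body of result.ply is ASCII or binary. The function name count_ply_vertices is hypothetical and not part of this repository.

```python
def count_ply_vertices(path):
    """Read the PLY header and return the declared number of vertices."""
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in f:
            line = raw.strip()
            # The header line looks like: element vertex <count>
            if line.startswith(b"element vertex"):
                return int(line.split()[2])
            if line == b"end_header":
                break
    raise ValueError("no vertex element declared in header")
```

For instance, count_ply_vertices("results/scan3/result.ply") returns the point count of the DTU demo output, assuming the default results folder.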

Training

Train on DTU (we trained on two 3090 GPUs, 24GB of memory each, for 6 days):

python train.py -g train_DTU -p 'train.name = "YOUR_MODEL_NAME"'

Train on BlendedMVS (we trained on two A6000 GPUs, 48GB of memory each, for 4 days):

python train.py -g train_BlendedMVS -p 'train.name = "YOUR_MODEL_NAME"'

Model checkpoints are saved in the checkpoints folder, and TensorBoard logs are written to runs/YOUR_MODEL_NAME.

Test

One GPU with at least 24GB of memory is needed (e.g. a 3090).

Depth Map Inference

DTU Val/Test Set:

# low res pass
python inference.py -g inference_DTU -p 'inference.scan = "YOUR_SCAN, e.g., scan3"' \
    'inference.num_frame = 10' \
    'inference.rescale = 1'
# high res pass
python inference.py -g inference_DTU -p 'inference.scan = "YOUR_SCAN, e.g., scan3"' \
    'inference.num_frame = 10' \
    'inference.rescale = 2'

Tanks and Temples:

# low res pass
python inference.py -g inference_TNT -p 'inference.scan = "YOUR_SCAN, e.g., Ignatius"' \
    'inference.num_frame = 15' \
    'inference.rescale = 1'
# high res pass
python inference.py -g inference_TNT -p 'inference.scan = "YOUR_SCAN, e.g., Ignatius"' \
    'inference.num_frame = 15' \
    'inference.rescale = 2'
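
When running many scans, it can help to generate the two passes programmatically. The sketch below only assembles the same command lines shown above, using the -g and -p flags and the inference.scan, inference.num_frame and inference.rescale bindings. The helper names inference_command and two_pass_commands are hypothetical and not part of this repository.

```python
import shlex


def inference_command(config, scan, num_frame, rescale):
    """Build the argv list for one inference.py pass with the gin bindings above."""
    return [
        "python", "inference.py", "-g", config, "-p",
        f'inference.scan = "{scan}"',
        f"inference.num_frame = {num_frame}",
        f"inference.rescale = {rescale}",
    ]


def two_pass_commands(config, scan, num_frame):
    """Low-res pass (rescale = 1) followed by the high-res pass (rescale = 2)."""
    return [inference_command(config, scan, num_frame, r) for r in (1, 2)]


if __name__ == "__main__":
    for cmd in two_pass_commands("inference_DTU", "scan3", 10):
        # Print the shell-quoted command; to execute it instead,
        # pass the list to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems with the embedded double quotes in the scan binding.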

Modify the config files or gin parameters to change the output location and the loaded weights.

To submit parallel GPU jobs, use the script scripts/submit_depthmap.py. Modify submitter.gin and the datasets and splits in the script as needed, then run python scripts/submit_depthmap.py.

Multi Resolution Fusion

DTU Val/Test Set:

python multires.py -g inference_DTU -p 'multires.scan = "YOUR_SCAN, e.g., scan3"'

Tanks and Temples:

python multires.py -g inference_TNT -p 'multires.scan = "YOUR_SCAN, e.g., Ignatius"'

Point Cloud Fusion

DTU Val/Test Set:

python fusion.py -g inference_DTU -p 'fusion.scan = "YOUR_SCAN, e.g., scan3"'

Tanks and Temples:

python fusion.py -g inference_TNT -p 'fusion.scan = "YOUR_SCAN, e.g., Ignatius"'

Similarly, the script scripts/submit_fusion.py submits the two fusion steps.

Evaluation

Results on DTU test set

Acc.    Comp.   Overall.
0.359   0.305   0.332
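
On DTU, the overall score is conventionally reported as the arithmetic mean of accuracy and completeness (both are distances, so lower is better). Assuming that convention, the numbers above can be checked with a short sketch. The function name overall is chosen here for illustration only.

```python
def overall(acc, comp):
    """Overall score as the arithmetic mean of accuracy and completeness."""
    return (acc + comp) / 2


if __name__ == "__main__":
    print(f"{overall(0.359, 0.305):.3f}")  # prints 0.332
```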

Download the Points data from the official DTU website and follow the instructions for the MATLAB code in the SampleSet data. Note the following in BaseEvalMain_web.m:

if ~exist([dataPath '/ObsMask/Plane' num2str(cSet) '.mat'],'file')
    P = [0; 0; 0; 1]
else
    load([dataPath '/ObsMask/Plane' num2str(cSet)],'P')
end

Run

matlab -nodisplay -nosplash -nodesktop -r "run('BaseEvalMain_web.m');exit;"

After you have results for all scans, generate the summary (change UsedSets and method_string in ComputeStat_web.m too):

matlab -nodisplay -nosplash -nodesktop -r "run('ComputeStat_web.m'); exit;"

Results on Tanks and Temples

Intermediate:

Mean    Family  Francis  Horse  Lighthouse  M60    Panther  Playground  Train
64.82   81.16   64.21    50.43  70.73       63.85  63.99    65.90       58.25

Advanced:

Mean    Auditorium  Ballroom  Courtroom  Museum  Palace  Temple
40.19   25.95       45.75     39.65      51.75   35.08   42.97
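
The Mean column is the average of the per-scene scores in the same row. As a minimal sketch, the snippet below recomputes both means from the values in the tables. The dictionary and function names are hypothetical, and the comparison allows for the two-decimal rounding used in the tables.

```python
from statistics import mean

# Per-scene scores copied from the two result rows above.
FIRST_ROW = {
    "Family": 81.16, "Francis": 64.21, "Horse": 50.43, "Lighthouse": 70.73,
    "M60": 63.85, "Panther": 63.99, "Playground": 65.90, "Train": 58.25,
}
SECOND_ROW = {
    "Auditorium": 25.95, "Ballroom": 45.75, "Courtroom": 39.65,
    "Museum": 51.75, "Palace": 35.08, "Temple": 42.97,
}


def mean_score(scores):
    """Unrounded average of the per-scene scores."""
    return mean(scores.values())


def matches_reported(scores, reported, tol=0.01):
    """True if the recomputed mean agrees with the reported one up to rounding."""
    return abs(mean_score(scores) - reported) < tol
```

Both matches_reported(FIRST_ROW, 64.82) and matches_reported(SECOND_ROW, 40.19) hold for the values listed.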

Download the official trainingdata, clone the GitHub repository, and convert the camera poses to .log files. (For the intermediate and advanced sets, the .log files are already included in the preprocessed dataset; for the training set, you can convert them yourself or download them from here.)

Run

python REPOSITORY_LOCATION/python_toolbox/evaluation/run.py --dataset-dir LOCATION_OF_trainingdata/SCAN(e.g. Ignatius) --traj-path LOCATION_OF_LOG_FILE --ply-path YOUR_POINT_CLOUD.ply --out-dir LOCATION_TO_SAVE_RESULTS