VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos (CVPR 2023)

Project Page | Paper | Supplementary | ScanNet Test Results

Installation

sudo apt install libsparsehash-dev
conda env create -f environment.yaml
conda activate visfusion

ScanNet Dataset

We use the same input data structure as NeuralRecon. You can download and extract the ScanNet v2 dataset by following the instructions at http://www.scan-net.org/ or by using the scannet_wrangling_scripts provided by SimpleRecon.

Expected directory structure of ScanNet:

DATAROOT
└───scannet
    ├───scans
    │   └───scene0000_00
    │       ├───color
    │       │   ├───0.jpg
    │       │   ├───1.jpg
    │       │   └───...
    │       └───...
    ├───scans_test
    │   └───scene0707_00
    │       ├───color
    │       │   ├───0.jpg
    │       │   ├───1.jpg
    │       │   └───...
    │       └───...
    ├───scannetv2_test.txt
    ├───scannetv2_train.txt
    └───scannetv2_val.txt
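
Before generating fragments, it can help to confirm that the data matches the layout above. The following is a minimal sketch, not part of the VisFusion code base: the helper name check_scannet_layout is hypothetical, and it only checks the items shown in the tree (split files, scene folders, and at least one .jpg under color).

```python
from pathlib import Path

# Hypothetical helper for illustration; it only checks what the tree above shows.
SPLIT_FILES = ("scannetv2_train.txt", "scannetv2_val.txt", "scannetv2_test.txt")


def check_scannet_layout(data_root):
    """Return a list of human-readable problems found under data_root/scannet."""
    root = Path(data_root) / "scannet"
    problems = []
    # The three split lists are expected directly under scannet/.
    for name in SPLIT_FILES:
        if not (root / name).is_file():
            problems.append(f"missing split file: {name}")
    # Every scene folder should contain a color/ directory with .jpg frames.
    for split_dir in ("scans", "scans_test"):
        base = root / split_dir
        if not base.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        for scene in sorted(p for p in base.iterdir() if p.is_dir()):
            color = scene / "color"
            if not color.is_dir() or not any(color.glob("*.jpg")):
                problems.append(f"no color frames in {split_dir}/{scene.name}")
    return problems
```

Calling check_scannet_layout("DATAROOT") returns an empty list when nothing is missing. Other per-scene data (the "..." entries in the tree) is not checked.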

Then generate the input fragments and the ground truth TSDFs for the training/val data split by

python tools/tsdf_fusion/generate_gt.py --data_path PATH_TO_SCANNET \
                                        --save_name all_tsdf_9 \
                                        --window_size 9

and for the test split by

python tools/tsdf_fusion/generate_gt.py --test \
                                        --data_path PATH_TO_SCANNET \
                                        --save_name all_tsdf_9 \
                                        --window_size 9
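
The two invocations differ only in the --test flag. If you prefer to drive both from a script, a small wrapper along these lines may be convenient. This is a sketch only: generate_gt_command is a hypothetical helper, and it uses just the flags shown above.

```python
import shlex


def generate_gt_command(data_path, save_name="all_tsdf_9", window_size=9, test=False):
    """Build the argument list for tools/tsdf_fusion/generate_gt.py."""
    cmd = ["python", "tools/tsdf_fusion/generate_gt.py"]
    if test:
        # The test split uses the same options plus --test.
        cmd.append("--test")
    cmd += [
        "--data_path", str(data_path),
        "--save_name", save_name,
        "--window_size", str(window_size),
    ]
    return cmd


# Print both commands; pass the lists to subprocess.run(...) to execute them.
for is_test in (False, True):
    print(shlex.join(generate_gt_command("PATH_TO_SCANNET", test=is_test)))
```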

Example data

We provide an example ScanNet scene (scene0785_00) to quickly try out the code. Download it from here and unzip it into the main directory of the project code.

The reconstructed meshes will be saved to PROJECT_PATH/results.

python main.py --cfg ./config/test.yaml \
                SCENE scene0785_00 \
                TEST.PATH ./example_data/ScanNet \
                LOGDIR ./checkpoints \
                LOADCKPT pretrained/model_000049.ckpt

By default, the code outputs double-layer meshes (for NeuralRecon's evaluation). Set MODEL.SINGLE_LAYER_MESH=True to output single-layer meshes directly for TransformerFusion's evaluation.

python main.py --cfg ./config/test.yaml \
                SCENE scene0785_00 \
                TEST.PATH ./example_data/ScanNet \
                LOGDIR ./checkpoints \
                LOADCKPT pretrained/model_000049.ckpt \
                MODEL.SINGLE_LAYER_MESH True
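
The arguments after --cfg are given as alternating KEY VALUE pairs, with dots selecting nested config sections. The sketch below illustrates how such pairs could map onto a nested config. It assumes yacs-style handling and is not the repository's actual parser; parse_overrides is a hypothetical name, and a real config system would typically also reject keys that are not already defined in the defaults.

```python
import ast


def parse_overrides(tokens):
    """Turn ['A.B', '1', 'C', 'x'] into {'A': {'B': 1}, 'C': 'x'}."""
    if len(tokens) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    config = {}
    for key, raw in zip(tokens[0::2], tokens[1::2]):
        try:
            # 'True' -> True, '9' -> 9
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Paths and scene names are not Python literals; keep them as strings.
            value = raw
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


overrides = parse_overrides([
    "SCENE", "scene0785_00",
    "TEST.PATH", "./example_data/ScanNet",
    "MODEL.SINGLE_LAYER_MESH", "True",
])
print(overrides)
```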

Training

Change TRAIN.PATH to your own data path in config/train.yaml and start training by running ./train.sh.

train.sh:

#!/usr/bin/env bash
export CUDA_VISIBLE_DEVICES=0

python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 20 MODEL.FUSION.FUSION_ON False
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 41
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 44 TRAIN.FINETUNE_LAYER 0 MODEL.PASS_LAYERS 0
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 47 TRAIN.FINETUNE_LAYER 1 MODEL.PASS_LAYERS 1
python main.py --cfg ./config/train.yaml TRAIN.EPOCHS 50 TRAIN.FINETUNE_LAYER 2 MODEL.PASS_LAYERS 2

The training is separated into five phases:

Phase 1 (epochs 1-20): train on single fragments. MODEL.FUSION.FUSION_ON=False
Phase 2 (epochs 21-41): train the whole model with GRUFusion.
Phase 3 (epochs 42-44): fine-tune the first layer with GRUFusion. TRAIN.FINETUNE_LAYER=0, MODEL.PASS_LAYERS=0
Phase 4 (epochs 45-47): fine-tune the second layer with GRUFusion. TRAIN.FINETUNE_LAYER=1, MODEL.PASS_LAYERS=1
Phase 5 (epochs 48-50): fine-tune the third layer with GRUFusion. TRAIN.FINETUNE_LAYER=2, MODEL.PASS_LAYERS=2
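
The schedule above can also be written as a small table, which makes the relationship between the phases and the commands in train.sh explicit. In this sketch, PHASES, phase_commands, and phase_for_epoch are hypothetical names, and it assumes TRAIN.EPOCHS is the last (cumulative) epoch of each phase, as the phase list suggests.

```python
# (last epoch of the phase, config overrides), copied from the phase list.
PHASES = [
    (20, {"MODEL.FUSION.FUSION_ON": False}),
    (41, {}),
    (44, {"TRAIN.FINETUNE_LAYER": 0, "MODEL.PASS_LAYERS": 0}),
    (47, {"TRAIN.FINETUNE_LAYER": 1, "MODEL.PASS_LAYERS": 1}),
    (50, {"TRAIN.FINETUNE_LAYER": 2, "MODEL.PASS_LAYERS": 2}),
]


def phase_commands(cfg="./config/train.yaml"):
    """Return one command line per phase, in the same form as train.sh."""
    commands = []
    for last_epoch, overrides in PHASES:
        cmd = ["python", "main.py", "--cfg", cfg, "TRAIN.EPOCHS", str(last_epoch)]
        for key, value in overrides.items():
            cmd += [key, str(value)]
        commands.append(" ".join(cmd))
    return commands


def phase_for_epoch(epoch):
    """Return the 1-based phase that a 1-based epoch belongs to."""
    if epoch < 1:
        raise ValueError("epochs are counted from 1")
    for index, (last_epoch, _) in enumerate(PHASES, start=1):
        if epoch <= last_epoch:
            return index
    raise ValueError("epoch is beyond the last phase")


for line in phase_commands():
    print(line)
```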

Test

Change TEST.PATH to your own data path in config/test.yaml and start testing by running

python main.py --cfg ./config/test.yaml

Evaluation

We use NeuralRecon's evaluation for our main results.

python tools/evaluation.py --model ./results/scene_scannet_checkpoints_fusion_eval_49 --n_proc 16

You can print previously computed evaluation results with

python tools/visualize_metrics.py --model ./results/scene_scannet_checkpoints_fusion_eval_49

Here are the 3D metrics on ScanNet generated by the provided checkpoint using NeuralRecon's evaluation:

Acc ↓	Comp ↓	Chamfer ↓	Prec ↑	Recall ↑	F-Score ↑
5.6	10.0	7.80	0.694	0.537	0.604

and using TransformerFusion's evaluation (set MODEL.SINGLE_LAYER_MESH=True to output single-layer meshes):

Acc ↓	Comp ↓	Chamfer ↓	Prec ↑	Recall ↑	F-Score ↑
4.10	8.66	6.38	0.757	0.588	0.660
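
For readers unfamiliar with these metrics, the sketch below shows the commonly used point-based definitions on two small point sets. It is an illustration only, not the code in tools/evaluation.py: mesh_metrics is a hypothetical helper, the 0.05 distance threshold is an assumption, and the real evaluation samples points from the meshes and may differ in detail. In both tables above, the Chamfer value equals the mean of Acc and Comp (for example, (5.6 + 10.0) / 2 = 7.80), which is the definition used here.

```python
import math


def nearest_distances(src, dst):
    """For each point in src, the distance to its closest point in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def mesh_metrics(pred, gt, threshold=0.05):
    """Point-based 3D metrics; the threshold value is an assumption for this sketch."""
    d_pred = nearest_distances(pred, gt)  # prediction -> ground truth
    d_gt = nearest_distances(gt, pred)    # ground truth -> prediction
    acc = sum(d_pred) / len(d_pred)
    comp = sum(d_gt) / len(d_gt)
    prec = sum(d < threshold for d in d_pred) / len(d_pred)
    recall = sum(d < threshold for d in d_gt) / len(d_gt)
    fscore = 2 * prec * recall / (prec + recall) if prec + recall > 0 else 0.0
    return {
        "acc": acc,
        "comp": comp,
        "chamfer": (acc + comp) / 2,
        "prec": prec,
        "recall": recall,
        "fscore": fscore,
    }


# Toy example: one predicted point lies on the ground truth, one is 1 unit away.
metrics = mesh_metrics(pred=[(0, 0, 0), (1, 0, 0)], gt=[(0, 0, 0)])
print(metrics)
```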

ARKit data

To try with your own data captured from ARKit, please refer to NeuralRecon's DEMO.md for more details.

python test_scene.py --cfg ./config/test_scene.yaml \
                     DATASET ARKit \
                     TEST.PATH ./example_data/ARKit_scan \
                     LOADCKPT pretrained/model_000049.ckpt

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{gao2023visfusion,
  title={VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos},
  author={Gao, Huiyu and Mao, Wei and Liu, Miaomiao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={17317--17326},
  year={2023}
}

Acknowledgment

This repository is partly based on the repo NeuralRecon. Many thanks to Jiaming Sun for the great code!