
<div align="center"> <h1>[ICCV2023] 🧊 FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models </h1>

Guangkai Xu<sup>1*</sup>, Wei Yin<sup>2*</sup>, Hao Chen<sup>3</sup>, Chunhua Shen<sup>3</sup>, Kai Cheng<sup>1</sup>, Feng Zhao<sup>1</sup>

<sup>1</sup>University of Science and Technology of China <sup>2</sup>DJI Technology <sup>3</sup>Zhejiang University, China

Project Page, arXiv, Paper, Supplementary

Reconstruct your pose-free video with 🧊 FrozenRecon in ~20 minutes

</div>

We propose a novel test-time optimization approach that transfers the robustness of affine-invariant depth models such as LeReS to challenging, diverse scenes while ensuring inter-frame consistency, with only dozens of parameters to optimize per video frame. Specifically, our approach freezes the pre-trained affine-invariant depth model's depth predictions, rectifies them by optimizing the unknown scale-shift values with a geometric consistency alignment module, and employs the resulting scale-consistent depth maps to robustly obtain camera poses and camera intrinsics simultaneously. A dense scene reconstruction demo is shown below.

<div align="center"> <img width="800" alt="image" src="figs/frozenrecon-demo.png"> </div>
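To make the rectification idea concrete, here is a minimal sketch in plain Python, not the repository's implementation. It assumes a single global scale and shift per frame and a pinhole camera; the function names `rectify_depth` and `backproject` and all numeric values are hypothetical. FrozenRecon itself optimizes the scale-shift values (together with poses and intrinsics) through its geometric consistency alignment module, which is not reproduced here; the sketch only shows how a frozen depth map would be rectified and lifted to 3D once such values are known.

```python
from typing import List, Tuple

DepthMap = List[List[float]]
Point = Tuple[float, float, float]


def rectify_depth(depth: DepthMap, scale: float, shift: float) -> DepthMap:
    """Apply d' = scale * d + shift to every pixel of a frozen depth map."""
    return [[scale * d + shift for d in row] for row in depth]


def backproject(depth: DepthMap, fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Lift each pixel (u, v) with depth z to a camera-space point (pinhole model)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Hypothetical 2x2 affine-invariant depth map and hypothetical scale/shift/intrinsics.
affine_depth = [[0.5, 1.0], [1.5, 2.0]]
metric_like = rectify_depth(affine_depth, scale=2.0, shift=1.0)
points = backproject(metric_like, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(points[0])
```

Because the depth network stays frozen, only a handful of values such as `scale`, `shift`, the pose, and the intrinsics would need to be optimized per frame, which is why the parameter count stays small.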

Prerequisite

Pre-trained Checkpoints

In this project, we use LeReS to predict affine-invariant depth maps. Please download the pre-trained LeReS checkpoint and place it at FrozenRecon/LeReS/res101.pth. To optimize outdoor scenes, also download the SegFormer checkpoint and place it at FrozenRecon/SegFormer/segformer.b3.512x512.ade.160k.pth.
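Before running the optimization, it can help to confirm that the checkpoints sit where the paths above say they should. The following is a small sketch under that assumption; the helper name `missing_checkpoints` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List

# Checkpoint locations as stated above, relative to the FrozenRecon repository root.
LERES_CKPT = Path("LeReS/res101.pth")
SEGFORMER_CKPT = Path("SegFormer/segformer.b3.512x512.ade.160k.pth")


def missing_checkpoints(repo_root: str, outdoor: bool = False) -> List[Path]:
    """Return the expected checkpoint files that are absent under repo_root."""
    required = [LERES_CKPT]
    if outdoor:
        # The SegFormer checkpoint is only needed for outdoor scenes.
        required.append(SEGFORMER_CKPT)
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the FrozenRecon directory.
    for rel in missing_checkpoints(".", outdoor=True):
        print(f"missing checkpoint: {rel}")
```

Pass `outdoor=False` if you only plan to reconstruct indoor or in-the-wild scenes without sky and car masking.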

Demo Data

We provide demo data for each scene, plus an in-the-wild video captured with an iPhone 14 Pro without any lidar sensor information. Download the data from BaiduNetDisk and place it in FrozenRecon/demo_data.

Installation

git clone --recursive https://github.com/aim-uofa/FrozenRecon.git
cd FrozenRecon
conda create -y -n frozenrecon python=3.8
conda activate frozenrecon
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html # pytorch 1.7.1 for SegFormer
pip install -r requirements.txt

# (Optional) For outdoor scenes, we recommend masking the sky regions and cars (potential dynamic objects)
pip install timm==0.3.2
pip install --upgrade mmcv-full==1.2.7 -f https://download.openmmlab.com/mmcv/dist/cu110/torch171/index.html
# pip install "mmsegmentation==0.11.0"
pip install ipython attr 
git clone https://github.com/NVlabs/SegFormer.git
cd SegFormer && pip install -e . && cd ..
# After installing SegFormer, please download the segformer.b3.512x512.ade.160k.pth checkpoint following https://github.com/NVlabs/SegFormer, and place it in SegFormer/

# (Optional) Install lietorch. It can make optimization faster.
git clone --recursive https://github.com/princeton-vl/lietorch.git
cd lietorch && python setup.py install && cd ..
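After the installation steps, a quick check of which packages are importable can save a failed run later. This is a hedged sketch using only the standard library; the import names (`torch`, `torchvision`, `timm`, `mmcv`, `lietorch`) are inferred from the install commands above, and the helper `availability` is hypothetical.

```python
import importlib.util
from typing import Dict, Iterable

# Import names inferred from the install commands above (assumptions).
REQUIRED = ("torch", "torchvision")
OPTIONAL = ("timm", "mmcv", "lietorch")  # outdoor masking and faster optimization


def availability(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in availability(REQUIRED).items():
        print(f"{name}: {'ok' if found else 'MISSING (required)'}")
    for name, found in availability(OPTIONAL).items():
        print(f"{name}: {'ok' if found else 'not installed (optional)'}")
```

`find_spec` only locates a package without importing it, so the check stays fast and does not initialize CUDA.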

Optimization

1. In-the-wild Video Input

# Take demo data as an example
python src/optimize.py --video_path demo_data/IMG_8765.MOV

# For self-captured videos
# python src/optimize.py --video_path PATH_TO_VIDEO --scene_name SCENE_NAME

2. In-the-wild Extracted Images Input

python src/optimize.py --img_root PATH_TO_IMG_FOLDER --scene_name SCENE_NAME

3. Datasets (Optional with GT Priors)

# Export ground-truth data root here.
export GT_ROOT='./demo_data' # PATH_TO_GT_DATA_ROOT; download demo_data as described in the "Demo Data" subsection.

# FrozenRecon with datasets, using NYUDepthVideo classroom_0004 as an example.
python src/optimize.py --dataset_name NYUDepthVideo --gt_root $GT_ROOT --scene_name classroom_0004

# FrozenRecon with GT priors.
python src/optimize.py --dataset_name NYUDepthVideo --gt_root $GT_ROOT --gt_intrinsic_flag --save_suffix gt_intrinsic --scene_name classroom_0004
python src/optimize.py --dataset_name NYUDepthVideo --gt_root $GT_ROOT --gt_pose_flag --save_suffix gt_pose --scene_name classroom_0004
python src/optimize.py --dataset_name NYUDepthVideo --gt_root $GT_ROOT --gt_depth_flag --save_suffix gt_depth --scene_name classroom_0004
python src/optimize.py --dataset_name NYUDepthVideo --gt_root $GT_ROOT --gt_intrinsic_flag --gt_pose_flag --save_suffix gt_intrinsic_gt_pose --scene_name classroom_0004
python src/optimize.py --dataset_name NYUDepthVideo --gt_root $GT_ROOT --gt_pose_flag --gt_depth_flag --save_suffix gt_depth_gt_pose --scene_name classroom_0004
python src/optimize.py --dataset_name NYUDepthVideo --gt_root $GT_ROOT --gt_intrinsic_flag --gt_depth_flag --save_suffix gt_intrinsic_gt_depth --scene_name classroom_0004
python src/optimize.py --dataset_name NYUDepthVideo --gt_root $GT_ROOT --gt_intrinsic_flag --gt_pose_flag --gt_depth_flag --save_suffix gt_intrinsic_gt_depth_gt_pose --scene_name classroom_0004
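The seven GT-prior commands above cover every non-empty combination of the intrinsic, depth, and pose flags. As a convenience, the sketch below generates the same set programmatically. It uses only the flags and suffix names shown above; the helper `gt_prior_commands` is hypothetical, and the flag order within a command may differ from the listing, which should not matter for a standard argument parser.

```python
from itertools import combinations
from typing import List

# The three GT priors that appear in the commands above.
PRIORS = ("intrinsic", "depth", "pose")


def gt_prior_commands(dataset: str, gt_root: str, scene: str) -> List[List[str]]:
    """Build one optimize.py command per non-empty combination of GT priors."""
    base = ["python", "src/optimize.py", "--dataset_name", dataset, "--gt_root", gt_root]
    commands = []
    for k in range(1, len(PRIORS) + 1):
        for combo in combinations(PRIORS, k):
            flags = [f"--gt_{name}_flag" for name in combo]
            suffix = "_".join(f"gt_{name}" for name in combo)
            commands.append(base + flags + ["--save_suffix", suffix, "--scene_name", scene])
    return commands


if __name__ == "__main__":
    # Print the commands; pipe them to a shell or pass each list to subprocess.run.
    for cmd in gt_prior_commands("NYUDepthVideo", "./demo_data", "classroom_0004"):
        print(" ".join(cmd))
```

Printing rather than executing keeps the script safe to run for inspection before launching the seven optimizations.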

4. Outdoor Scenes

export GT_ROOT='./demo_data' # PATH_TO_GT_DATA_ROOT; download demo_data as described in the "Demo Data" subsection.
# We suggest using GT intrinsics for stable optimization.
python src/optimize.py --dataset_name NYUDepthVideo --gt_root $GT_ROOT --gt_intrinsic_flag --scene_name 2011_09_26_drive_0001_sync --outdoor_scenes

🎫 License

For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.

🖊️ Citation

If you find this project useful in your research, please cite:

@inproceedings{xu2023frozenrecon,
  title={FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models},
  author={Xu, Guangkai and Yin, Wei and Chen, Hao and Shen, Chunhua and Cheng, Kai and Zhao, Feng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={9310--9320},
  year={2023}
}