
SPARF: Neural Radiance Fields from Sparse and Noisy Poses

This is not an officially supported Google product.

This repository contains the code for the paper: SPARF: Neural Radiance Fields from Sparse and Noisy Poses. In CVPR, 2023 (Highlight).

Authors: Prune Truong, Marie-Julie Rakotosaona, Fabian Manhardt, and Federico Tombari

[arXiv preprint] [Website] [YouTube teaser]

Our approach SPARF produces realistic novel-view renderings given as few as 2 or 3 input images with noisy camera poses. We add two novel constraints to the pose-NeRF optimization: a multi-view correspondence loss and a depth-consistency loss.
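To make the multi-view correspondence loss concrete, below is a minimal, self-contained sketch of the underlying idea. It is illustrative only and not the implementation used in this repository; all function names, signatures, and the choice of robust loss are assumptions. Pixel matches and their confidences are assumed to come from a dense matcher such as PDC-Net, and the depth is the one rendered by the NeRF at the matched pixel under the current pose estimates.

```python
# Minimal illustrative sketch of a multi-view correspondence loss (not the
# repository's actual implementation; names and signatures are assumptions).
import torch
import torch.nn.functional as F

def correspondence_loss(p_i, p_j, conf, depth_i, K, R_i, t_i, R_j, t_j, delta=1.0):
    """p_i, p_j: (N, 2) matched pixels in views i and j; conf: (N,) match confidences;
    depth_i: (N,) depth rendered by the NeRF at p_i; K: (3, 3) intrinsics;
    (R_i, t_i), (R_j, t_j): current world-to-camera pose estimates."""
    ones = torch.ones_like(p_i[:, :1])
    # Back-project p_i to a 3D point in camera i using the NeRF-rendered depth.
    rays_i = (torch.inverse(K) @ torch.cat([p_i, ones], dim=-1).unsqueeze(-1)).squeeze(-1)
    X_cam_i = rays_i * depth_i.unsqueeze(-1)
    # Transform to world coordinates, then into camera j.
    X_world = (R_i.transpose(-1, -2) @ (X_cam_i - t_i).unsqueeze(-1)).squeeze(-1)
    X_cam_j = (R_j @ X_world.unsqueeze(-1)).squeeze(-1) + t_j
    # Project into image j and compare against the matched pixel p_j.
    proj = (K @ X_cam_j.unsqueeze(-1)).squeeze(-1)
    p_j_hat = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    # Robust (Huber) reprojection error, weighted by the match confidence.
    err = F.huber_loss(p_j_hat, p_j, reduction='none', delta=delta).sum(dim=-1)
    return (conf * err).sum() / conf.sum().clamp(min=1e-6)
```

When the poses and the rendered geometry are accurate, the reprojected pixel lands on its match, so minimizing this error constrains the camera poses and the NeRF geometry jointly.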

Please contact Prune Truong (prune.truong@vision.ee.ethz.ch) if you have any questions!

We provide PyTorch code for all experiments: BARF/SPARF for joint pose-NeRF training, and NeRF/SPARF when fixed ground-truth poses are given as input.


Citation

If you find our code useful for your research, please cite

@inproceedings{sparf2023,
  title={SPARF: Neural Radiance Fields from Sparse and Noisy Poses},
  author = {Truong, Prune and Rakotosaona, Marie-Julie and Manhardt, Fabian and Tombari, Federico},
  publisher = {{IEEE/CVF} Conference on Computer Vision and Pattern Recognition, {CVPR}},
  year = {2023}
}

Installation

This code was developed with Python 3 and CUDA 11.3. All models were trained on a single A100 (or V100) GPU with 20GB of memory. This code does NOT support multi-GPU training. Please use the following commands for installation.

conda create -n sparf-env python=3
conda activate sparf-env
pip install -r requirements.txt

For the specific versions of the packages used to develop this model, run the following instead.

pip install -r requirements_w_version.txt
pip install cupy-cuda113 --no-cache-dir 
git submodule update --init --recursive
git submodule update --recursive --remote
bash third_party/remove_unused_files.sh
python -c "from source.admin.environment import create_default_local_file; create_default_local_file()"
<details> <summary><b>Make sure the correspondence network is installed and running correctly</b></summary>

Run the following command

python third_party/test_pdcnet_installation.py

You should obtain an image saved as third_party/test_pdcnet.png.

</details>

Datasets

<details> <summary><b>DTU</b></summary> </details> <details> <summary><b>LLFF</b></summary>

The LLFF real-world data can be found in the NeRF Google Drive. For convenience, you can download it with the following script (run from the root of this repo):

gdown --id 16VnMcF1KJYxN9QId6TClMsZRahHNMW5g # download nerf_llff_data.zip
unzip nerf_llff_data.zip
rm -f nerf_llff_data.zip
mv nerf_llff_data data/llff
</details> <details> <summary><b>Replica</b></summary>

You can download the Replica dataset with the following script:

# You can also download Replica.zip manually from this link:
# https://caiyun.139.com/m/i?1A5Ch5C3abNiL (password: v3fY; the zip is split into smaller zips because of caiyun's size limit)
wget https://cvg-data.inf.ethz.ch/nice-slam/data/Replica.zip
unzip Replica.zip
</details>

I. Running the code

Training: Quick start

The installation should have generated a local configuration file admin/local.py. In case the file was not generated, run python -c "from source.admin.environment import create_default_local_file; create_default_local_file()" to generate it.

If all the dependencies have been correctly installed, you can train/evaluate a network using the run_trainval.py script in the correct conda environment. All checkpointing, logging, and metric saving are handled automatically.

conda activate sparf-env
# Selecting <train_module> <train_name> <nbr_input_views> <scene_name>
python run_trainval.py <train_module> <train_name> --train_sub <nbr_input_views> --scene <scene_name>

Here, train_module is the sub-module inside train_settings and train_name is the name of the train setting file to be used.

The snapshots along with a video of the RGB and depth renderings will be saved in the directory workspace_dir/<train_module>/subset_<nbr_input_views>/<scene_name>/<train_name>.

The corresponding tensorboard file will be stored in tensorboard_dir/<train_module>/subset_<nbr_input_views>/<scene_name>/<train_name>.

Here, workspace_dir and tensorboard_dir are the paths set in the source/admin/local.py file.
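For reference, the generated local file essentially collects these output paths together with the dataset root directories. The snippet below is only a hypothetical illustration of its shape; the attribute names and exact format of the actual generated file may differ, so edit the file created during installation rather than copying this.

```python
# Hypothetical illustration of the paths configured in the generated local file.
# The real file's structure and attribute names may differ.
class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = '/path/to/workspace'       # checkpoints, snapshots, render videos
        self.tensorboard_dir = '/path/to/tensorboard'   # TensorBoard event files
        self.eval_dir = '/path/to/eval'                 # test-set metrics
        self.llff = '/path/to/data/llff'                # dataset roots (illustrative names)
        self.dtu = '/path/to/data/dtu'
        self.replica = '/path/to/data/Replica'
```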

Running this command directly computes results on the test set at the end of training, which will be stored in eval_dir/dataset_name/<train_module>/subset_<nbr_input_views>/<scene_name>/<train_name>. If you wish to recompute them, or re-generate the video rendering, run:

# Selecting <train_module> <train_name> <nbr_input_views> <scene_name>
# to regenerate the test metrics 
python run_trainval.py <train_module> <train_name> --train_sub <nbr_input_views> --scene <scene_name> --test_metrics_only True 

# to regenerate the video of the renderings
python run_trainval.py <train_module> <train_name> --train_sub <nbr_input_views> --scene <scene_name> --render_video_only True 

<br />

Training: Example

The configs are located in train_settings.

For example, you can train with the included default SPARF settings for joint pose-NeRF training on DTU, starting from initial noisy poses (3 views), by running:

python run_trainval.py joint_pose_nerf_training/dtu sparf --train_sub 3 --scene scan82

All the snapshots will be stored in the directory workspace_dir/joint_pose_nerf_training/dtu/subset_3/scan82/sparf. All the TensorBoard logs will be stored in tensorboard_dir/joint_pose_nerf_training/dtu/subset_3/scan82/sparf.

Running this command directly computes results on the test set at the end of training, which will be stored in eval_dir/dtu/joint_pose_nerf_training/dtu/subset_3/scan82/sparf.

<br />

Visualizing the results

We have included code to visualize the training over TensorBoard. The TensorBoard events include the following:

At the end of the training, a video of the RGB and depth renderings is created, and saved with the snapshots. The poses used for the rendering are jitters of the optimized (or ground-truth) input poses.
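Generating such jittered render poses simply amounts to perturbing each pose with a small random rotation and translation. Below is a minimal sketch of that idea, assuming (3, 4) camera-to-world matrices; it is illustrative only and not the repository's code.

```python
# Illustrative sketch only: jitter a camera-to-world pose with small random
# rotation (Euler angles) and translation noise. Not the repository's code.
import math
import torch

def jitter_pose(c2w, rot_std_deg=1.0, trans_std=0.01):
    """c2w: (3, 4) camera-to-world pose [R | t]; returns a perturbed copy."""
    ax, ay, az = (torch.randn(3) * math.radians(rot_std_deg)).tolist()
    Rx = torch.tensor([[1., 0., 0.], [0., math.cos(ax), -math.sin(ax)], [0., math.sin(ax), math.cos(ax)]])
    Ry = torch.tensor([[math.cos(ay), 0., math.sin(ay)], [0., 1., 0.], [-math.sin(ay), 0., math.cos(ay)]])
    Rz = torch.tensor([[math.cos(az), -math.sin(az), 0.], [math.sin(az), math.cos(az), 0.], [0., 0., 1.]])
    R = Rz @ Ry @ Rx @ c2w[:, :3]
    t = c2w[:, 3] + torch.randn(3) * trans_std
    return torch.cat([R, t.unsqueeze(-1)], dim=-1)
```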

<br />

Evaluation of pre-trained models

The computation of the test metrics is included at the end of training, but it requires a corresponding train_settings file. To evaluate any given pre-trained model (with its associated options.yaml file), run:

# Selecting <ckpt_dir> <out_dir> <expname>
python eval.py --ckpt_dir <ckpt_dir> --out_dir <out_dir> --expname <expname> --plot True

This will save the metrics file to <out_dir>/<expname>.json. If --plot is True, figures of the renderings are saved in <out_dir>/<expname>/. <ckpt_dir> is the path to a directory containing a checkpoint (the latest one is loaded automatically) and an options file named options.yaml.
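Once written, the metrics file can be inspected like any JSON file, for example as below. The path is just a placeholder for your own <out_dir>/<expname>.json, and the exact keys depend on the dataset and experiment.

```python
# Print the evaluation metrics saved by eval.py; the path is a placeholder.
import json

with open('<out_dir>/<expname>.json') as f:
    metrics = json.load(f)
for name, value in metrics.items():
    print(f'{name}: {value}')
```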

We provide pre-trained models for some of the experiments and some of the scenes here.

To make sure the code is running fine, for joint_pose_nerf_training/dtu/scan82, you should get similar metrics (with test-time optimization):

| | Rot. error | Trans. error | PSNR (masked) | SSIM (masked) | LPIPS (masked) | Depth err. |
|---|---|---|---|---|---|---|
| SPARF (Ours) | 0.70 | 0.0097 | 17.58 (17.36) | 0.82 (0.91) | 0.19 (0.07) | 0.15 |
<br />

Reproducing results from the publication

<details> <summary><b>Joint pose-NeRF optimization on DTU from noisy poses (Tab. 4)</b></summary>

In Tab. 4 of the main paper, we present results of the joint pose-NeRF optimization on DTU, starting from 3 images with noisy poses. To reproduce those results, run these commands for each of the 15 scenes. The results presented are the average over all scenes.

# <SCENE> is specific to datasets
# DTU (<SCENE>={'scan8', 'scan21', 'scan30', 'scan31','scan34', 'scan38','scan40','scan41','scan45','scan55','scan63','scan82','scan103','scan110','scan114'})

# SPARF
python run_trainval.py joint_pose_nerf_training/dtu sparf --scene=<SCENE> --train_sub 3

# SPARF without depth-consistency loss
# Note that the depth-consistency loss adds training time and only leads to 
# a minor improvement. We therefore also include a version without the depth
# consistency loss 
python run_trainval.py joint_pose_nerf_training/dtu sparf_wo_depth_cons_loss --scene=<SCENE> --train_sub 3

# Baseline BARF
python run_trainval.py joint_pose_nerf_training/dtu barf --scene=<SCENE> --train_sub 3
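If you prefer launching all 15 scenes sequentially from a single script instead of typing each command, a simple loop over the scene list above works. The Python sketch below (a bash for-loop is equivalent) just reruns the SPARF command for every scene; the same pattern applies to the other two settings.

```python
# Illustrative helper: rerun the SPARF command above for every DTU scene in turn.
import subprocess

scenes = ['scan8', 'scan21', 'scan30', 'scan31', 'scan34', 'scan38', 'scan40', 'scan41',
          'scan45', 'scan55', 'scan63', 'scan82', 'scan103', 'scan110', 'scan114']
for scene in scenes:
    subprocess.run(['python', 'run_trainval.py', 'joint_pose_nerf_training/dtu', 'sparf',
                    f'--scene={scene}', '--train_sub', '3'], check=True)
```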

To make sure the code is running fine, for scene='scan82', you should get similar metrics (with test-time optimization):

| | Rot. error | Trans. error | PSNR (masked) | SSIM (masked) | LPIPS (masked) | Depth err. |
|---|---|---|---|---|---|---|
| BARF | 13.28 | 0.4 | 17.79 (6.11) | 0.45 (0.77) | 0.67 (0.28) | 1.53 |
| SPARF (Ours) | 0.70 | 0.0097 | 17.58 (17.36) | 0.82 (0.91) | 0.19 (0.07) | 0.15 |

Scene 'scan30' is an example of a failure case, because no correspondences can be reliably extracted.

<br /><br />

We provide pre-trained models of our SPARF with 3 input views for all 15 scenes here. To evaluate them, run

# Selecting <ckpt_dir> <out_dir> <expname>
python eval.py --ckpt_dir <ckpt_dir> --out_dir <out_dir> --expname <expname> --plot True
</details> <details> <summary><b>Joint pose-NeRF optimization on LLFF from identity poses (Tab. 5)</b></summary>

In Tab. 5 of the main paper, we present results of the joint pose-NeRF optimization on LLFF, given 3 images with identity initial poses. To reproduce those results, run these commands for each of the 8 scenes. The results presented are the average over all scenes.

# <SCENE> is specific to datasets
# LLFF (<SCENE>={'orchids', 'horns', 'trex', 'fern', 'flower', 'leaves', 'room', 'fortress'})

# SPARF
python run_trainval.py joint_pose_nerf_training/llff sparf --scene=<SCENE> --train_sub 3

# SPARF without depth-consistency loss
python run_trainval.py joint_pose_nerf_training/llff sparf_wo_depth_cons_loss --scene=<SCENE> --train_sub 3

# Baseline BARF
python run_trainval.py joint_pose_nerf_training/llff barf --scene=<SCENE> --train_sub 3

To make sure the code is running fine, for scene='horns', you should get similar metrics (with test-time optimization):

| | Rot. error | Trans. error | PSNR | SSIM | LPIPS |
|---|---|---|---|---|---|
| BARF | 5.53 | 0.326 | 14.34 | 0.34 | 0.54 |
| SPARF (Ours) | 0.027 | 0.002 | 18.94 | 0.61 | 0.33 |
</details> <details> <summary><b>Joint pose-NeRF optimization on Replica from COLMAP poses (Tab. 6)</b></summary>

In Tab. 6 of the main paper, we present results of the joint pose-NeRF optimization on Replica, starting from 3 images with initial poses obtained by COLMAP with PDC-Net matches. To reproduce those results, run these commands for each of the 7 scenes. The results presented are the average over all scenes.

# <SCENE> is specific to datasets
# Replica (<SCENE>={'room0', 'room1', 'room2',  'office0', 'office1', 'office2', 'office3'})

# SPARF
python run_trainval.py joint_pose_nerf_training/replica sparf --scene=<SCENE> --train_sub 3

# SPARF without depth-consistency loss
python run_trainval.py joint_pose_nerf_training/replica sparf_wo_depth_cons --scene=<SCENE> --train_sub 3

# Baseline BARF
python run_trainval.py joint_pose_nerf_training/replica barf --scene=<SCENE> --train_sub 3

To make sure the code is running fine, for scene='office0', you should get similar metrics (with test-time optimization):

| | Rot. error | Trans. error | PSNR | SSIM | LPIPS | Depth err. |
|---|---|---|---|---|---|---|
| BARF | 5.37 | 0.28 | 22.33 | 0.72 | 0.30 | 0.59 |
| SPARF (Ours) | 0.58 | 0.010 | 28.38 | 0.90 | 0.13 | 0.36 |
</details> <details> <summary><b> NeRF optimization on DTU with fixed ground-truth poses (Tab. 7)</b></summary>

In Tab. 7 of the main paper, we present results of NeRF-based approaches, trained with fixed ground-truth camera poses. To reproduce those results, run these commands for each of the 15 scenes. The results presented are the average over all scenes.

# <SCENE> is specific to datasets
# DTU (<SCENE>={'scan8', 'scan21', 'scan30', 'scan31','scan34', 'scan38','scan40','scan41','scan45','scan55','scan63','scan82','scan103','scan110','scan114'})

# SPARF
python run_trainval.py nerf_training_w_gt_poses/dtu sparf --scene=<SCENE> --train_sub 3


# Baseline NeRF
python run_trainval.py nerf_training_w_gt_poses/dtu nerf --scene=<SCENE> --train_sub 3

To make sure the code is running fine, for scene='scan82', you should get similar metrics:

| | PSNR (masked) | SSIM (masked) | LPIPS (masked) | Depth err. |
|---|---|---|---|---|
| NeRF | 4.57 (5.36) | 0.28 (0.77) | 0.28 (0.32) | 1.35 |
| SPARF (Ours) | 18.42 (21.71) | 0.87 (0.95) | 0.16 (0.04) | 0.24 |
</details> <details> <summary><b> NeRF optimization on LLFF with fixed ground-truth poses (Tab. 7)</b></summary>

In Tab. 7 of the main paper, we present results of NeRF-based approaches, trained with fixed ground-truth camera poses. To reproduce those results, run these commands for each of the 8 scenes. The results presented are the average over all scenes. Almost the same results are obtained using hierarchical sampling, or with only a coarse sampling.

# <SCENE> is specific to datasets
# LLFF (<SCENE>={'orchids', 'horns', 'trex', 'fern', 'flower', 'leaves', 'room', 'fortress'})

# SPARF
python run_trainval.py nerf_training_w_gt_poses/llff sparf --scene=<SCENE> --train_sub 3


# Baseline NeRF
python run_trainval.py nerf_training_w_gt_poses/llff nerf --scene=<SCENE> --train_sub 3

To make sure the code is running fine, for scene='horns', you should get similar metrics:

| | PSNR | SSIM | LPIPS |
|---|---|---|---|
| NeRF | 13.21 | 0.26 | 0.61 |
| SPARF (Ours) | 19.39 | 0.64 | 0.29 |
</details>

II. Using a pre-trained model

You can find our pre-trained models (obtained by running this repo), organized in the same structure as our train_settings files, here. After downloading the checkpoints, you can evaluate the model and render test images as indicated above.


III. Downloading model predictions

If you don't want to run the code, you can also directly download the renderings of our models for the test-set poses from the following link: https://drive.google.com/drive/folders/1lHryExsutZsbcKJlzO7QKM34YlUwKSSx?usp=share_link. The structure follows that of the train_settings files. These are the renderings used in the paper, so they may differ slightly from the ones obtained with the provided pre-trained models (which were trained with this codebase).


IV. Codebase structure

For details on the dataloader and how to use your own, refer to this doc.

The framework in source/ consists of the following sub-modules.

Some tips on using and understanding the codebase:


V. License

This code is licensed under the Apache 2.0 License. See LICENSE for more details.