
[Unofficial code-base] NeRF--: Neural Radiance Fields Without Known Camera Parameters

[ Project | Paper | Official code base ] :arrow_left: Thanks to the original authors for the great work!

Example results

| | |
| --- | --- |
| **input** | raw images of the same scene (the order doesn't matter and can be arbitrary) |
| **output**<br>(after joint optimization) | camera intrinsics (focal_x and focal_y)<br>camera extrinsics (the inverses of the poses: rotations and translations) of each image<br>a 3D implicit representation framework [NeRF] that models both the appearance and geometry of the scene |

Source 1: a random YouTube video clip, from 00:10:36 to 00:10:42

<table> <thead align="center"> <tr> <th></th> <th>ReLU-based NeRF--<br>(no refinement, stuck in local minima)</th> <th>SIREN-based NeRF--<br>(no refinement)</th> </tr> </thead> <tbody> <tr> <td>input<br>32 raw photos, sampled at 5 fps<br>32 x [540 x 960 x 3]</td> <td colspan="2" align="center"><img src="media/castle_1041.gif" alt="castle_input"></td> </tr> <tr> <td>learned scene model size</td> <td align="center" colspan="2">1.7 MiB / 158.7k params<br>8+ MLPs with width of 128</td> </tr> <tr> <td>learned camera poses</td> <td><img src="media/castle_1041_relu_pose.gif" alt="castle_1041_relu_pose"></td> <td><img src="media/castle_1041_siren_pose.gif" alt="castle_1041_pose_siren"></td> </tr> <tr> <td>predicted rgb<br>(appearance)<br>(with novel view synthesis)</td> <td><img src="media/nerfmm_castle_1041_spiral_rgb_540x960.gif" alt="castle_1041_relu"></td> <td><img src="media/nerfmm_castle_1041_siren_nvs_si40_pre_softplus_spiral_rgb_540x960.gif" alt="castle_1041_siren"></td> </tr> <tr> <td>predicted depth<br>(geometry)<br>(with novel view synthesis)</td> <td><img src="media/nerfmm_castle_1041_spiral_depth_540x960.gif" alt="castle_1041_relu_depth"></td> <td><img src="media/nerfmm_castle_1041_siren_nvs_si40_pre_softplus_spiral_depth_540x960.gif" alt="castle_1041_siren"></td> </tr> </tbody> </table>

Source 2: a random YouTube video clip, from 00:46:17 to 00:46:28

<table> <thead align="center"> <tr> <th></th> <th>ReLU-based NeRF--<br>(with refinement, still stuck in local minima)</th> <th>SIREN-based NeRF--<br>(with refinement)</th> </tr> </thead> <tbody> <tr> <td>input<br>27 raw photos, sampled at 2.5 fps<br>27 x [540 x 960 x 3]</td> <td colspan="2" align="center"><img src="media/castle_4614.gif" alt="castle_4614_input"></td> </tr> <tr> <td>learned scene model size</td> <td align="center" colspan="2">1.7 MiB / 158.7k params<br>8+ MLPs with width of 128</td> </tr> <tr> <td>learned camera poses</td> <td><img src="media/nerfmm_castle_4614_pre_refine_pose.gif" alt="castle_4614_pose_siren"></td> <td><img src="media/castle_4614_siren_pose.gif" alt="castle_4614_pose_siren"></td> </tr> <tr> <td>predicted rgb<br>(appearance)<br>(with novel view synthesis)</td> <td><img src="media/nerfmm_castle_4614_pre_refine_spiral_rgb_540x960.gif" alt="castle_1041_siren"></td> <td><img src="media/nerfmm_castle_4614_siren_nvs_si40_pre_softplus_spiral_rgb_540x960.gif" alt="castle_1041_siren"></td> </tr> <tr> <td>predicted depth<br>(geometry)<br>(with novel view synthesis)</td> <td><img src="media/nerfmm_castle_4614_pre_refine_spiral_depth_540x960.gif" alt="castle_1041_siren"></td> <td><img src="media/nerfmm_castle_4614_siren_nvs_si40_pre_softplus_spiral_depth_540x960.gif" alt="castle_1041_siren"></td> </tr> </tbody> </table>

Source 3: photos by @crazyang

<table> <thead align="center"> <tr> <th></th> <th>ReLU-based NeRF--<br>(no refinement)</th> <th>SIREN-based NeRF--<br>(no refinement)</th> </tr> </thead> <tbody> <tr> <td>input<br>22 raw photos<br>22 x [756 x 1008 x 3]</td> <td colspan="2" align="center"><img src="media/piano_input.gif" alt="piano_input"></td> </tr> <tr> <td>learned scene model size</td> <td align="center" colspan="2">1.7 MiB / 158.7k params<br>8+ MLPs with width of 128</td> </tr> <tr> <td>learned camera poses</td> <td><img src="media/piano_relu_pose.gif" alt="piano_relu_pose"></td> <td><img src="media/piano_siren_pose.gif" alt="piano_siren_pose"></td> </tr> <tr> <td>predicted rgb<br>(appearance)<br>(with novel view synthesis)</td> <td><img src="media/piano_relu_rgb.gif" alt="piano_relu_rgb"></td> <td><img src="media/piano_siren_rgb.gif" alt="piano_siren_rgb"></td> </tr> <tr> <td>predicted depth<br>(geometry)<br>(with novel view synthesis)</td> <td><img src="media/piano_relu_depth.gif" alt="piano_relu_depth"></td> <td><img src="media/piano_siren_depth.gif" alt="piano_siren_depth"></td> </tr> </tbody> </table>

Notice that the reflection off the piano's side is misinterpreted as transmittance. This is reasonable and acceptable, since no prior on the piano's shape is provided.

What is NeRF and what is NeRF--

NeRF

NeRF is a neural (differentiable) rendering framework with great potential. Please see the [NeRF Project Page] for more details.

It represents a scene as a continuous function (typically modeled by a few MLP layers with non-linear activations); the same idea appears in DeepSDF, SRN, DVR, and so on.

Refer to [awesome-NeRF] and [awesome-neural-rendering] to catch up with the recent 'exploding' development in these areas.

NeRF--

NeRF-- extends the original NeRF from requiring known camera parameters to supporting unknown, jointly learnable camera parameters.
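
In code, this boils down to exposing the camera parameters as ordinary learnable tensors that are optimized jointly with the NeRF weights. A minimal PyTorch sketch with hypothetical names (the actual implementation lives in this repo's model code):

```python
import torch
import torch.nn as nn

class LearnableCameras(nn.Module):
    """Hypothetical sketch: camera intrinsics and extrinsics as parameters."""
    def __init__(self, num_images: int, init_focal: float = 1.0):
        super().__init__()
        # One shared (focal_x, focal_y) pair for the whole scene.
        self.focal = nn.Parameter(torch.tensor([init_focal, init_focal]))
        # Per-image rotation (axis-angle) and translation, initialized to identity.
        self.rotation = nn.Parameter(torch.zeros(num_images, 3))
        self.translation = nn.Parameter(torch.zeros(num_images, 3))

# The camera parameters receive gradients through the rendering loss,
# exactly like the scene MLP weights:
# optimizer = torch.optim.Adam([*nerf.parameters(), *cameras.parameters()])
```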

My modifications & refinements / optional features

This repo first implements NeRF-- with nothing changed from the original paper, but it also supports the following optional modifications, and the list will keep growing.

All the options are configured using yaml configuration files in the configs folder. See details about how to use these configs in the configuration section.

SIREN-based NeRF as backbone

Replace the ReLU activations of NeRF with sinusoidal (sin) activations. Code borrowed and modified from [lucidrains' implementation of pi-GAN]. Please refer to SIREN and pi-GAN for more theoretical details.

To config:

```yaml
model:
  framework: SirenNeRF # options: [NeRF, SirenNeRF]
```
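
For reference, a SIREN layer is just a linear map followed by a sin activation with a frequency factor w0, plus the initialization scheme from the SIREN paper. A minimal sketch (the repo's SirenNeRF, adapted from lucidrains' pi-GAN code, may differ in details):

```python
import math
import torch
import torch.nn as nn

class SirenLayer(nn.Module):
    """Sketch of one SIREN layer: y = sin(w0 * (W x + b))."""
    def __init__(self, dim_in: int, dim_out: int, w0: float = 30.0, is_first: bool = False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(dim_in, dim_out)
        # SIREN init: U(-1/n, 1/n) for the first layer,
        # U(-sqrt(6/n)/w0, sqrt(6/n)/w0) for the following ones.
        bound = (1.0 / dim_in) if is_first else (math.sqrt(6.0 / dim_in) / w0)
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(self.w0 * self.linear(x))
```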

:pushpin: SIREN-based NeRF compared with ReLU-based NeRF

[Image: predicted depth, ReLU-based NeRF-- (no refinement) vs. SIREN-based NeRF-- (no refinement)]

[Image: siren_vs_relu_loss, training loss curves of the two backbones]

These observations are also evidenced by the DeepSDF results shown in the SIREN project.

e.g. the LLFF flower scene:

| ReLU-based NeRF-- (with refinement) | SIREN-based NeRF-- (with refinement) |
| --- | --- |
| [Image: relu_rgb] | [Image: rgb_siren] |
| [Image: relu_depth] | [Image: depth_siren] |

Note: since the raw output of SirenNeRF grows relatively slowly, I multiply the raw output (sigma) of SirenNeRF by a factor of 30. To configure this, use `model:siren_sigma_mul`.

[WIP] Perceptual model

For few-shot inputs with large viewpoint changes, I add an option to use a perceptual model (CLIP) and an additional perceptual loss along with the reconstruction loss, as in DietNeRF.

To config:

```yaml
data:
  N_rays: -1 # options: -1 for whole image and no sampling, an integer > 0 for the number of ray samples
training:
  w_perceptual: 0.01 # options: 0. for no perceptual model & loss, > 0 to enable
```

Note: the CLIP model requires a whole image (not sampled rays) at a resolution of at least 224x224 as input, so set `data:N_rays` to -1 when enabling the perceptual loss.
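
A rough sketch of how such a combined objective could look, using OpenAI's CLIP; the resizing, the omitted CLIP input normalization, and the cosine-distance loss form here are illustrative assumptions, not this repo's exact implementation:

```python
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

clip_model, _ = clip.load("ViT-B/32", device="cuda")

def total_loss(pred_rgb, gt_rgb, w_perceptual=0.01):
    """pred_rgb / gt_rgb: whole rendered / ground-truth images,
    shaped (N, 3, H, W) in [0, 1] (CLIP's own normalization omitted here)."""
    recon = F.mse_loss(pred_rgb, gt_rgb)
    if w_perceptual <= 0.0:
        return recon  # perceptual model & loss disabled
    # CLIP needs at least 224x224 inputs, hence whole images (N_rays: -1).
    pred_emb = clip_model.encode_image(F.interpolate(pred_rgb, size=(224, 224), mode="bilinear"))
    gt_emb = clip_model.encode_image(F.interpolate(gt_rgb, size=(224, 224), mode="bilinear"))
    perceptual = 1.0 - F.cosine_similarity(pred_emb, gt_emb, dim=-1).mean()
    return recon + w_perceptual * perceptual
```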

More choices of rotation representations / intrinsics parameterization

Refer to this paper for theoretical suggestions on the different choices of SO(3) representation.

To config:

```yaml
model:
  so3_representation: 'axis-angle' # options: [quaternion, axis-angle, rotation6D]
```
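
For the axis-angle option, a rotation vector maps to a rotation matrix through Rodrigues' formula; a minimal differentiable sketch:

```python
import torch

def axis_angle_to_R(v: torch.Tensor) -> torch.Tensor:
    """Rodrigues' formula: axis-angle vector v (3,) -> rotation matrix (3, 3),
    with theta = |v| and rotation axis k = v / theta."""
    theta = v.norm() + 1e-8
    k = v / theta
    zero = torch.zeros_like(k[0])
    K = torch.stack([  # cross-product (skew-symmetric) matrix of k
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    return torch.eye(3) + torch.sin(theta) * K + (1.0 - torch.cos(theta)) * (K @ K)
```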

To config:

```yaml
model:
  intrinsics_representation: 'square' # options: [square, ratio, exp]
```
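
The representation names hint at how an unconstrained parameter could map to a positive focal length; the sketch below spells out one plausible reading of each option (an assumption, not necessarily this repo's exact formulas):

```python
import torch

def focal_from_param(p: torch.Tensor, representation: str, W: int) -> torch.Tensor:
    """Map an unconstrained learnable scalar p to a focal length in pixels."""
    if representation == "square":
        return p ** 2                # non-negative by construction
    if representation == "exp":
        return torch.exp(p)          # positive; multiplicative gradient steps
    if representation == "ratio":
        return p.abs() * W           # focal as a ratio of the image width (assumed)
    raise ValueError(f"unknown representation: {representation}")
```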

Usage

hardware

software

configuration

There are three choices for giving configuration values:

- the default values defined in the code;
- a yaml configuration file in the configs folder, passed via `--config`;
- command-line arguments of the form `--<section>:<key> <value>`, e.g. `--training:num_epoch_pre 1000` (as used in the training examples below).

The configuration overwriting priority order: command-line arguments > yaml configuration file > code defaults.

data

| dataset source | link / script | file path |
| --- | --- | --- |
| LLFF | Download the LLFF example data using the script (run in the project root directory):<br>`bash dataio/download_example_data.sh` | (automatic) |
| YouTube video clips | https://www.youtube.com/watch?v=hWagaTjEa3Y | ./data/castle_1041<br>./data/castle_4614 |
| piano photos by @crazyang | Google Drive | ./data/piano |

pre-trained models

You can get pre-trained models in either of the following two ways:

Training

Before running any python scripts for the first time, cd to the project root directory and add the root project directory to the PYTHONPATH by running:

```shell
cd /path/to/improved-nerfmm
source set_env.sh
```

Train on example data (without refinement)

Download the LLFF example data using the script (run in the project root directory):

```shell
bash dataio/download_example_data.sh
```

Start training:

```shell
python train.py --config configs/fern.yaml
```

:rocket: Train on your own data

Train on video clips

Automatic training with a pre-train stage and refine stage

Run

```shell
python train.py --config ./configs/fern_prefine.yaml
```

Or

```shell
python train.py --config ./configs/fern.yaml --training:num_epoch_pre 1000 --expname fern_prefine
```

You can also try on your own photos using similar configurations.

Refining a pre-trained NeRF--

This is the step suggested by the original NeRF-- paper: drop all pre-trained parameters except the camera parameters, and refine.

For example, to refine a pre-trained LLFF-fern scene, with the original config stored in ./configs/fern.yaml, a pre-trained checkpoint at ./logs/fern/ckpts/final_xxxx.pt, and a new experiment name fern_refine:

```shell
python train.py --config ./configs/fern.yaml --expname fern_refine --training:ckpt_file ./logs/fern/ckpts/final_xxxx.pt --training:ckpt_only_use_keys cam_params
```
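
Conceptually, `--training:ckpt_only_use_keys cam_params` loads the checkpoint, keeps only the camera parameters, and lets everything else re-initialize from scratch. A hypothetical sketch (the checkpoint layout and key names here are assumptions):

```python
import torch

def load_camera_params_only(model, ckpt_path: str):
    """Restore only the camera parameters from a pre-trained checkpoint,
    leaving the freshly initialized scene MLP of `model` untouched."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("model", ckpt)  # assumed: state dict stored under 'model'
    cam_only = {k: v for k, v in state.items() if "cam_params" in k}
    model.load_state_dict(cam_only, strict=False)  # ignore the missing MLP keys
    return model
```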

Note:

Testing

Free view port rendering

```shell
python tools/free_viewport_rendering.py --load_dir /path/to/pretrained/exp_dir --render_type interpolate
python tools/free_viewport_rendering.py --load_dir /path/to/pretrained/exp_dir --render_type spiral
```
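
The spiral render type presumably follows an LLFF/NeRF-style spiral around an average of the learned poses; a rough sketch of such a trajectory (the script's actual path may differ):

```python
import numpy as np

def spiral_path(c2w_avg: np.ndarray, radii=(0.1, 0.1, 0.05), n_frames=60, n_rot=2):
    """Generate camera-to-world matrices on a spiral around c2w_avg (4, 4)."""
    poses = []
    for t in np.linspace(0.0, 2.0 * np.pi * n_rot, n_frames):
        # Offset in the average camera's local frame, then map to world.
        offset = np.array([np.cos(t) * radii[0],
                           -np.sin(t) * radii[1],
                           -np.sin(0.5 * t) * radii[2]])
        pose = c2w_avg.copy()
        pose[:3, 3] += pose[:3, :3] @ offset
        poses.append(pose)
    return poses
```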

Visualize learned camera pose

```shell
python tools/plot_camera_pose.py --load_dir /path/to/pretrained/exp_dir
```

Notice that the learned camera phi & t actually parameterize the camera-to-world matrices, i.e. the inverses of the camera extrinsics.
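
Recovering the world-to-camera extrinsic from a learned camera-to-world pose is a closed-form inverse:

```python
import torch

def c2w_to_extrinsic(R: torch.Tensor, t: torch.Tensor):
    """Invert a rigid camera-to-world transform [R | t]:
    the extrinsic is [R^T | -R^T t], since R is orthonormal."""
    return R.T, -R.T @ t
```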

You will get a matplotlib window like this:

[Image: matplotlib window showing the learned camera poses]

Road-map & updates

Basic NeRF model

Efficiency & training

More experiments

Better SfM strategy

More applicable for more scenes

Related/used code bases

Citations

```bibtex
@article{wang2021nerf,
  title={Ne{RF}$--$: Neural Radiance Fields Without Known Camera Parameters},
  author={Wang, Zirui and Wu, Shangzhe and Xie, Weidi and Chen, Min and Prisacariu, Victor Adrian},
  journal={arXiv preprint arXiv:2102.07064},
  year={2021}
}

@inproceedings{sitzmann2020siren,
  author={Sitzmann, Vincent and Martel, Julien NP and Bergman, Alexander W and Lindell, David B and Wetzstein, Gordon},
  title={Implicit neural representations with periodic activation functions},
  booktitle={Proc. NeurIPS},
  year={2020}
}

@article{jain2021dietnerf,
  title={Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis},
  author={Ajay Jain and Matthew Tancik and Pieter Abbeel},
  journal={arXiv},
  year={2021}
}
```