NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads
Paper | Video | Project Page
Tobias Kirschstein, Shenhan Qian, Simon Giebenhain, Tim Walter and Matthias Nießner
SIGGRAPH 2023
1. Installation
1.1. Dependencies
- PyTorch 2.0
- nerfstudio
- tinycudann
- Setup environment
  conda env create -f environment.yml
  conda activate nersemble
  which creates a new conda environment nersemble (installation may take a while).
- Manually install tinycudann:
  pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
  (Also helpful if you later get an error like
  ImportError: DLL load failed while importing _86_C: The specified procedure could not be found.)
- Install the nersemble package itself by running
  pip install -e .
  inside the cloned repository folder.
1.2. Environment Paths
All paths to data / models / renderings are defined by environment variables.
Please create a file in your home directory in ~/.config/nersemble/.env
with the following content:
NERSEMBLE_DATA_PATH="..."
NERSEMBLE_MODELS_PATH="..."
NERSEMBLE_RENDERS_PATH="..."
Replace the ... with the locations where data / models / renderings should be located on your machine.
- NERSEMBLE_DATA_PATH: Location of the multi-view video dataset (see section 2 for how to obtain the dataset)
- NERSEMBLE_MODELS_PATH: During training, model checkpoints and configs will be saved here
- NERSEMBLE_RENDERS_PATH: Video renderings of trained models will be stored here
If you prefer not to create a config file in your home directory, you can instead hard-code the paths in env.py.
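To catch a missing or unedited entry before starting a long training run, the config file can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper names load_env_file and check_nersemble_paths are hypothetical and not part of the nersemble package, and the parsing only handles simple KEY="value" lines like the ones shown above.

```python
from pathlib import Path

# The three variable names documented above.
REQUIRED_KEYS = (
    "NERSEMBLE_DATA_PATH",
    "NERSEMBLE_MODELS_PATH",
    "NERSEMBLE_RENDERS_PATH",
)


def load_env_file(path: Path) -> dict:
    """Parse simple KEY="value" lines from a .env file into a dict."""
    values = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_nersemble_paths(env_file: Path) -> dict:
    """Return the three configured paths; raise if one is missing or still '...'."""
    values = load_env_file(env_file)
    for key in REQUIRED_KEYS:
        if values.get(key, "...") in ("", "..."):
            raise ValueError(f"{key} is not set in {env_file}")
    return {key: Path(values[key]).expanduser() for key in REQUIRED_KEYS}
```

For example, check_nersemble_paths(Path.home() / ".config" / "nersemble" / ".env") returns a dict of Path objects, or raises a ValueError naming the first variable that still needs a value.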
1.3. Troubleshooting
You may run into this error at the beginning of training:
\lib\site-packages\torch\include\pybind11\cast.h(624): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here
\lib\site-packages\torch\include\pybind11\cast.h(717): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here
This occurs during compilation of torch_efficient_distloss
and can be solved by either training without
distortion loss or by changing one line in the torch_efficient_distloss
library (see https://github.com/sunset1995/torch_efficient_distloss/issues/8).
2. Dataset
Access to the dataset can be requested here.
To reproduce the experiments from the paper, only download the nersemble_XXX_YYY.zip files (there are 10 in total, one for each of the 10 sequences), as well as camera_params.zip.
Extract these .zip files into NERSEMBLE_DATA_PATH.
Also, see src/nersemble/data_manager/multi_view_data.py for an explanation of the folder layout.
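If the archives were downloaded into a single folder, the extraction step could be scripted as follows. This is only a sketch with the standard library; extract_archives and the download_dir argument are hypothetical names, and it assumes the archives can be unpacked directly into NERSEMBLE_DATA_PATH as described above.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir: Path, data_path: Path) -> list:
    """Extract nersemble_*.zip and camera_params.zip from download_dir into data_path.

    Returns the names of the archives that were extracted.
    """
    data_path.mkdir(parents=True, exist_ok=True)

    # Sequence archives follow the nersemble_XXX_YYY.zip naming pattern.
    archives = sorted(download_dir.glob("nersemble_*.zip"))

    camera_params = download_dir / "camera_params.zip"
    if camera_params.exists():
        archives.append(camera_params)

    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_path)
    return [archive.name for archive in archives]
```

Other .zip files in the download folder are ignored, so only the sequence archives and the camera parameters end up in the data folder.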
3. Usage
3.1. Training
python scripts/train/train_nersemble.py $ID $SEQUENCE_NAME --name $NAME
where $ID is the id of the participant in the dataset (e.g., 030) and $SEQUENCE_NAME is the name of the expression / emotion / sentence (e.g., EXP-2-eyes).
$NAME may optionally be used to annotate the checkpoint folder and the wandb experiment with some descriptive experiment name.
The training script will place model checkpoints and configuration in ${NERSEMBLE_MODELS_PATH}/nersemble/NERS-XXX-${name}/. The incremental run id XXX will be automatically determined.
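Since the run id is assigned automatically, it can be handy to look up the folder that belongs to a given run name afterwards. The helper below is a hypothetical sketch, not part of the repository: it assumes the folder is named either exactly NERS-XXX or NERS-XXX followed by a dash and the experiment name, and that it lives in the nersemble subfolder of NERSEMBLE_MODELS_PATH as described above.

```python
from pathlib import Path
from typing import Optional


def find_run_folder(models_path: Path, run_name: str) -> Optional[Path]:
    """Return the checkpoint folder for a run such as 'NERS-123', or None."""
    root = models_path / "nersemble"
    if not root.is_dir():
        return None
    for folder in sorted(root.iterdir()):
        if not folder.is_dir():
            continue
        # Match 'NERS-123' itself or 'NERS-123-<name>', but not 'NERS-1234-...'.
        if folder.name == run_name or folder.name.startswith(run_name + "-"):
            return folder
    return None
```

Matching on the run name plus a trailing dash avoids confusing runs whose ids share a prefix.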
GPU Requirements
Training takes roughly 1 day and requires at least an RTX A6000 GPU (48GB VRAM). GPU memory requirements may be lowered by tweaking some of these hyperparameters:
- --max_n_samples_per_batch: Restricts how many ray samples are fed through the model at once (default 20 for 2^20 samples)
- --n_hash_encodings: Number of hash encodings in the ensemble (default 32). Using 16 should give comparable quality (--latent_dim_time needs to be set to the same value)
- --cone_angle: Use larger steps between ray samples for points further away. The default value of 0 (no step size increase) provides the best quality. Try values up to 0.004
- --n_train_rays: Number of rays per batch (default 4096). Lower values can affect convergence
- --mlp_num_layers / --mlp_layer_width: Making the deformation field smaller should still provide reasonable performance.
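Two details of these flags are easy to get wrong: --max_n_samples_per_batch is given as an exponent of 2, and --latent_dim_time has to follow --n_hash_encodings. The sketch below assembles a reduced-memory argument list that respects both. The function names and the chosen values (19, 16, 2048) are illustrative assumptions and not recommendations from the authors; the range check on the cone angle merely mirrors the "up to 0.004" hint above.

```python
def samples_per_batch(exponent: int) -> int:
    """Number of ray samples for a given --max_n_samples_per_batch value."""
    return 2 ** exponent


def low_memory_args(
    n_hash_encodings: int = 16,
    max_n_samples_exponent: int = 19,
    n_train_rays: int = 2048,
    cone_angle: float = 0.0,
) -> list:
    """Build extra command line arguments for a lower-memory training run."""
    if not 0.0 <= cone_angle <= 0.004:
        # Assumption: stay within the range suggested in the list above.
        raise ValueError("cone_angle is expected to lie in [0, 0.004]")
    return [
        "--max_n_samples_per_batch", str(max_n_samples_exponent),
        "--n_hash_encodings", str(n_hash_encodings),
        # Must be set to the same value as --n_hash_encodings.
        "--latent_dim_time", str(n_hash_encodings),
        "--n_train_rays", str(n_train_rays),
        "--cone_angle", str(cone_angle),
    ]
```

With the default of 20, samples_per_batch(20) gives 1048576 samples; lowering the exponent by one halves that number. The returned list can be appended to the training command from section 3.1.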
RAM requirements
By default, the training script caches loaded images in RAM, which can push RAM usage up to 200 GB. RAM usage can be lowered with:
- --max_cached_images (default 10k): Set to 0 to completely disable caching
Special config for sequences 97 and 124
We disable the occupancy grid acceleration structure from Instant NGP as well as the use of distortion loss due to complex hair motion in sequence 97:
python scripts/train/train_nersemble.py 97 HAIR --name $name --disable_occupancy_grid --lambda_dist_loss 0
We only train on a subset of sequence 124 (timesteps 95-570) and slightly prolong the warmup phase due to the complexity of the sequence:
python scripts/train/train_nersemble.py 124 FREE --name $name --start_timestep 95 --n_timesteps 475 --window_hash_encodings_begin 50000 --window_hash_encodings_end 100000
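When training all sequences in a loop, the two special cases can be kept in a small lookup table so they are not forgotten. This is a sketch only: SPECIAL_ARGS and build_train_command are hypothetical names, the script path is the one from section 3.1, and the flags are copied from the two commands above. Note that a start timestep of 95 with 475 timesteps ends at 95 + 475 = 570.

```python
import shlex

# Extra flags for the two sequences that need a special configuration.
SPECIAL_ARGS = {
    (97, "HAIR"): ["--disable_occupancy_grid", "--lambda_dist_loss", "0"],
    (124, "FREE"): [
        "--start_timestep", "95",
        "--n_timesteps", "475",
        "--window_hash_encodings_begin", "50000",
        "--window_hash_encodings_end", "100000",
    ],
}


def build_train_command(participant_id, sequence_name, name=None) -> list:
    """Assemble the training command, adding special flags where needed."""
    cmd = [
        "python",
        "scripts/train/train_nersemble.py",
        str(participant_id),
        sequence_name,
    ]
    if name:
        cmd += ["--name", name]
    # Ids such as "030" and 30 refer to the same participant.
    cmd += SPECIAL_ARGS.get((int(participant_id), sequence_name), [])
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_train_command(124, "FREE", name="demo")))
```

The list form can be passed directly to subprocess.run, while shlex.join produces a string that can be pasted into a shell.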
3.2. Evaluation
In the paper, all experiments are conducted by training on only 12 cameras and evaluating rendered images on 4 hold-out views (cameras 222200040, 220700191, 222200043 and 221501007).
- For obtaining the reported PSNR, SSIM and LPIPS metrics (evaluated at 15 evenly spaced timesteps):
  python scripts/evaluate/evaluate_nersemble.py NERS-XXX
  where NERS-XXX is the run name obtained from running the training script above.
- For obtaining the JOD video metric (evaluated at 24fps, takes much longer):
  python scripts/evaluate/evaluate_nersemble.py NERS-XXX --skip_timesteps 3 --max_eval_timesteps -1
The evaluation results will be printed in the terminal and persisted as a .json file in the model folder ${NERSEMBLE_MODELS_PATH}/NERS-XXX-${name}/evaluation.
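To illustrate the difference between the two evaluation modes, the sketch below lists which timestep indices each mode would visit. It is an assumption about the selection scheme, not the logic of evaluate_nersemble.py: "evenly spaced" is modelled as rounded linear spacing that includes the first and last timestep, and --skip_timesteps 3 is modelled as taking every 3rd timestep.

```python
def evenly_spaced_timesteps(n_timesteps: int, n_eval: int = 15) -> list:
    """Pick n_eval indices spread evenly over range(n_timesteps)."""
    if n_timesteps <= 0 or n_eval <= 0:
        raise ValueError("n_timesteps and n_eval must be positive")
    if n_eval >= n_timesteps:
        return list(range(n_timesteps))
    if n_eval == 1:
        return [0]
    # Linear spacing from the first to the last timestep, rounded to integers.
    step = (n_timesteps - 1) / (n_eval - 1)
    return [round(i * step) for i in range(n_eval)]


def every_kth_timestep(n_timesteps: int, skip: int = 3) -> list:
    """Indices visited when only every skip-th timestep is evaluated."""
    if skip <= 0:
        raise ValueError("skip must be positive")
    return list(range(0, n_timesteps, skip))
```

The first mode evaluates a fixed number of frames regardless of sequence length, whereas the second grows with the sequence length, which is why the JOD evaluation takes much longer.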
3.3. Rendering
From a trained model NERS-XXX, a circular trajectory (4s) may be rendered via:
python scripts/render/render_nersemble.py NERS-XXX
The resulting .mp4 file is stored in NERSEMBLE_RENDERS_PATH.
4. Trained Models
We provide one trained NeRSemble for each of the 10 sequences used in the paper:
Participant ID | Sequence | Model |
---|---|---|
18 | EMO-1-shout+laugh | NERS-9018 |
30 | EXP-2-eyes | NERS-9030 |
38 | EXP-1-head | NERS-9038 |
85 | SEN-01-port_strong_smokey | NERS-9085 |
97 | HAIR | NERS-9097 |
124 | FREE | NERS-9124 |
175 | EXP-6-tongue-1 | NERS-9175 |
226 | EXP-3-cheeks+nose | NERS-9226 |
227 | EXP-5-mouth | NERS-9227 |
240 | EXP-4-lips | NERS-9240 |
Simply put the downloaded model folders into ${NERSEMBLE_MODELS_PATH}/nersemble.
You can then use the evaluate_nersemble.py and render_nersemble.py scripts to obtain renderings or reproduce the official metrics below.
5. Official metrics
Metrics averaged over all 10 sequences from the NVS benchmark (same 10 sequences as in the paper):
Model | PSNR | SSIM | LPIPS | JOD |
---|---|---|---|---|
NeRSemble | 31.48 | 0.872 | 0.217 | 7.85 |
Note the following:
- The metrics are slightly different from the paper due to the newer version of nerfstudio used in this repository
- PSNR, SSIM and LPIPS are computed on only 15 evenly spaced timesteps (to make comparisons cheaper)
- JOD is computed on every 3rd timestep (using --skip_timesteps 3 --max_eval_timesteps -1)
- Metrics for sequence 97 were computed with --no_use_occupancy_grid_filtering
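For readers unfamiliar with the first metric in the table, PSNR is derived from the mean squared error between a rendered and a ground-truth image. The sketch below uses the standard definition on flat lists of pixel values in [0, 1]; it is for illustration only and is not the evaluation code of this repository, which may differ in details such as masking or averaging.

```python
import math


def psnr(pred: list, target: list, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR is better: a uniform error of 0.1 on every pixel corresponds to roughly 20 dB, and each further halving of the error adds about 6 dB.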
If you find our code, dataset or paper useful, please consider citing:
@article{kirschstein2023nersemble,
author = {Kirschstein, Tobias and Qian, Shenhan and Giebenhain, Simon and Walter, Tim and Nie\ss{}ner, Matthias},
title = {NeRSemble: Multi-View Radiance Field Reconstruction of Human Heads},
year = {2023},
issue_date = {August 2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {42},
number = {4},
issn = {0730-0301},
url = {https://doi.org/10.1145/3592455},
doi = {10.1145/3592455},
journal = {ACM Trans. Graph.},
month = {jul},
articleno = {161},
numpages = {14},
}
Contact Tobias Kirschstein for questions, comments and reporting bugs, or open a GitHub issue.