<div align="center">SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction
Project Page | Paper | Talk
<p style="font-size:1.2em"> <a href="https://yuliangguo.github.io"><strong>Yuliang Guo</strong></a><sup>1</sup> · <a href="https://sites.google.com/view/abhinavkumar"><strong>Abhinav Kumar</strong></a><sup>1,2</sup> · <a href="https://scholar.google.com/citations?user=EAC-8m0AAAAJ&hl=en"><strong>Cheng Zhao</strong></a><sup>1</sup> · <a href="https://scholar.google.com/citations?user=UfGLSTkAAAAJ&hl=en"><strong>Ruoyu Wang</strong></a><sup>1</sup> · <a href="https://scholar.google.com/citations?user=cL4bNBwAAAAJ&hl=en"><strong>Xinyu Huang</strong></a><sup>1</sup> · <a href="https://www.liu-ren.com"><strong>Liu Ren</strong></a><sup>1</sup> <br> <sup>1</sup>Bosch Research North America, Bosch Center for AI, <sup>2</sup>Michigan State University </p>
in ECCV 2024
<p align="center"> <img src="figs/pipeline_overview.png" width="800"> </p> </div><p align="center"> <img src="figs/supnerf_demo.gif" width="400"> </p>Monocular 3D reconstruction for categorical objects heavily relies on accurately perceiving each object's pose. While gradient-based optimization within a NeRF framework can update an initial pose, this paper highlights that the scale-depth ambiguity in monocular object reconstruction causes failures when the initial pose deviates moderately from the true pose. Consequently, existing methods often depend on a third-party 3D object detector to provide an initial object pose, leading to increased complexity and generalization issues. To address these challenges, we present SUP-NeRF, a Streamlined Unification of object Pose estimation and NeRF-based object reconstruction. SUP-NeRF decouples the object's dimension estimation and pose refinement to resolve the scale-depth ambiguity, and introduces a camera-invariant projected-box representation that generalizes across different domains. By using a dedicated pose estimator that integrates smoothly into an object-centric NeRF, SUP-NeRF is free from external 3D detectors. SUP-NeRF achieves state-of-the-art results in both reconstruction and pose estimation on the nuScenes dataset. Furthermore, SUP-NeRF exhibits exceptional cross-dataset generalization on the KITTI and Waymo datasets, surpassing prior methods with up to a 50% reduction in rotation and translation error.
Citation
If you find our work useful in your research, please consider starring the repo and citing:
@inproceedings{guo2024supnerf,
title={{SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular $3$D Object Reconstruction}},
author={Yuliang Guo and Abhinav Kumar and Cheng Zhao and Ruoyu Wang and Xinyu Huang and Liu Ren},
booktitle={ECCV},
year={2024}
}
Catalog
- Official PyTorch implementation of SUP-NeRF, ECCV 2024 (Ours)
- Unofficial PyTorch implementation of AutoRF: Learning 3D Object Radiance Fields from Single View Observations, CVPR 2022
- Training pipeline for SUP-NeRF and AutoRF on nuScenes dataset
- Testing and evaluation pipeline for SUP-NeRF and AutoRF on nuScenes, KITTI and Waymo datasets
- Data preparation and curation scripts
- Testing and evaluation pipeline for Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion (BootInv, A.K.A. nerf-from-image), CVPR 2023 on nuScenes, KITTI and Waymo datasets
Installation
conda create -y -n sup-nerf python=3.8
conda activate sup-nerf
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
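Optionally, run a quick sanity check afterwards to confirm that PyTorch was installed with CUDA support (the exact version printed depends on your environment):

```bash
# Verify the environment: print the PyTorch version and whether CUDA is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```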
Data Preparation
<a href="https://huggingface.co/datasets/yuliangguo/SUP-NeRF-ECCV2024"> <img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Datasets-blue"> </a>nuScenes
We use the nuScenes dataset for both training and testing. Download the nuScenes dataset to your data directory and soft link the related directories to the project `data/NuScenes` directory. The required data structure is as follows:
SUPNERF
├── data
│ ├── NuScenes
│ │ ├── samples
│ │ ├── maps
│ │ ├── v1.0-mini
│ │ ├── v1.0-trainval
│ │ ├── pred_instance
│ │ └── pred_det3d
│ │ ...
│ ...
`samples`, `maps`, `v1.0-mini`, and `v1.0-trainval` are directly downloaded from the nuScenes dataset.
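For example, the soft links can be created as follows (a minimal sketch; `/path/to/nuscenes` is a placeholder for wherever the official nuScenes release was downloaded):

```bash
# Link the official nuScenes directories into the project data folder (illustrative source path)
mkdir -p data/NuScenes
ln -s /path/to/nuscenes/samples data/NuScenes/samples
ln -s /path/to/nuscenes/maps data/NuScenes/maps
ln -s /path/to/nuscenes/v1.0-mini data/NuScenes/v1.0-mini
ln -s /path/to/nuscenes/v1.0-trainval data/NuScenes/v1.0-trainval
```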
`pred_instance` includes the required instance masks prepared via a customized script in our fork of mask-rcnn detectron2. Our prepared directory can be directly downloaded from [dropbox] [hugging face].
`pred_det3d` includes 3D object detection results prepared via a customized script in our fork of FCOS3D. It is only required by the previous method AutoRF; if you only want to try our method, you do not need it. Our prepared directory can be directly downloaded from [dropbox] [hugging face].
SUP-NeRF follows an object-centric setup, where only a subset of the annotated objects is curated for the experiments. Please refer to our paper for the data curation details. The curated subsets and splits are recorded in `.json` files in `data/NuScenes`. To modify the curation, check `src/data_nuscenes.py` and re-run the preprocessing step.
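If you just want to inspect a curated split before changing the curation, a minimal sketch like the one below works for any of the split files (the file name here is illustrative and the exact fields inside may differ):

```python
import json

# Load one of the curated split files (file name is illustrative)
with open("data/NuScenes/some_curated_split.json", "r") as f:
    split = json.load(f)

# Print the top-level structure to see how the curation is organized
if isinstance(split, dict):
    for key, value in split.items():
        size = len(value) if hasattr(value, "__len__") else value
        print(key, type(value).__name__, size)
else:
    print(type(split).__name__, len(split))
```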
KITTI
We use the KITTI dataset in the cross-domain generalization test. We follow DEVIANT to set up the basic KITTI directory and prepare additional directories for our experiments. The required data structure is as follows:
SUPNERF
├── data
│ ├── KITTI
│ │ ├── ImageSets
│ │ ├── kitti_split1
│ │ └── training
│ │ ├── calib
│ │ ├── image_2
│ │ ├── label_2
│ │ ├── velodyne
│ │ ├── pred_instance
│ │ └── pred
│ │ ...
│ ...
Because only the training split of the KITTI dataset includes ground-truth object annotations, we conduct the cross-domain evaluation on that split.
`calib`, `image_2`, `label_2`, and `velodyne` are directly downloaded from the KITTI website.
Similar to nuScenes, `pred_instance` includes the required instance masks prepared via a customized script in our fork of mask-rcnn detectron2. Our prepared directory can be directly downloaded from [dropbox] [hugging face].
Similar to nuScenes, `pred` includes 3D object detection results prepared via a customized script in our fork of FCOS3D. It is only required by the previous method AutoRF. Our prepared directory can be directly downloaded from [dropbox] [hugging face].
The object-centric curated subsets and splits for our experiments are recorded in the `.json` files in `data/KITTI`. To modify the curation, check `src/data_kitti.py` and re-run the preprocessing step.
Waymo (Front View)
We use the validation split of the Waymo dataset for the cross-domain generalization test. We follow DEVIANT to prepare the Waymo dataset similarly to KITTI, and prepare additional directories for our experiments. The required data structure is as follows:
SUPNERF
├── data
│ ├── Waymo
│ │ ├── ImageSets
│ │ └── validation
│ │ ├── calib
│ │ ├── image
│ │ ├── label
│ │ ├── velodyne
│ │ ├── pred_instance
│ │ └── pred
│ │ ...
│ ...
`calib`, `image`, `label`, and `velodyne` are directly prepared following DEVIANT. If you want to prepare them on your own, you can download the validation set from the Waymo website and use our script `data/Waymo/converter.py`. Our experiments are limited to the front view of Waymo. For all the surrounding views, you may refer to the mmlab-version converter for data preparation.
Similar to nuScenes, `pred_instance` includes the required instance masks prepared via a customized script in our fork of mask-rcnn detectron2. Our prepared directory can be directly downloaded from [dropbox] [hugging face].
Similar to nuScenes, `pred` includes 3D object detection results prepared via a customized script in our fork of FCOS3D. It is only required by the previous method AutoRF. Our prepared directory can be directly downloaded from [dropbox] [hugging face].
The object-centric curated subsets and splits for our experiments are recorded in the `.json` files in `data/Waymo`. To modify the curation, check `src/data_waymo.py` and re-run the preprocessing step.
VSCode Launch
All the training and testing pipelines described in the later sections are included in `.vscode/launch.json` for convenient use and debugging. You may modify the arguments and use the VSCode 'Run and Debug' panel to execute any of the included pipelines.
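For reference, a launch entry for the testing pipeline typically looks like the sketch below (an illustrative entry rather than a verbatim copy of the repo's `launch.json`; the arguments mirror the testing command documented later):

```json
{
    "name": "optimize_nuscenes (SUP-NeRF)",
    "type": "python",
    "request": "launch",
    "program": "${workspaceFolder}/optimize_nuscenes.py",
    "console": "integratedTerminal",
    "args": [
        "--config_file", "jsonfiles/supnerf.nusc.vehicle.car.json",
        "--gpu", "0",
        "--add_pose_err", "2",
        "--reg_iter", "3",
        "--vis", "0"
    ]
}
```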
Testing
<a href="https://huggingface.co/yuliangguo/SUP-NeRF-ECCV2024"> <img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue"> </a>For testing, `optimize_nuscenes.py` can be used to evaluate the trained models. Models are only trained on the nuScenes dataset, but are tested on the nuScenes, KITTI, and Waymo datasets.
The specific checkpoint paths are specified in the corresponding config files. The default paths point to our provided checkpoints, which can be downloaded from [dropbox]. You will need to save them to the repo as below before executing the following testing pipelines.
SUPNERF
├── checkpoints
│ ├── supnerf
│ └── autorfmix
│...
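For example, assuming the checkpoints were downloaded and extracted to `~/Downloads/supnerf_checkpoints` (an illustrative path; the actual archive layout may differ), placing them could look like:

```bash
# Copy the provided checkpoint folders into the repo (illustrative source path)
mkdir -p checkpoints
cp -r ~/Downloads/supnerf_checkpoints/supnerf checkpoints/
cp -r ~/Downloads/supnerf_checkpoints/autorfmix checkpoints/
```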
Specifically for the testing arguments used below, `--add_pose_err 2` initializes with a random pose, while `--add_pose_err 3` initializes with the pose predicted by the third-party detector FCOS3D. `--reg_iter` indicates the number of iterations of the pose refinement module, which is a key design of SUP-NeRF.
Setting `--vis` to `1` makes the pipeline output visual results at the beginning and the end of the process, and setting it to `2` makes the pipeline output visual results at every iteration, similar to those shown in the demo video. You can modify other arguments as needed. For more details, check `optimize_nuscenes.py`, `optimize_kitti.py`, and `optimize_waymo.py`.
You may also check `scripts/eval_saved_result.py` to quickly evaluate saved testing results for quantitative numbers. The scores reported in the later sections differ slightly from the paper due to code cleaning, but the conclusions of the paper all hold. To evaluate all the provided saved results, execute
bash evaluate_all.sh
nuScenes (In-Domain)
To test SUPNeRF on nuScenes, execute
python optimize_nuscenes.py --config_file jsonfiles/supnerf.nusc.vehicle.car.json --gpu 0 --add_pose_err 2 --reg_iter 3 --vis 0
To test AutoRF on nuScenes, execute
python optimize_nuscenes.py --config_file jsonfiles/autorfmix.nusc.vehicle.car.json --gpu 0 --add_pose_err 3 --reg_iter 0 --vis 0
Testing results will be saved into a new folder created in the corresponding checkpoint folder. The quantitative evaluation results will be similar to:
Method | PSNR | Dep.E(m) | Rot.E(deg.) | Trans.E(m) | PSNR-C | DepE-C(m) | Config | Predictions |
---|---|---|---|---|---|---|---|---|
| FF / 50it | FF / 50it | FF / 50it | FF / 50it | FF / 50it | FF / 50it | | |
SUP-NeRF (Ours) | 10.5 / 18.8 | 0.69 / 0.61 | 7.25 / 7.3 | 0.69 / 0.74 | 10.6 / 10.9 | 1.22 / 1.13 | config | predictions |
AutoRF-FCOS3D | 7.1 / 16.5 | 1.4 / 0.83 | 9.77 / 10.93 | 0.85 / 0.75 | 9.85 / 10.5 | 1.30 / 1.16 | config | predictions |
KITTI (Cross-Domain)
To test SUPNeRF on KITTI, execute
python optimize_kitti.py --config_file jsonfiles/supnerf.kitti.car.json --gpu 0 --add_pose_err 2 --reg_iter 3 --vis 0
To test AutoRF on KITTI, execute
python optimize_kitti.py --config_file jsonfiles/autorfmix.kitti.car.json --gpu 0 --add_pose_err 3 --reg_iter 0 --vis 0
Testing results will be saved into a new folder created in the corresponding checkpoint folder. The quantitative evaluation results will be similar to:
Method | PSNR | Dep.E(m) | Rot.E(deg.) | Trans.E(m) | Config | Predictions |
---|---|---|---|---|---|---|
| FF / 50it | FF / 50it | FF / 50it | FF / 50it | | |
SUP-NeRF (Ours) | 5.0 / 14.6 | 1.51 / 1.11 | 8.89 / 8.85 | 1.49 / 1.55 | config | predictions |
AutoRF-FCOS3D | 1.3 / 11.0 | 2.72 / 1.80 | 11.79 / 18.51 | 2.2 / 1.95 | config | predictions |
Waymo (Cross-Domain)
To test SUPNeRF on Waymo, execute
python optimize_waymo.py --config_file jsonfiles/supnerf.waymo.car.json --gpu 0 --add_pose_err 2 --reg_iter 3 --vis 0
To test AutoRF on Waymo, execute
python optimize_waymo.py --config_file jsonfiles/autorfmix.waymo.car.json --gpu 0 --add_pose_err 3 --reg_iter 0 --vis 0
Testing results will be saved into a new folder created in the corresponding checkpoint folder. The quantitative evaluation results will be similar to:
Method | PSNR | Dep.E(m) | Rot.E(deg.) | Trans.E(m) | Config | Predictions |
---|---|---|---|---|---|---|
| FF / 50it | FF / 50it | FF / 50it | FF / 50it | | |
SUP-NeRF (Ours) | 4.8 / 17.0 | 2.32 / 1.56 | 10.01 / 10.6 | 1.68 / 1.54 | config | predictions |
AutoRF-FCOS3D | 4.8 / 15.8 | 2.29 / 2.35 | 6.97 / 9.11 | 3.22 / 3.43 | config | predictions |
Training
To train SUPNeRF on nuScenes, execute
python train_nuscenes.py --config_file jsonfiles/supnerf.nusc.vehicle.car.json --gpus 4 --batch_size 48 --num_workers 16 --epochs 40
`train_nuscenes.py` can train different object-centric NeRFs.
To train AutoRF on nuScenes, execute
python train_nuscenes.py --config_file jsonfiles/autorfmix.nusc.vehicle.car.json --gpus 4 --batch_size 48 --num_workers 16 --epochs 40
There are additional settings that can optionally be changed; interested developers can check `train_nuscenes.py` for details. You can also modify other hyperparameters in the corresponding json config files included in `jsonfiles/`. The network named `autorfmix` differs slightly from the original AutoRF in its encoder, so that both SUP-NeRF and AutoRF share the same encoder as CodeNeRF for a fair comparison.
We implement multi-GPU training using DP rather than DDP (which might be more optimal), and record training logs using TensorBoard.
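As a minimal sketch of what the DP wrapping and TensorBoard logging typically look like (variable names are illustrative and not necessarily those used in `train_nuscenes.py`):

```python
import torch
import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

# Illustrative model; the actual network is built from the config in jsonfiles/
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))

# DataParallel splits each batch across all visible GPUs within a single process
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

# TensorBoard writer for training logs
writer = SummaryWriter(log_dir="runs/example")
writer.add_scalar("loss/train", 0.0, global_step=0)
writer.close()
```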
BootInv
For developers interested in evaluating BootInv (a.k.a. nerf-from-image) on real-world autonomous driving datasets, as done for SUP-NeRF and AutoRF, we provide our fork of BootInv with additional evaluation pipelines here.
You will need to follow the original instructions to install the package and prepare the pre-trained models. Then you can follow the same data preparation as in this repo, while putting all the dataset structures under `nerf-from-image/datasets/`. Finally, you can closely follow the `.vscode/launch.json` file in our fork here to conduct testing and evaluation of BootInv on the three major autonomous driving datasets: nuScenes, KITTI, and Waymo.
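If the datasets for this repo are already prepared, linking them into the fork could look like the following (assuming the fork expects the same top-level directory names; adjust if its data loaders use different ones):

```bash
# Reuse the datasets prepared for SUP-NeRF inside the BootInv fork (illustrative layout)
mkdir -p nerf-from-image/datasets
ln -s "$(pwd)/data/NuScenes" nerf-from-image/datasets/NuScenes
ln -s "$(pwd)/data/KITTI" nerf-from-image/datasets/KITTI
ln -s "$(pwd)/data/Waymo" nerf-from-image/datasets/Waymo
```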
Acknowledgements
We thank the authors of the following awesome codebases:
Please also consider citing them.
License
SUP-NeRF code is under the MIT license.