Awesome
OccNeRF
Project Page | Paper | Data
OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments
Chubin Zhang*, Juncheng Yan* Yi Wei*, Jiaxin Li, Li Liu, Yansong Tang, Yueqi Duan, Jiwen Lu
Updates:
š 2023/12/17
Generated 2D semantic labels release.š 2023/12/15
Initial code and paper release.
š¹ Demos
Demos are a little bit large; please wait a moment to load them. If you cannot load them or feel them blurry, you can click the hyperlink of each demo for the full-resolution raw video.
Depth estimation:
<p align='center'> <img src="./assets/occnerf-demo.gif" width="480px"> </p>Occupancy prediction:
<p align='center'> <img src="./assets/occnerf-demo-sem.gif" width="480px"> <img src="./assets/bar.jpg" width="480px"> </p>š Introduction
In this paper, we propose an OccNeRF method for self-supervised multi-camera occupancy prediction. Different from bounded 3D occupancy labels, we need to consider unbounded scenes with raw image supervision. To solve the issue, we parameterize the reconstructed occupancy fields and reorganize the sampling strategy. The neural rendering is adopted to convert occupancy fields to multi-camera depth maps, supervised by multi-frame photometric consistency. Moreover, for semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
š” Method
Method Pipeline:
<p align='center'> <img src="./assets/pipeline.jpg" width="720px"> </p>We first use a 2D backbone to extract multi-camera features, which are lifted to 3D space to get volume features with interpolation. The parameterized occupancy fields are reconstructed to describe unbounded scenes. To obtain the rendered depth and semantic maps, we perform volume rendering with our reorganized sampling strategy. The multi-frame depths are supervised by photometric loss. For semantic prediction, we adopted pretrained Grounded-SAM with prompts cleaning. The green arrow indicates supervision signals.
š§ Installation
Clone this repo and install the dependencies:
git clone --recurse-submodules https://github.com/LinShan-Bin/OccNeRF.git
cd OccNeRF
conda create -n occnerf python=3.8
conda activate occnerf
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
Our code is tested with Python 3.8, PyTorch 1.9.1 and CUDA 11.3 and can be adapted to other versions of PyTorch and CUDA with minor modifications.
š Dataset Preparation
-
Download nuScenes V1.0 full dataset data from nuScenes and link the data folder to
./data/nuscenes/nuscenes/
. -
Download the ground truth occupancy labels from Occ3d and unzip the
gts.tar.gz
to./data/nuscenes/gts
. Note that we only use the 3d occupancy labels for validation. -
Generate the ground truth depth maps for validation:
python tools/export_gt_depth_nusc.py
-
Download the generated 2D semantic labels from semantic_labels and extract the data to
./data/nuscenes/
. We recommend that you usepigz
to speed up the process. -
Download the pretrained weights of our model from Checkpoints and move them to
./ckpts/
. -
(Optional) If you want to generate the 2D semantic labels by yourself, please refer to the
README.md
in GroundedSAM_OccNeRF. The dataset index pickle filenuscenes_infos_train.pkl
is from SurroundOcc and should be placed under./data/nuscenes/
.
The Final folder structure should be like:
OccNeRF/
āāā ckpts/
ā āāā nusc-depth/
ā ā āāā encoder.pth
ā ā āāā depth.pth
ā āāā nusc-sem/
ā ā āāā encoder.pth
ā ā āāā depth.pth
āāā data/
ā āāā nuscenes/
ā ā āāā nuscenes/
ā ā ā āāā maps/
ā ā ā āāā samples/
ā ā ā āāā sweeps/
ā ā ā āāā v1.0-trainval/
ā ā āāā gts/
ā ā āāā nuscenes_depth/
ā ā āāā nuscenes_semantic/
ā ā āāā nuscenes_infos_train.pkl
āāā ...
š Quick Start
Training
Train OccNeRF without semantic supervision:
python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-depth.txt
In order to train the full model, you need at least 80 GB GPU memory. If you have less GPU memory (e.g., 40 GB), you can train with a single frame (set auxiliary_frame = False
in the config file). See section 4.4 in the paper for the ablation study. Evaluation can be done with 24 GB GPU memory.
Train OccNeRF with semantic supervision:
python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-sem.txt
Evaluation
Evaluate the depth estimation:
python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-depth.txt --eval_only --load_weights_folder ckpts/nusc-depth
Evaluate the occupancy prediction:
python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-sem.txt --eval_only --load_weights_folder ckpts/nusc-sem
Visualization
Visualize the depth estimation:
python tools/export_vis_data.py # You can modify this file to choose scenes you want to visualize. Otherwise, all validation scenes will be visualized.
python -m torch.distributed.launch --nproc_per_node=8 run_vis.py --config configs/nusc-depth.txt --load_weights_folder ckpts/nusc-depth --log_dir your_log_dir
python gen_scene_video.py scene_folder_generated_by_the_above_command
Visualize the semantic occupancy prediction:
python tools/export_vis_data.py # You can modify this file to choose scenes you want to visualize. Otherwise, all validation scenes will be visualized.
python -m torch.distributed.launch --nproc_per_node=8 run_vis.py --config configs/nusc-sem.txt --load_weights_folder ckpts/nusc-sem --log_dir your_log_dir
python gen_scene_video.py scene_folder_generated_by_the_above_command --sem_only
š Acknowledgement
Many thanks to these excellent projects:
š Bibtex
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{chubin2023occnerf,
title = {OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments},
author = {Chubin Zhang and Juncheng Yan and Yi Wei and Jiaxin Li and Li Liu and Yansong Tang and Yueqi Duan and Jiwen Lu},
journal = {arXiv preprint arXiv:2312.09243},
year = {2023}
}