GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Paper | Project Page

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Yuanhui Huang, Wenzhao Zheng$\dagger$, Yunpeng Zhang, Jie Zhou, Jiwen Lu$\ddagger$

$\dagger$ Project leader $\ddagger$ Corresponding author

πŸ’₯A pioneering step towards building an object-centric autonomous driving system. πŸ’₯

GaussianFormer proposes 3D semantic Gaussians as a more efficient object-centric representation for driving scenes than dense 3D occupancy grids.

teaser

News

Demo

demo

legend

Overview

comparisons

Motivated by the universal approximation capability of Gaussian mixtures, we propose an object-centric 3D semantic Gaussian representation that describes the fine-grained structure of 3D scenes without resorting to dense grids. We propose GaussianFormer, a model consisting of sparse convolution and cross-attention that efficiently transforms 2D images into 3D Gaussian representations. To generate dense 3D occupancy, we design a Gaussian-to-voxel splatting module that can be efficiently implemented with CUDA. With comparable performance, GaussianFormer reduces the memory consumption of existing 3D occupancy prediction methods by 75.2% to 82.2%.
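To make the Gaussian-to-voxel splatting idea concrete, here is a minimal NumPy sketch: each semantic Gaussian deposits its semantic logits onto nearby voxel centers, weighted by its opacity and a Gaussian falloff. This is a simplified illustration only; the repository's module is a CUDA implementation, and the function name, the axis-aligned (diagonal) covariance, and the additive weighting here are our assumptions, not the repository's API.

```python
import numpy as np

def gaussian_to_voxel_splat(means, scales, opacities, semantics,
                            grid_shape, grid_min, voxel_size):
    """Accumulate per-voxel semantic logits from 3D semantic Gaussians.

    means:      (N, 3) Gaussian centers in world coordinates
    scales:     (N, 3) per-axis standard deviations (axis-aligned here)
    opacities:  (N,)   opacity of each Gaussian
    semantics:  (N, C) semantic logits carried by each Gaussian
    grid_shape: (X, Y, Z) tuple; grid_min: world coords of the grid corner
    Returns an (X, Y, Z, C) array of accumulated semantic logits.
    """
    X, Y, Z = grid_shape
    # Coordinates of every voxel center.
    xs = grid_min[0] + (np.arange(X) + 0.5) * voxel_size
    ys = grid_min[1] + (np.arange(Y) + 0.5) * voxel_size
    zs = grid_min[2] + (np.arange(Z) + 0.5) * voxel_size
    centers = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1)

    logits = np.zeros(grid_shape + (semantics.shape[1],))
    for mu, sigma, alpha, sem in zip(means, scales, opacities, semantics):
        # Squared Mahalanobis distance under a diagonal covariance.
        d2 = (((centers - mu) / sigma) ** 2).sum(-1)
        w = alpha * np.exp(-0.5 * d2)   # Gaussian weight at each voxel center
        logits += w[..., None] * sem    # splat weighted semantics
    return logits
```

A real implementation restricts each Gaussian to the voxels inside its confidence region instead of looping over the full grid, which is what makes the CUDA version efficient.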

overview

Getting Started

Installation

Follow instructions HERE to prepare the environment.


Data Preparation

  1. Download nuScenes V1.0 full dataset data HERE.

  2. Download the occupancy annotations from SurroundOcc HERE and unzip it.

  3. Download pkl files HERE.

Folder structure

```
GaussianFormer
β”œβ”€β”€ ...
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ nuscenes/
β”‚   β”‚   β”œβ”€β”€ maps/
β”‚   β”‚   β”œβ”€β”€ samples/
β”‚   β”‚   β”œβ”€β”€ sweeps/
β”‚   β”‚   β”œβ”€β”€ v1.0-test/
β”‚   β”‚   └── v1.0-trainval/
β”‚   β”œβ”€β”€ nuscenes_cam/
β”‚   β”‚   β”œβ”€β”€ nuscenes_infos_train_sweeps_occ.pkl
β”‚   β”‚   └── nuscenes_infos_val_sweeps_occ.pkl
β”‚   └── surroundocc/
β”‚       └── samples/
β”‚           β”œβ”€β”€ xxxxxxxx.pcd.bin.npy
β”‚           └── ...
```
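A quick way to catch a misplaced download before launching training is to check the layout above programmatically. This is a hypothetical helper (not part of the repository); the `REQUIRED` list simply mirrors the tree.

```python
from pathlib import Path

# Paths relative to data/ that the tree above expects to exist.
REQUIRED = [
    "nuscenes/samples",
    "nuscenes/sweeps",
    "nuscenes/v1.0-trainval",
    "nuscenes_cam/nuscenes_infos_train_sweeps_occ.pkl",
    "nuscenes_cam/nuscenes_infos_val_sweeps_occ.pkl",
    "surroundocc/samples",
]

def check_data_layout(data_root):
    """Return the list of expected paths missing under data_root."""
    root = Path(data_root)
    return [p for p in REQUIRED if not (root / p).exists()]
```

Run it as `check_data_layout("data")`; an empty list means the layout matches.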

Inference

We provide two checkpoints trained on the SurroundOcc dataset:

  1. The checkpoint that reproduces the results in Table 1 of our paper.

  2. πŸ”₯πŸ”₯An updated version of GaussianFormer that assigns semantic Gaussians to model only the occupied area, leaving empty space to a single fixed, infinitely large Gaussian. This modification significantly reduces the number of Gaussians needed for similar model capacity (144000 -> 25600), making the model even more efficient. Check our GaussianHead for more details.

```
python eval.py --py-config config/nuscenes_gs144000.py --work-dir out/nuscenes_gs144000/ --resume-from out/nuscenes_gs144000/state_dict.pth

python eval.py --py-config config/nuscenes_gs25600_solid.py --work-dir out/nuscenes_gs25600_solid/ --resume-from out/nuscenes_gs25600_solid/state_dict.pth
```
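The idea behind the updated checkpoint can be illustrated with a toy classifier (an assumption-laden sketch, not the repository's GaussianHead code): a constant background logit stands in for the single fixed, infinitely large Gaussian covering all of space, so a voxel is predicted empty unless some semantic Gaussian responds more strongly. This is why far fewer Gaussians are needed when they only have to model occupied regions.

```python
import numpy as np

def classify_voxels(semantic_logits, empty_logit=1e-2):
    """semantic_logits: (X, Y, Z, C) accumulated from occupied-space
    Gaussians. A constant background logit plays the role of one fixed,
    infinitely large Gaussian: any voxel whose strongest semantic
    response stays below it is predicted empty (class index C).
    """
    background = np.full(semantic_logits.shape[:-1] + (1,), empty_logit)
    all_logits = np.concatenate([semantic_logits, background], axis=-1)
    return all_logits.argmax(-1)  # index C == empty
```

With this scheme the empty class costs nothing to represent, so the Gaussian budget is spent entirely on occupied structure.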

Train

Run the following command to launch your training process. Note that the setting with 144000 Gaussians requires ~40 GB of GPU memory during training, so we recommend trying the 25600 version, which achieves even better performance!πŸš€

Download the pretrained weights for the image backbone HERE and put it inside ckpts.

```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --py-config config/nuscenes_gs25600_solid.py --work-dir out/nuscenes_gs25600_solid
```

| Config | mIoU | Log | Weight |
| :---: | :---: | :---: | :---: |
| nuscenes_gs25600_solid | 19.31 | log | weight |

Stay tuned for more exciting work and models!πŸ€—

Related Projects

Our work is inspired by these excellent open-source repos: TPVFormer, PointOcc, SelfOcc, SurroundOcc, OccFormer, and BEVFormer.

Our code is originally based on Sparse4D and migrated to the general framework of SelfOcc.

Citation

If you find this project helpful, please consider citing the following paper:

@article{huang2024gaussian,
    title={GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction},
    author={Huang, Yuanhui and Zheng, Wenzhao and Zhang, Yunpeng and Zhou, Jie and Lu, Jiwen},
    journal={arXiv preprint arXiv:2405.17429},
    year={2024}
}