GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
Paper | Project Page
Yuanhui Huang, Wenzhao Zheng$\dagger$, Yunpeng Zhang, Jie Zhou, Jiwen Lu$\ddagger$
$\dagger$ Project leader $\ddagger$ Corresponding author
🔥 A pioneering step towards building an object-centric autonomous driving system. 🔥
GaussianFormer proposes the 3D semantic Gaussians as a more efficient object-centric representation for driving scenes compared with 3D occupancy.
News
- [2024/09/30] Occupancy and Gaussian visualization code release.
- [2024/09/12] Training code release.
- [2024/09/05] An updated version of GaussianFormer modeling only the occupied area.
- [2024/09/05] Model weights and evaluation code release.
- [2024/07/01] GaussianFormer is accepted to ECCV24!
- [2024/05/28] Paper released on arXiv.
- [2024/05/28] Demo release.
Demo
Overview
Considering the universal approximation ability of Gaussian mixtures, we propose an object-centric 3D semantic Gaussian representation that describes the fine-grained structure of 3D scenes without dense grids. We propose the GaussianFormer model, which uses sparse convolution and cross-attention to efficiently transform 2D images into 3D Gaussian representations. To generate dense 3D occupancy, we design a Gaussian-to-voxel splatting module that can be efficiently implemented with CUDA. With comparable performance, GaussianFormer reduces the memory consumption of existing 3D occupancy prediction methods by 75.2% - 82.2%.
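To make the idea concrete, here is a minimal NumPy sketch of Gaussian-to-voxel splatting. This is an illustration only, not the released CUDA implementation: it assumes axis-aligned covariances (the paper also learns a rotation) and all function/parameter names are hypothetical.

```python
import numpy as np

def splat_gaussians_to_voxels(means, scales, opacities, semantics, voxel_centers):
    """Accumulate semantic Gaussian contributions onto voxel centers.

    means:         (G, 3) Gaussian centers
    scales:        (G, 3) per-axis standard deviations (axis-aligned
                   covariance for simplicity; rotation omitted)
    opacities:     (G,)   per-Gaussian opacity
    semantics:     (G, C) per-Gaussian semantic weights
    voxel_centers: (V, 3) query positions
    returns:       (V, C) semantic occupancy evidence per voxel
    """
    diff = voxel_centers[:, None, :] - means[None, :, :]      # (V, G, 3)
    maha = np.sum((diff / scales[None, :, :]) ** 2, axis=-1)  # squared Mahalanobis distance, (V, G)
    weight = opacities[None, :] * np.exp(-0.5 * maha)         # Gaussian density per voxel, (V, G)
    return weight @ semantics                                 # sum contributions, (V, C)
```

A voxel located exactly at a Gaussian's mean receives that Gaussian's full opacity-weighted semantics, while far-away voxels receive exponentially decaying contributions.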
Getting Started
Installation
Follow instructions HERE to prepare the environment.
Data Preparation
- Download nuScenes V1.0 full dataset data HERE.
- Download the occupancy annotations from SurroundOcc HERE and unzip them.
- Download pkl files HERE.
Folder structure
GaussianFormer
├── ...
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── sweeps/
│   │   ├── v1.0-test/
│   │   ├── v1.0-trainval/
│   ├── nuscenes_cam/
│   │   ├── nuscenes_infos_train_sweeps_occ.pkl
│   │   ├── nuscenes_infos_val_sweeps_occ.pkl
│   ├── surroundocc/
│   │   ├── samples/
│   │   │   ├── xxxxxxxx.pcd.bin.npy
│   │   │   ├── ...
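Before training, it can save time to verify the layout matches the tree above. The following helper is a hypothetical convenience script (not part of the repo) that checks the expected paths exist:

```python
from pathlib import Path

# Paths expected by GaussianFormer, mirroring the folder tree above.
EXPECTED = [
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes/v1.0-trainval",
    "data/nuscenes_cam/nuscenes_infos_train_sweeps_occ.pkl",
    "data/nuscenes_cam/nuscenes_infos_val_sweeps_occ.pkl",
    "data/surroundocc/samples",
]

def check_data_layout(root="."):
    """Return the list of missing paths; an empty list means the layout is complete."""
    return [p for p in EXPECTED if not (Path(root) / p).exists()]

if __name__ == "__main__":
    missing = check_data_layout()
    if missing:
        print("Missing paths:", *missing, sep="\n  ")
    else:
        print("Data layout looks complete.")
```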
Inference
We provide two checkpoints trained on the SurroundOcc dataset:
- The checkpoint that reproduces the result in Table 1 of our paper.
- 🔥🔥 An updated version of GaussianFormer that assigns semantic Gaussians to model only the occupied area, leaving empty space to a single fixed, infinitely large Gaussian. This modification significantly reduces the number of Gaussians needed for similar model capacity (144000 -> 25600), making the model even more efficient. Check our GaussianHead for more details.
python eval.py --py-config config/nuscenes_gs144000.py --work-dir out/nuscenes_gs144000/ --resume-from out/nuscenes_gs144000/state_dict.pth
python eval.py --py-config config/nuscenes_gs25600_solid.py --work-dir out/nuscenes_gs25600_solid/ --resume-from out/nuscenes_gs25600_solid/state_dict.pth
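The occupied-only formulation in the updated checkpoint can be illustrated with a small sketch. This is an interpretation under stated assumptions, not the released code: the fixed, infinitely large background Gaussian is modeled here as a spatially constant `empty_density` that competes with the semantic Gaussians at every voxel, so voxels far from all semantic Gaussians default to empty.

```python
import numpy as np

def occupied_only_prediction(gauss_weights, gauss_semantics, empty_density=1.0):
    """Sketch of the occupied-only decoding.

    gauss_weights:   (V, G) Gaussian densities evaluated at each voxel
    gauss_semantics: (G, C) per-Gaussian semantic weights (rows sum to 1)
    empty_density:   constant density of the fixed background Gaussian
                     that represents empty space everywhere
    returns: (V, C) class probabilities and (V,) empty probability
    """
    sem = gauss_weights @ gauss_semantics                      # semantic evidence, (V, C)
    total = gauss_weights.sum(-1, keepdims=True) + empty_density
    p_classes = sem / total                                    # share claimed by each class
    p_empty = empty_density / total[:, 0]                      # share claimed by empty space
    return p_classes, p_empty
```

Because no Gaussians are spent on empty space, the semantic Gaussians can concentrate on occupied regions, which is consistent with the reported reduction from 144000 to 25600 Gaussians.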
Train
Run the following command to launch your training process. Note that the setting with 144000 Gaussians requires ~40 GB of GPU memory during training, so we recommend trying the 25600 version, which achieves even better performance!
Download the pretrained weights for the image backbone HERE and put it inside ckpts.
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --py-config config/nuscenes_gs25600_solid.py --work-dir out/nuscenes_gs25600_solid
| Config | mIoU | Log | Weight |
|---|---|---|---|
| nuscenes_gs25600_solid | 19.31 | log | weight |
Stay tuned for more exciting work and models!
Visualize
Install packages for visualization according to the documentation. Here is an example command where you can change --num-samples and --vis-index.
CUDA_VISIBLE_DEVICES=0 python visualize.py --py-config config/nuscenes_gs25600_solid.py --work-dir out/nuscenes_gs25600_solid --resume-from out/nuscenes_gs25600_solid/state_dict.pth --vis-occ --vis-gaussian --num-samples 3
Related Projects
Our work is inspired by these excellent open-source repos: TPVFormer, PointOcc, SelfOcc, SurroundOcc, OccFormer, BEVFormer.
Our code is originally based on Sparse4D and migrated to the general framework of SelfOcc.
Citation
If you find this project helpful, please consider citing the following paper:
@article{huang2024gaussian,
title={GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction},
author={Huang, Yuanhui and Zheng, Wenzhao and Zhang, Yunpeng and Zhou, Jie and Lu, Jiwen},
journal={arXiv preprint arXiv:2405.17429},
year={2024}
}