GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

Setup

Installation

pip install -r requirements.txt

Dataset Preparation

Follow the mmdetection3d instructions to prepare the nuScenes dataset, then update the generated info files with scene_idx so that each sample can be matched to its occupancy ground truths (see the sketch below).

python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
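
The scene_idx update is not part of the stock mmdetection3d script. Below is a minimal sketch of one way to do it, assuming the mmdet3d v1.x info format (a data_list of dicts keyed by token) and the nuScenes devkit; whether scene_idx should hold the scene name or a numeric index is a repository convention, so treat the assigned value as illustrative.

```python
# Hypothetical helper: attach scene_idx to each sample in the info files so
# samples can be matched to the occupancy GT folders (data/nuscenes/gts/<scene>).
# The 'data_list'/'token' keys follow the mmdet3d v1.x info format; adapt them
# if your pkl layout differs.
import pickle

from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-trainval', dataroot='./data/nuscenes')

for split in ('train', 'val'):
    path = f'./data/nuscenes/nuscenes_infos_{split}.pkl'
    with open(path, 'rb') as f:
        infos = pickle.load(f)
    for info in infos['data_list']:
        sample = nusc.get('sample', info['token'])
        # Scene names like 'scene-0061' match the folder names under gts/.
        info['scene_idx'] = nusc.get('scene', sample['scene_token'])['name']
    with open(path, 'wb') as f:
        pickle.dump(infos, f)
```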

Download gts from CVPR2023-3D-Occupancy-Prediction and place them under data/nuscenes/gts.

Generate features and rendering targets using Metric3D V2, FeatUp for MaskCLIP, and Grounded SAM 2.
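
Each of these tools ships its own inference scripts. As one example, here is a minimal sketch of dumping upsampled MaskCLIP features with FeatUp's torch.hub entry point; the 'maskclip' model name, the use_norm flag, the input size, and the paths are assumptions to verify against the FeatUp repository. Metric3D V2 depth maps and Grounded SAM 2 rendering targets would be produced analogously with their respective repos.

```python
# Sketch: per-image upsampled MaskCLIP features via FeatUp (assumed hub API).
import torch
import torchvision.transforms as T
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# use_norm=False is assumed here to keep features in the original CLIP space.
upsampler = torch.hub.load('mhamilton723/FeatUp', 'maskclip', use_norm=False)
upsampler = upsampler.to(device).eval()

transform = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = transform(Image.open('data/nuscenes/samples/CAM_FRONT/example.jpg'))
with torch.no_grad():
    hr_feats = upsampler(image.unsqueeze(0).to(device))  # (1, C, H, W)
torch.save(hr_feats.cpu(), 'example_maskclip_feats.pt')
```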

CLIP Text Embeddings

Generate CLIP text embeddings for the categories of interest by referring to https://github.com/open-mmlab/mmpretrain/pull/1737.
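
If you prefer a standalone script over the mmpretrain route, the following sketch produces prompt-ensembled text embeddings with open_clip; the backbone name, category list, prompt templates, and output path are placeholders for your own setup.

```python
# Sketch: prompt-ensembled CLIP text embeddings for a category list.
import torch
import open_clip

model, _, _ = open_clip.create_model_and_transforms('ViT-B-16', pretrained='openai')
tokenizer = open_clip.get_tokenizer('ViT-B-16')

categories = ['car', 'truck', 'pedestrian', 'vegetation']       # categories of interest
templates = ['a photo of a {}.', 'there is a {} in the scene.']  # prompt ensemble

with torch.no_grad():
    embeddings = []
    for name in categories:
        tokens = tokenizer([t.format(name) for t in templates])
        feats = model.encode_text(tokens)
        feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize per prompt
        embeddings.append(feats.mean(dim=0))              # average over templates
    embeddings = torch.stack(embeddings)                  # (num_categories, embed_dim)

torch.save(embeddings, 'clip_text_embeddings.pt')
```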

Usage

Training

PYTHONPATH=. mim train mmdet3d configs/gausstr/gausstr.py -l pytorch -G [GPU_NUM]

Testing

PYTHONPATH=. mim test mmdet3d configs/gausstr/gausstr.py -C [CKPT_PATH]

Visualization

After testing with DumpResultHook enabled, visualize the dumped results with:

python tools/visualize.py [PKL_PATH] [--save]

Citation

If you find our paper and code helpful for your research, please consider starring this repository :star: and citing our work:

@article{GaussTR,
    title = {GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding},
    author = {Haoyi Jiang and Liu Liu and Tianheng Cheng and Xinjie Wang and Tianwei Lin and Zhizhong Su and Wenyu Liu and Xinggang Wang},
    year = {2024},
    journal = {arXiv preprint arXiv:2412.13193}
}

Acknowledgements

This project builds upon the pioneering work of FeatUp, MaskCLIP, and gsplat. We extend our gratitude to these projects for their contributions to the community.

License

Released under the MIT License.