GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Setup
Installation
pip install -r requirements.txt
Dataset Preparation
Follow the mmdetection3d instructions for preparing the nuScenes dataset.
Then update the generated info files with scene_idx to match the occupancy ground truths; a hypothetical sketch of this step follows the command below.
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
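A minimal sketch of tagging the infos with scene_idx using the nuScenes devkit is given below. The info keys (data_list, token) assume the mmdetection3d v1.1+ info format, and the exact field convention GaussTR expects may differ.

import pickle

from nuscenes import NuScenes

# Map each scene token to its index so samples can be matched to
# the per-scene occupancy ground-truth folders.
nusc = NuScenes(version='v1.0-trainval', dataroot='./data/nuscenes')
scene_idx = {s['token']: i for i, s in enumerate(nusc.scene)}

with open('./data/nuscenes/nuscenes_infos_train.pkl', 'rb') as f:
    data = pickle.load(f)

# 'data_list' and 'token' follow the mmdetection3d info format (assumption).
for info in data['data_list']:
    sample = nusc.get('sample', info['token'])
    info['scene_idx'] = scene_idx[sample['scene_token']]

with open('./data/nuscenes/nuscenes_infos_train.pkl', 'wb') as f:
    pickle.dump(data, f)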
Download gts from CVPR2023-3D-Occupancy-Prediction and place them under data/nuscenes/gts.
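Assuming the Occ3D-nuScenes release layout (the scene folder name is illustrative), the result should look like:

data/nuscenes/gts/
├── scene-0001/
│   └── [sample_token]/
│       └── labels.npz
└── ...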
Generate features and rendering targets using Metric3D V2, FeatUp for MaskCLIP, and Grounded SAM 2.
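As an illustration of one of these steps, the following hypothetical sketch extracts upsampled MaskCLIP features with FeatUp's torch.hub entry point; the image size, normalization, and save path are assumptions, not the exact GaussTR preprocessing.

import torch
import torchvision.transforms as T
from PIL import Image

# Load FeatUp's MaskCLIP upsampler from torch.hub; use_norm=False keeps the
# features in CLIP space so they stay comparable to CLIP text embeddings.
upsampler = torch.hub.load('mhamilton723/FeatUp', 'maskclip', use_norm=False).cuda().eval()

transform = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = transform(Image.open('sample.jpg').convert('RGB')).unsqueeze(0).cuda()
with torch.no_grad():
    hr_feats = upsampler(image)  # upsampled per-pixel features
torch.save(hr_feats.cpu(), 'sample_maskclip.pt')  # save path is an assumption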
CLIP Text Embeddings
Generate CLIP text embeddings for the categories of interest by referring to https://github.com/open-mmlab/mmpretrain/pull/1737.
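If you prefer a standalone script over the mmpretrain PR, a rough equivalent with open_clip looks like this; the model variant, prompt template, category list, and output path are all assumptions.

import torch
import open_clip

categories = ['car', 'truck', 'bus', 'pedestrian', 'vegetation']  # illustrative classes

model, _, _ = open_clip.create_model_and_transforms('ViT-B-16', pretrained='openai')
tokenizer = open_clip.get_tokenizer('ViT-B-16')

with torch.no_grad():
    tokens = tokenizer([f'a photo of a {c}' for c in categories])
    text_feats = model.encode_text(tokens)
    # L2-normalize so similarities against rendered features are cosine scores.
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

torch.save(text_feats, 'clip_text_embeddings.pt')  # output path is an assumption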
Usage
Training
PYTHONPATH=. mim train mmdet3d configs/gausstr/gausstr.py -l pytorch -G [GPU_NUM]
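For example, to train on 8 GPUs (the GPU count is illustrative):

PYTHONPATH=. mim train mmdet3d configs/gausstr/gausstr.py -l pytorch -G 8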
Testing
PYTHONPATH=. mim test mmdet3d configs/gausstr/gausstr.py -C [CKPT_PATH]
Visualization
After testing with DumpResultHook, visualize the results using:
python tools/visualize.py [PKL_PATH] [--save]
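For example, assuming the hook dumped results to work_dirs/gausstr/results.pkl (a hypothetical path):

python tools/visualize.py work_dirs/gausstr/results.pkl --save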
Citation
If you find our paper and code helpful for your research, please consider starring this repository :star: and citing our work:
@article{GaussTR,
  title = {GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding},
  author = {Haoyi Jiang and Liu Liu and Tianheng Cheng and Xinjie Wang and Tianwei Lin and Zhizhong Su and Wenyu Liu and Xinggang Wang},
  year = {2024},
  journal = {arXiv preprint arXiv:2412.13193}
}
Acknowledgements
This project builds upon the pioneering work of FeatUp, MaskCLIP and gsplat. We extend our gratitude to these projects for their contributions to the community.
License
Released under the MIT License.