3D-Box via Segment Anything

We extend Segment Anything to 3D perception by combining it with VoxelNeXt. This project is still in progress: we are improving it and developing more examples. Issues and pull requests are welcome!

<p align="center"> <img src="images/sam-voxelnext.png" width="100%"> </p>

Why this project?

Segment Anything and its follow-up projects focus on 2D images. In this project, we extend the scope to the 3D world by combining Segment Anything with VoxelNeXt. When we provide a prompt (e.g., a point or a box), the result is not only a 2D segmentation mask but also 3D boxes.

The core idea is that VoxelNeXt is a fully sparse 3D detector: it predicts 3D objects directly from individual sparse voxels. We project the 3D sparse voxels onto the 2D image, and 3D boxes are then generated from the voxels that fall inside the SAM mask.
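The projection-and-selection step can be illustrated with a minimal NumPy sketch. This is not the project's implementation: the function names `voxels_in_mask` and `boxes_in_mask`, the 3x4 `lidar_to_image` projection matrix, and the 7-value box layout are all assumptions made for illustration. It assumes each sparse voxel already carries one predicted box and one score.

```python
import numpy as np

def voxels_in_mask(voxel_centers, mask, lidar_to_image):
    """Return a boolean array marking voxels whose projection lands inside the mask.

    voxel_centers:  (N, 3) voxel centers in the LiDAR frame (assumed layout).
    mask:           (H, W) 2D segmentation mask, e.g. from SAM.
    lidar_to_image: (3, 4) projection matrix, intrinsics @ extrinsics (assumed).
    """
    mask = mask.astype(bool)
    n = voxel_centers.shape[0]
    homo = np.hstack([voxel_centers, np.ones((n, 1))])  # (N, 4) homogeneous points
    cam = homo @ lidar_to_image.T                       # (N, 3): [u*d, v*d, d]
    depth = cam[:, 2]

    inside = np.zeros(n, dtype=bool)
    front = depth > 1e-6                                # drop points behind the camera
    u = np.floor(cam[front, 0] / depth[front]).astype(int)
    v = np.floor(cam[front, 1] / depth[front]).astype(int)

    h, w = mask.shape
    in_img = (u >= 0) & (u < w) & (v >= 0) & (v < h)    # keep pixels inside the image
    idx = np.flatnonzero(front)[in_img]
    inside[idx] = mask[v[in_img], u[in_img]]
    return inside

def boxes_in_mask(voxel_centers, boxes, scores, mask, lidar_to_image):
    """Keep per-voxel boxes whose voxel projects into the mask, best score first.

    boxes:  (N, 7) one box per voxel, e.g. [x, y, z, dx, dy, dz, yaw] (assumed).
    scores: (N,) confidence per voxel.
    """
    inside = voxels_in_mask(voxel_centers, mask, lidar_to_image)
    order = np.argsort(-scores[inside])
    return boxes[inside][order], scores[inside][order]
```

Sorting by score is only one possible way to rank the candidates; how the final box is chosen among the voxels in the mask is a design choice not specified here.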

Installation

  1. Basic requirements: `pip install -r requirements.txt`
  2. Segment Anything: `pip install git+https://github.com/facebookresearch/segment-anything.git`
  3. spconv: `pip install spconv`, or a CUDA build that matches your CUDA version, e.g. `pip install spconv-cu111`. Please use spconv 2.2 or 2.3, for example `spconv==2.3.5`.
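After installing, you may want to confirm that the spconv version in your environment falls in the 2.2 / 2.3 range. The following is a small sketch using only the Python standard library; the helper names and the list of candidate package names are assumptions, so adjust `CANDIDATE_PACKAGES` to the build you installed.

```python
from importlib import metadata

SUPPORTED_MAJOR_MINOR = ("2.2", "2.3")
# Assumed distribution names; replace with the CUDA build you actually installed.
CANDIDATE_PACKAGES = ("spconv", "spconv-cu111")

def is_supported_spconv(version: str) -> bool:
    """Return True if the version string is a 2.2.x or 2.3.x release."""
    major_minor = ".".join(version.split(".")[:2])
    return major_minor in SUPPORTED_MAJOR_MINOR

def find_spconv_version():
    """Return (package_name, version) for the first installed candidate, or None."""
    for name in CANDIDATE_PACKAGES:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None

if __name__ == "__main__":
    found = find_spconv_version()
    if found is None:
        print("spconv not found under the assumed package names")
    else:
        name, version = found
        status = "supported" if is_supported_spconv(version) else "not in the 2.2 / 2.3 range"
        print(f"{name} {version}: {status}")
```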

Getting Started

Please try it via seg_anything_and_3D.ipynb. We provide this example on the nuScenes dataset; you can also use other image and point-cloud pairs.

<p align="center"> <img src="images/mask_box.png" width="100%"> </p> <p align="center"> <img src="images/image_boxes1.png" width="100%"> </p> <p align="center"> <img src="images/image_boxes2.png" width="100%"> </p> <p align="center"> <img src="images/image_boxes3.png" width="100%"> </p>

TODO List

Citation

If you find this project useful in your research, please consider citing:

@article{kirillov2023segany,
  title={Segment Anything}, 
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@inproceedings{chen2023voxenext,
  title={VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking},
  author={Yukang Chen and Jianhui Liu and Xiangyu Zhang and Xiaojuan Qi and Jiaya Jia},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

Acknowledgement

Our Works in 3D Perception