

<!-- PROJECT LOGO --> <p align="center"> <img src="https://mutianxu.github.io/sampro3d/static/images/icon_final.jpg" alt="" width="150" height="50"/> <h1 align="center">SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation</h1>

<p align="center"> <a href="https://mutianxu.github.io"><strong>Mutian Xu</strong></a> · <strong>Xingyilang Yin</strong> · <a href="https://lingtengqiu.github.io/"><strong>Lingteng Qiu</strong></a> · <a href="https://xueyuhanlang.github.io/"><strong>Yang Liu</strong></a> · <a href="https://www.microsoft.com/en-us/research/people/xtong/"><strong>Xin Tong</strong></a> · <a href="https://gaplab.cuhk.edu.cn/"><strong>Xiaoguang Han</strong></a> <br> SSE, CUHKSZ · FNii, CUHKSZ · Microsoft Research Asia </p> <!-- <h2 align="center">CVPR 2023</h2> --> <h3 align="center"><a href="https://arxiv.org/abs/2311.17707">Paper</a> | <a href="https://mutianxu.github.io/sampro3d/">Project Page</a></h3> <div align="center"></div> </p> <p align="center"> <a href=""> <img src="https://mutianxu.github.io/sampro3d/static/images/teaser.jpg" alt="Logo" width="80%"> </a> </p>

SAMPro3D can segment ANY 😯😯😯 3D indoor scene <b>WITHOUT</b> training ❗️❗️❗️. It achieves higher-quality and more diverse segmentation than previous zero-shot or fully supervised approaches, and in many cases even surpasses human-level annotations. <br>

If you find our code or work helpful, please cite:

    @article{xu2023sampro3d,
      title={SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation},
      author={Mutian Xu and Xingyilang Yin and Lingteng Qiu and Yang Liu and Xin Tong and Xiaoguang Han},
      journal={arXiv preprint arXiv:2311.17707},
      year={2023}
    }

<!-- TABLE OF CONTENTS --> <details open="open" style='padding: 10px; border-radius:5px 30px 30px 5px; border-style: solid; border-width: 1px;'> <summary>Table of Contents</summary> <ol> <li> <a href="#-news">News</a> </li> <li> <a href="#requirements-and-installation">Requirements and Installation</a> </li> <li> <a href="#data-preparation">Data Preparation</a> </li> <li> <a href="#run-sampro3d">Run SAMPro3D</a> </li> <li> <a href="#animated-qualitative-comparison">Animated Qualitative Comparison</a> </li> <li> <a href="#segment-your-own-3d-scene">Segment Your Own 3D Scene</a> </li> <li> <a href="#todo">TODO</a> </li> <li> <a href="#contact">Contact</a> </li> <li> <a href="#acknowledgement">Acknowledgement</a> </li> </ol> </details>

## 📢 News

## Requirements and Installation

### Hardware requirements

At least one GPU with around 8000 MB of memory. A CPU with ample processing power and a disk with fast I/O are also highly recommended. The disk also needs to be large enough: about 50 MB per 2D frame at a resolution of 240*320, or around 160 GB in total for the 2500 frames of a large-scale scene.
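
As a rough pre-flight check, the disk estimate above can be turned into a few lines of Python. This is only a sketch: `estimate_disk_gb` and `has_enough_space` are hypothetical helpers (not part of this repo), the per-frame size is the approximate 50 MB quoted above, and the 1.3 headroom factor is an assumption chosen so that the 2500-frame estimate lands close to the 160 GB figure.

```python
import shutil

MB_PER_FRAME = 50  # approximate per-frame size quoted above (240*320 frames)


def estimate_disk_gb(num_frames, mb_per_frame=MB_PER_FRAME):
    """Rough disk usage in GB (1 GB = 1000 MB) for a scene with `num_frames` frames."""
    return num_frames * mb_per_frame / 1000


def has_enough_space(path, num_frames, headroom=1.3):
    """True if the disk holding `path` can fit the estimate times `headroom`.

    `headroom` is an assumed safety margin, not a value taken from the repo.
    """
    needed_bytes = estimate_disk_gb(num_frames) * headroom * 1e9
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    print(f"Estimated usage for 2500 frames: {estimate_disk_gb(2500):.0f} GB")
    print("Enough free space in the current directory:", has_enough_space(".", 2500))
```

Treat the result as a ballpark figure and leave generous headroom on the disk you plan to use.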

### Software installation

Follow the installation instructions to install all required packages.

## Data Preparation

Follow the data pre-processing instructions to download and preprocess the data.

## Run SAMPro3D

### 3D Prompt Proposal

The first stage of SAMPro3D generates 3D prompts, runs SAM segmentation, and saves the SAM outputs for the later stages. To start it, simply run:

    python 3d_prompt_proposal.py --data_path /PATH_TO/ScanNet_data --scene_name sceneXXXX_XX --prompt_path /PATH_TO/initial_prompt --sam_output_path /PATH_TO/SAM_outputs --device cuda:0

This is the only stage that performs SAM inference, and it accounts for most of the computation time and memory usage of the entire pipeline.

Note on time efficiency: This stage saves the SAM outputs to .npy files for later use. Depending on your hardware (CPU and disk), the I/O speed for these files can vary a lot and affect the running time of the pipeline. Please follow the hardware requirements above to get the best efficiency.

(Optional: Partial-Area Segmentation): At this stage, you can also perform 3D segmentation on a partial point cloud captured by a limited range of 2D frames, by simply changing frame_id_init and frame_id_end in the code and then running the script. Sometimes this works better than segmenting the whole point cloud (thanks to less complicated scenes and better frame consistency).
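
To make the frame-range idea concrete, the sketch below picks the frames that would take part in a partial-area run. It is illustrative only: `select_frames` is a hypothetical helper, frames are assumed to be identified by integer ids, and treating `frame_id_end` as exclusive is an assumption, so check the script for the convention it actually uses.

```python
def select_frames(frame_ids, frame_id_init, frame_id_end):
    """Return the sorted frame ids that fall in [frame_id_init, frame_id_end).

    Assumption: integer frame ids and an exclusive end bound.
    """
    return sorted(i for i in frame_ids if frame_id_init <= i < frame_id_end)


if __name__ == "__main__":
    all_frames = range(0, 2000, 20)  # hypothetical scene: one frame kept every 20
    subset = select_frames(all_frames, frame_id_init=200, frame_id_end=300)
    print(f"{len(subset)} frames selected:", subset)
```

A narrower range means fewer SAM outputs to save, which also reduces the disk space and I/O time discussed above.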

### Finish Segmentation and Result Visualization

Next, we filter and consolidate the initial prompts, using the SAM outputs saved during the 3D Prompt Proposal stage, to obtain the final 3D segmentation. To do so, run:

    python main.py --data_path /PATH_TO/ScanNet_data --scene_name sceneXXXX_XX --prompt_path /PATH_TO/initial_prompt --sam_output_path /PATH_TO/SAM_outputs --pred_path /PATH_TO/sampro3d_predictions --output_vis_path /PATH_TO/result_visualization --device cuda:0

Once this finishes, the visualization of the final 3D segmentation is automatically 😊 saved as a sceneXXXX_XX.ply file in the path specified by --output_vis_path.
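
Since the visualization is an ordinary PLY file, a quick way to confirm it was written correctly is to read its header. The sketch below uses only the standard library; `read_ply_header` is a hypothetical helper, not part of this repo. It relies on the standard PLY header layout (`ply`, `format`, `element`, `property`, `end_header`) and assumes nothing about which vertex properties the script writes.

```python
def read_ply_header(path):
    """Parse a PLY header and return (format, vertex_count, vertex_properties)."""
    fmt, vertex_count, properties = None, 0, []
    current_element = None
    with open(path, "rb") as f:
        # Every PLY file starts with the magic line "ply".
        if f.readline().strip() != b"ply":
            raise ValueError(f"{path} is not a PLY file")
        for raw in f:
            tokens = raw.decode("ascii", errors="replace").split()
            if not tokens:
                continue
            if tokens[0] == "end_header":
                break  # stop before the (possibly binary) payload
            if tokens[0] == "format":
                fmt = tokens[1]
            elif tokens[0] == "element":
                current_element = tokens[1]
                if current_element == "vertex":
                    vertex_count = int(tokens[2])
            elif tokens[0] == "property" and current_element == "vertex":
                properties.append(tokens[-1])  # property name is the last token
    return fmt, vertex_count, properties


# Example (path placeholders as in the command above):
# print(read_ply_header("/PATH_TO/result_visualization/sceneXXXX_XX.ply"))
```

A vertex count of zero, or a header without color properties, would suggest that the run did not complete as expected. To actually look at the result, open the file in any viewer that supports PLY.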

Note on post-processing of the floor: Our framework usually gives a decent segmentation of the floor on its own. For a large-scale floor, however, we apply post-processing to further improve the floor segmentation. For small-scale scenes (e.g., scene0050_00 in ScanNet), you can skip this step by simply adding --args.post_floor False to the previous command.
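
The two commands above share most of their arguments, so when processing several scenes it can help to build them in one place. The following is a minimal sketch, not a script shipped with this repo: `build_commands` and `run_scene` are hypothetical helpers, the flags are exactly those shown in the commands above, and it assumes you run it from the repo root. By default it only prints the commands (`dry_run=True`).

```python
import shlex
import subprocess


def build_commands(scene_name, data_path, prompt_path, sam_output_path,
                   pred_path, output_vis_path, device="cuda:0"):
    """Return the argument lists for both stages, in the order they must run."""
    shared = [
        "--data_path", data_path,
        "--scene_name", scene_name,
        "--prompt_path", prompt_path,
        "--sam_output_path", sam_output_path,
    ]
    stage1 = ["python", "3d_prompt_proposal.py", *shared, "--device", device]
    stage2 = ["python", "main.py", *shared,
              "--pred_path", pred_path,
              "--output_vis_path", output_vis_path,
              "--device", device]
    return [stage1, stage2]


def run_scene(dry_run=True, **kwargs):
    """Print both commands; execute them one after the other unless dry_run is set."""
    for cmd in build_commands(**kwargs):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise if a stage fails


if __name__ == "__main__":
    run_scene(
        dry_run=True,
        scene_name="scene0050_00",
        data_path="/PATH_TO/ScanNet_data",
        prompt_path="/PATH_TO/initial_prompt",
        sam_output_path="/PATH_TO/SAM_outputs",
        pred_path="/PATH_TO/sampro3d_predictions",
        output_vis_path="/PATH_TO/result_visualization",
    )
```

Using `check=True` stops the run if the first stage fails, which matters here because the second stage depends on the SAM outputs saved by the first.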

### Time estimation ⚡️

If everything goes well, the entire pipeline takes just 15 min for a large-scale 3D scene captured by 2000 2D frames. (WE DO NOT NEED TRAINING❗️)

## Animated Qualitative Comparison



## 🌟 Segment Your Own 3D Scene

With our advanced framework, you can generate high-quality segmentations on your own 3D scene without the need for training! Here are the steps you can follow:



## Contact

You are welcome to submit issues, send pull requests, or share some ideas with us. If you have any other questions, please contact Mutian Xu (mutianxu@link.cuhk.edu.cn).


## Acknowledgement

Our code base is partially borrowed or adapted from SAM, OpenScene and Pointcept.