Home

Awesome

Mask3D: Mask Transformer for 3D Instance Segmentation

<div align="center"> <a href="https://jonasschult.github.io/">Jonas Schult</a><sup>1</sup>, <a href="https://francisengelmann.github.io/">Francis Engelmann</a><sup>2,3</sup>, <a href="https://www.vision.rwth-aachen.de/person/10/">Alexander Hermans</a><sup>1</sup>, <a href="https://orlitany.github.io/">Or Litany</a><sup>4</sup>, <a href="https://inf.ethz.ch/people/person-detail.MjYyNzgw.TGlzdC8zMDQsLTg3NDc3NjI0MQ==.html">Siyu Tang</a><sup>3</sup>, <a href="https://www.vision.rwth-aachen.de/person/1/">Bastian Leibe</a><sup>1</sup>

<sup>1</sup>RWTH Aachen University <sup>2</sup>ETH AI Center <sup>3</sup>ETH Zurich <sup>4</sup>NVIDIA

Mask3D predicts accurate 3D semantic instances achieving state-of-the-art on ScanNet, ScanNet200, S3DIS and STPLS3D.

PWC PWC PWC PWC

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a> <a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?logo=pytorchlightning&logoColor=white"></a> <a href="https://hydra.cc/"><img alt="Config: Hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a>

teaser

</div> <br><br>

[Project Webpage] [Paper] [Demo]

News

Code structure

We adapt the codebase of Mix3D which provides a highly modularized framework for 3D Semantic Segmentation based on the MinkowskiEngine.

├── mix3d
│   ├── main_instance_segmentation.py <- the main file
│   ├── conf                          <- hydra configuration files
│   ├── datasets
│   │   ├── preprocessing             <- folder with preprocessing scripts
│   │   ├── semseg.py                 <- indoor dataset
│   │   └── utils.py        
│   ├── models                        <- Mask3D modules
│   ├── trainer
│   │   ├── __init__.py
│   │   └── trainer.py                <- train loop
│   └── utils
├── data
│   ├── processed                     <- folder for preprocessed datasets
│   └── raw                           <- folder for raw datasets
├── scripts                           <- train scripts
├── docs
├── README.md
└── saved                             <- folder that stores models and logs

Dependencies :memo:

The main dependencies of the project are the following:

python: 3.10.9
cuda: 11.3

You can set up a conda environment as follows

# Some users experienced issues on Ubuntu with an AMD CPU
# Install libopenblas-dev (issue #115, thanks WindWing)
# sudo apt-get install libopenblas-dev

export TORCH_CUDA_ARCH_LIST="6.0 6.1 6.2 7.0 7.2 7.5 8.0 8.6"

conda env create -f environment.yml

conda activate mask3d_cuda113

pip3 install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip3 install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu113.html
pip3 install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps

mkdir third_party
cd third_party

git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine"
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --force_cuda --blas=openblas

cd ..
git clone https://github.com/ScanNet/ScanNet.git
cd ScanNet/Segmentator
git checkout 3e5726500896748521a6ceb81271b0f5b2c0e7d2
make

cd ../../pointnet2
python setup.py install

cd ../../
pip3 install pytorch-lightning==1.7.2

Data preprocessing :hammer:

After installing the dependencies, we preprocess the datasets.

ScanNet / ScanNet200

First, we apply Felzenswalb and Huttenlocher's Graph Based Image Segmentation algorithm to the test scenes using the default parameters. Please refer to the original repository for details. Put the resulting segmentations in ./data/raw/scannet_test_segments.

python -m datasets.preprocessing.scannet_preprocessing preprocess \
--data_dir="PATH_TO_RAW_SCANNET_DATASET" \
--save_dir="data/processed/scannet" \
--git_repo="PATH_TO_SCANNET_GIT_REPO" \
--scannet200=false/true

S3DIS

The S3DIS dataset contains some smalls bugs which we initially fixed manually. We will soon release a preprocessing script which directly preprocesses the original dataset. For the time being, please follow the instructions here to fix the dataset manually. Afterwards, call the preprocessing script as follows:

python -m datasets.preprocessing.s3dis_preprocessing preprocess \
--data_dir="PATH_TO_Stanford3dDataset_v1.2" \
--save_dir="data/processed/s3dis"

STPLS3D

python -m datasets.preprocessing.stpls3d_preprocessing preprocess \
--data_dir="PATH_TO_STPLS3D" \
--save_dir="data/processed/stpls3d"

Training and testing :train2:

Train Mask3D on the ScanNet dataset:

python main_instance_segmentation.py

Please refer to the config scripts (for example here) for detailed instructions how to reproduce our results. In the simplest case the inference command looks as follows:

python main_instance_segmentation.py \
general.checkpoint='PATH_TO_CHECKPOINT.ckpt' \
general.train_mode=false

Trained checkpoints :floppy_disk:

We provide detailed scores and network configurations with trained checkpoints.

S3DIS (pretrained on ScanNet train+val)

Following PointGroup, HAIS and SoftGroup, we finetune a model pretrained on ScanNet (config and checkpoint).

DatasetAPAP_50AP_25ConfigCheckpoint :floppy_disk:Scores :chart_with_upwards_trend:Visualizations :telescope:
Area 169.381.987.7configcheckpointscoresvisualizations
Area 244.059.566.5configcheckpointscoresvisualizations
Area 373.483.288.2configcheckpointscoresvisualizations
Area 458.069.574.9configcheckpointscoresvisualizations
Area 557.871.977.2configcheckpointscoresvisualizations
Area 668.479.985.2configcheckpointscoresvisualizations

S3DIS (from scratch)

DatasetAPAP_50AP_25ConfigCheckpoint :floppy_disk:Scores :chart_with_upwards_trend:Visualizations :telescope:
Area 174.185.189.6configcheckpointscoresvisualizations
Area 244.957.167.9configcheckpointscoresvisualizations
Area 374.484.488.1configcheckpointscoresvisualizations
Area 463.874.781.1configcheckpointscoresvisualizations
Area 556.668.475.2configcheckpointscoresvisualizations
Area 673.383.487.8configcheckpointscoresvisualizations

ScanNet v2

DatasetAPAP_50AP_25ConfigCheckpoint :floppy_disk:Scores :chart_with_upwards_trend:Visualizations :telescope:
ScanNet val55.273.783.5configcheckpointscoresvisualizations
ScanNet test56.678.087.0configcheckpointscoresvisualizations

ScanNet 200

DatasetAPAP_50AP_25ConfigCheckpoint :floppy_disk:Scores :chart_with_upwards_trend:Visualizations :telescope:
ScanNet200 val27.437.042.3configcheckpointscoresvisualizations
ScanNet200 test27.838.844.5configcheckpointscoresvisualizations

STPLS3D

DatasetAPAP_50AP_25ConfigCheckpoint :floppy_disk:Scores :chart_with_upwards_trend:Visualizations :telescope:
STPLS3D val57.374.381.6configcheckpointscoresvisualizations
STPLS3D test63.479.285.6configcheckpointscoresvisualizations

BibTeX :pray:

@article{Schult23ICRA,
  title     = {{Mask3D: Mask Transformer for 3D Semantic Instance Segmentation}},
  author    = {Schult, Jonas and Engelmann, Francis and Hermans, Alexander and Litany, Or and Tang, Siyu and Leibe, Bastian},
  booktitle = {{International Conference on Robotics and Automation (ICRA)}},
  year      = {2023}
}