[CVPR 2024] Rethinking Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An, Guolei Sun, Yun Liu, Fayao Liu, Zongwei Wu, Dan Wang, Luc Van Gool, Serge Belongie

Welcome to the official PyTorch implementation repository of our paper Rethinking Few-shot 3D Point Cloud Semantic Segmentation, accepted to CVPR 2024 [arXiv].

Highlight

The first thing we want you to take away from this paper:

<p align="center"><i>please ensure you are using our <strong>corrected setting</strong> for the development and evaluation of your 3D few-shot models</i>.</p>
<div align="center"> <img src="figs/sampling.jpg"/> </div>
  1. Identification of Key Issues: We pinpoint two significant issues in the current Few-shot 3D Point Cloud Semantic Segmentation (FS-PCS) setting: foreground leakage and sparse point distribution. These issues have undermined the validity of previous progress and hindered further advancements.
  2. Standardized Setting and Benchmark: To rectify existing issues, we propose a standardized FS-PCS setting along with a new benchmark. This enables fair comparisons and fosters future advancements in the field. Our repository implements an effective few-shot running pipeline on our proposed standard FS-PCS setting, facilitating easy development for future researchers based on our code base.
<div align="center"> <img src="figs/arch.jpg"/> </div>
  3. Novel Method (COSeg): Our method introduces a novel correlation optimization paradigm, diverging from the traditional feature optimization approach used by all previous FS-PCS models. COSeg achieves state-of-the-art performance on both S3DIS and ScanNetv2 datasets, demonstrating effective contextual learning and background correlation adjustment ability.

Get Started

Environment

The following environment setup instructions have been tested on RTX 3090 GPUs with GCC 6.3.0.

  1. Install dependencies
pip install -r requirements.txt

If the above command fails, you can install the dependencies individually:

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install torch_points3d==1.3.0
pip install torch-scatter==2.1.1
pip install torch-points-kernels==0.6.10
pip install torch-geometric==1.7.2
pip install timm==0.9.2
pip install tensorboardX==2.6
pip install numpy==1.20.3

For incompatible installation issues, such as wanting a higher torch version (e.g., 2.1.0) that conflicts with torch_points3d, please refer to this thread: https://github.com/ZhaochongAn/COSeg/issues/16 or feel free to open a new discussion for further assistance.

  2. Compile pointops

Ensure you have gcc, cuda, and nvcc installed. Compile and install pointops2 as follows:

cd lib/pointops2
python3 setup.py install
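
If an import fails later, it can help to compare what is actually installed against the pinned versions listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, it relies only on the standard-library `importlib.metadata` (Python 3.8+), and it assumes the distribution names match those used in the pip commands.

```python
from importlib import metadata

# Pins copied from the pip commands above (local "+cu113" suffixes omitted).
PINNED = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "torchaudio": "0.11.0",
    "torch_points3d": "1.3.0",
    "torch-scatter": "2.1.1",
    "torch-points-kernels": "0.6.10",
    "torch-geometric": "1.7.2",
    "timm": "0.9.2",
    "tensorboardX": "2.6",
    "numpy": "1.20.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package that deviates from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Drop a local version suffix such as "+cu113" before comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dictionary when every package matches its pin; otherwise each entry shows the expected and the found version (`None` if the package is not installed).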

Datasets Preparation

You can either download the preprocessed datasets directly from the links provided below or perform the preprocessing steps on your own.

Preprocessed Datasets

| Dataset | Download |
| --- | --- |
| S3DIS | Download link |
| ScanNet | Download link |

Preprocessing Instructions

S3DIS

  1. Download: S3DIS Dataset Version 1.2.
  2. Preprocessing: Re-organize raw data into npy files:
    cd preprocess
    python collect_s3dis_data.py --data_path [PATH_to_S3DIS_raw_data] --save_path [PATH_to_S3DIS_processed_data]
    
    The generated numpy files will be stored in PATH_to_S3DIS_processed_data/scenes.
  3. Splitting Rooms into Blocks:
    python room2blocks.py --data_path [PATH_to_S3DIS_processed_data]/scenes
    

ScanNet

  1. Download: ScanNet V2.
  2. Preprocessing: Re-organize raw data into npy files:
    cd preprocess
    python collect_scannet_data.py --data_path [PATH_to_ScanNet_raw_data] --save_path [PATH_to_ScanNet_processed_data]
    
    The generated numpy files will be stored in PATH_to_ScanNet_processed_data/scenes.
  3. Splitting Rooms into Blocks:
    python room2blocks.py --data_path [PATH_to_ScanNet_processed_data]/scenes
    

After preprocessing the datasets, a folder named blocks_bs1_s1 will be generated under PATH_to_DATASET_processed_data. Make sure to update the data_root entry in the .yaml config file to [PATH_to_DATASET_processed_data]/blocks_bs1_s1/data.
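
As a quick sanity check before editing the config, the expected `data_root` can be derived from the processed-data folder. This is a small sketch under the directory layout described above; the function names `block_data_root` and `check_data_root` are hypothetical and not part of the repository.

```python
from pathlib import Path


def block_data_root(processed_dir):
    """Return the directory that data_root in the .yaml config should point to."""
    return Path(processed_dir) / "blocks_bs1_s1" / "data"


def check_data_root(processed_dir):
    """Return the data_root path, raising if the block folder does not exist yet."""
    root = block_data_root(processed_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} not found; run room2blocks.py first")
    return root
```

Printing `check_data_root("[PATH_to_DATASET_processed_data]")` with your own path gives the value to paste into the `data_root` entry.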

Model weights

We provide trained model weights for the different few-shot settings and datasets below. Training and testing were run on 4 RTX 3090 GPUs. Please note that we retrained these weights, so their results may differ slightly from those reported in the paper. You can load these weights directly for evaluation or train your own models following the training instructions.

| Model name | Dataset | CVFOLD | N-way K-shot | Model Weight |
| --- | --- | --- | --- | --- |
| s30_1w1s | S3DIS | 0 | 1-way 1-shot | Download link |
| s30_1w5s | S3DIS | 0 | 1-way 5-shot | Download link |
| s30_2w1s | S3DIS | 0 | 2-way 1-shot | Download link |
| s30_2w5s | S3DIS | 0 | 2-way 5-shot | Download link |
| s31_1w1s | S3DIS | 1 | 1-way 1-shot | Download link |
| s31_1w5s | S3DIS | 1 | 1-way 5-shot | Download link |
| s31_2w1s | S3DIS | 1 | 2-way 1-shot | Download link |
| s31_2w5s | S3DIS | 1 | 2-way 5-shot | Download link |
| sc0_1w1s | ScanNet | 0 | 1-way 1-shot | Download link |
| sc0_1w5s | ScanNet | 0 | 1-way 5-shot | Download link |
| sc0_2w1s | ScanNet | 0 | 2-way 1-shot | Download link |
| sc0_2w5s | ScanNet | 0 | 2-way 5-shot | Download link |
| sc1_1w1s | ScanNet | 1 | 1-way 1-shot | Download link |
| sc1_1w5s | ScanNet | 1 | 1-way 5-shot | Download link |
| sc1_2w1s | ScanNet | 1 | 2-way 1-shot | Download link |
| sc1_2w5s | ScanNet | 1 | 2-way 5-shot | Download link |
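
The model names in the table appear to encode the setting as dataset prefix, CVFOLD, N-way, and K-shot (for example, `s30_1w1s` is S3DIS, fold 0, 1-way 1-shot). Assuming that pattern, which is inferred from the table rather than documented, a hypothetical helper such as `parse_model_name` could recover the matching `cvfold`, `n_way`, and `k_shot` values for a downloaded checkpoint:

```python
import re
from typing import NamedTuple


class FewShotSetting(NamedTuple):
    dataset: str
    cvfold: int
    n_way: int
    k_shot: int


# Dataset prefixes as they appear in the table above.
_DATASETS = {"s3": "S3DIS", "sc": "ScanNet"}
# Assumed pattern: <prefix><cvfold>_<n>w<k>s, e.g. "sc1_2w5s".
_NAME_RE = re.compile(r"^(s3|sc)([01])_(\d+)w(\d+)s$")


def parse_model_name(name):
    """Split a model name such as 's30_1w1s' into its few-shot setting."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    prefix, fold, n_way, k_shot = match.groups()
    return FewShotSetting(_DATASETS[prefix], int(fold), int(n_way), int(k_shot))
```

For instance, `parse_model_name("sc1_2w5s")` yields the ScanNet, fold 1, 2-way 5-shot setting, which are the values to pass when evaluating that checkpoint.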

Backbone pretraining

To begin, you will need to pretrain the backbone on either the S3DIS or the ScanNet dataset. For consistency and ease of reproduction, we highly recommend using our pretrained backbone weights directly. You can find the pretrained weights and their corresponding download links below:

| Model name | Dataset | CVFOLD | Model Weight |
| --- | --- | --- | --- |
| s3_s1pre | S3DIS | 1 | Download link |
| s3_s0pre | S3DIS | 0 | Download link |
| sc_s1pre | ScanNet | 1 | Download link |
| sc_s0pre | ScanNet | 0 | Download link |

Alternatively, you can perform the pretraining on your own. However, please note that doing so may result in more variability compared to the results reported in our paper.

To pretrain the backbone from scratch, run the following command, replacing [PRETRAIN_CONFIG] with the respective configuration file (s3dis_stratified_pretraining.yaml or scannetv2_stratified_pretraining.yaml), [PATH_to_SAVE_BACKBONE] with the desired path to save the backbone, and [CVFOLD] with either 0 or 1 depending on your few-shot setting:

python3 train_backbone.py --config config/[PRETRAIN_CONFIG] save_path [PATH_to_SAVE_BACKBONE] cvfold [CVFOLD]

Few-shot Training

Next, let us start the few-shot training. Set the configs in config/[CONFIG_FILE] (s3dis_COSeg_fs.yaml or scannetv2_COSeg_fs.yaml) for few-shot training. Adjust cvfold, n_way, and k_shot according to your task:

# 1 way 1/5 shot
python3 main_fs.py --config config/[CONFIG_FILE] save_path [PATH_to_SAVE_MODEL] pretrain_backbone [PATH_to_SAVED_BACKBONE] cvfold [CVFOLD] n_way 1 k_shot [K_SHOT] num_episode_per_comb 1000
# 2 way 1/5 shot
python3 main_fs.py --config config/[CONFIG_FILE] save_path [PATH_to_SAVE_MODEL] pretrain_backbone [PATH_to_SAVED_BACKBONE] cvfold [CVFOLD] n_way 2 k_shot [K_SHOT] num_episode_per_comb 100

Note: By default, when n_way=1, num_episode_per_comb is set to 1000. When n_way=2, num_episode_per_comb is set to 100 to keep the size of the test set consistent.
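
When sweeping over several settings, it can be convenient to assemble the training command programmatically so that num_episode_per_comb always follows n_way. The sketch below only builds the argument list shown above and does not launch anything; the function name `few_shot_train_command` and the `EPISODES_PER_COMB` table are hypothetical helpers, not part of the repository.

```python
import shlex

# Values stated in the note above: 1000 episodes per combination for 1-way, 100 for 2-way.
EPISODES_PER_COMB = {1: 1000, 2: 100}


def few_shot_train_command(config_file, save_path, backbone_path, cvfold, n_way, k_shot):
    """Build the main_fs.py training command as a list of arguments."""
    if n_way not in EPISODES_PER_COMB:
        raise ValueError(f"no num_episode_per_comb value known for n_way={n_way}")
    return [
        "python3", "main_fs.py",
        "--config", f"config/{config_file}",
        "save_path", save_path,
        "pretrain_backbone", backbone_path,
        "cvfold", str(cvfold),
        "n_way", str(n_way),
        "k_shot", str(k_shot),
        "num_episode_per_comb", str(EPISODES_PER_COMB[n_way]),
    ]


def as_shell(args):
    """Render the argument list as a single shell-quoted string (Python 3.8+)."""
    return shlex.join(args)
```

The returned list can be passed to `subprocess.run` from the repository root, or printed with `as_shell` to double-check the command before running it by hand.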

Testing

For testing, modify cvfold, n_way, k_shot and num_episode_per_comb accordingly, then run:

python3 main_fs.py --config config/[CONFIG_FILE] test True eval_split test weight [PATH_to_SAVED_MODEL]

To enable visualization in wandb, simply add vis 1 to the command.

Note: It is common to observe fluctuations in the mIoU by approximately 1.0%. This variability may be attributed to the relatively small size of the training set. The variance in performance on ScanNetv2 tends to be smaller compared to S3DIS due to its larger size. Additionally, the mean performance across the two dataset splits is generally more stable than the performance of each split individually.

Visualization

To generate the visualizations as in our paper:

  1. Save Predicted Results

    Run the following command to save all related results. You can specify the class to visualize with target_class. The current code supports visualizations for the 1-way 1-shot setting on the S3DIS dataset:

    python3 main_fs.py --config config/[CONFIG_FILE] test True weight [PATH_to_SAVED_MODEL] cvfold [CVFOLD] train_gpu [0] vis_save_path ./vis forvis 1 data_root [PATH_to_DATASET_processed_data]/scenes/data target_class table
    
  2. Render Saved Results

    Use Open3D tools (tested on Open3D==0.16.0) to render the saved results:

    python3 util/visualize.py --targetclass table --vis_path ./vis
    

Since we store labels in the normals attribute as a workaround for accessing the labels in the PointCloud object, you should press Ctrl+L in the rendering window to disable normals for correct color rendering. Our code allows you to crop the scene, adjust the view, resize, and more in the interactive window. Press Ctrl+P to save the final image when you find a satisfactory perspective. For more details, see the Open3D documentation.

Citation

If you find this project useful, please consider giving it a star :star: and citing our paper 📚:

@inproceedings{an2024rethinking,
  title={Rethinking Few-shot 3D Point Cloud Semantic Segmentation},
  author={An, Zhaochong and Sun, Guolei and Liu, Yun and Liu, Fayao and Wu, Zongwei and Wang, Dan and Van Gool, Luc and Belongie, Serge},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3996--4006},
  year={2024}
}

For any questions or issues, feel free to reach out!

Zhaochong An: anzhaochong@outlook.com

Communication Group (WeChat):

<div style="text-align: left; "> <img src="https://files.mdnice.com/user/67517/0e3b61e3-08aa-4c42-9cfc-21052641cbfd.png" width="200" style="display: inline-block; vertical-align: top;"/> <img src="https://files.mdnice.com/user/67517/068822c4-cece-4ac5-b1db-5c138a91a718.png" width="200" style="display: inline-block; vertical-align: top;"/> </div>