Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation (CVPR 2022)

Our model achieves state-of-the-art performance on three challenges: it ranks 1st in the Waymo 3D Semantic Segmentation Challenge (the "Cylinder3D" and "Offboard_SemSeg" entries, May 2022), 1st in the SemanticKITTI LiDAR Semantic Segmentation Challenge (single-scan, the "Point-Voxel-KD" entry, Jun 2022), and 2nd in the SemanticKITTI LiDAR Semantic Segmentation Challenge (multi-scan, the "PVKD" entry, Dec 2021). Do not hesitate to use our trained models!

News

2022-11 [NEW] Some useful training tips have been provided.
2022-11 The distillation code and some training tips will be released after the CVPR deadline.
2022-7 We provide a trained model of CENet, a range-image-based LiDAR segmentation method. The reproduced performance is much higher than the reported value!
2022-6 Our method ranks 1st in the SemanticKITTI LiDAR Semantic Segmentation Challenge (single-scan, the "Point-Voxel-KD" entry).
2022-5 Our method ranks 1st in the Waymo 3D Semantic Segmentation Challenge (the "Cylinder3D" and "Offboard_SemSeg" entries).

Installation

Requirements

PyTorch >= 1.2
yaml
tqdm
numba
Cython
torch-scatter
nuScenes-devkit (optional for nuScenes)
spconv (tested with spconv==1.2.1 and cuda==10.2)

Data Preparation

SemanticKITTI

./
├── 
├── ...
└── path_to_data_shown_in_config/
    ├── sequences/
        ├── 00/
        │   ├── velodyne/
        │   │   ├── 000000.bin
        │   │   ├── 000001.bin
        │   │   └── ...
        │   └── labels/
        │       ├── 000000.label
        │       ├── 000001.label
        │       └── ...
        ├── 08/ # for validation
        ├── 11/ # 11-21 for testing
        └── 21/
            └── ...
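
To sanity-check the layout above, it can help to load one scan and its labels. The sketch below assumes the standard SemanticKITTI binary format (float32 x, y, z, intensity per point; one uint32 per label with the semantic class in the lower 16 bits and the instance id in the upper 16 bits). The function names are illustrative and not part of this repo.

```python
import numpy as np

def load_scan(bin_path):
    """Read a velodyne .bin file as an (N, 4) float32 array: x, y, z, intensity."""
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

def load_labels(label_path):
    """Read a .label file and split it into semantic and instance ids."""
    raw = np.fromfile(label_path, dtype=np.uint32)
    semantic = raw & 0xFFFF  # lower 16 bits: semantic class id
    instance = raw >> 16     # upper 16 bits: instance id
    return semantic, instance
```

A scan and its label file should contain the same number of points, so comparing `load_scan(...).shape[0]` with `len(semantic)` is a quick consistency check.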

nuScenes

./
├── 
├── ...
└── path_to_data_shown_in_config/
    ├── v1.0-trainval
    ├── v1.0-test
    ├── samples
    ├── sweeps
    ├── maps

Waymo

./
├── 
├── ...
└── path_to_data_shown_in_config/
    ├── first_return
    ├── second_return

Test

We take evaluation on the SemanticKITTI test set (single-scan) as an example.

Download the pre-trained models and put them in ./model_load_dir.
Generate predictions on the SemanticKITTI test set.

CUDA_VISIBLE_DEVICES=0 python -u test_cyl_sem_tta.py

We perform test-time augmentation to boost the performance. The model predictions will be saved in ./out_cyl/test by default.

Convert the label numbers back to the original dataset format before submitting:

python remap_semantic_labels.py -p out_cyl/test -s test --inverse
cd out_cyl/test
zip -r out_cyl.zip sequences/

Upload out_cyl.zip to the SemanticKITTI online server.
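
For intuition, the inverse remapping step amounts to a table lookup from training ids back to the dataset's original label ids. The following is a minimal sketch of that idea, not the code of remap_semantic_labels.py; the `INVERSE_MAP` values are a hypothetical subset chosen for illustration, and the real table comes from the dataset's label config.

```python
import numpy as np

# Hypothetical inverse map (training id -> original label id), for illustration only.
INVERSE_MAP = {0: 0, 1: 10, 2: 11, 3: 15}

def remap(labels, mapping):
    """Apply a dict-based label mapping through a dense lookup table."""
    lut = np.zeros(max(mapping) + 1, dtype=np.uint32)
    for src, dst in mapping.items():
        lut[src] = dst
    return lut[labels]  # vectorized lookup, one output id per point

pred = np.array([1, 3, 0, 2], dtype=np.uint32)
restored = remap(pred, INVERSE_MAP)
```

A lookup table keeps the remapping vectorized, which matters when every scan holds on the order of a hundred thousand points.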

Train

CUDA_VISIBLE_DEVICES=0 python -u train_cyl_sem.py

Before training, set the imageset of val_data_loader to val and the return_test of dataset_params to False in semantickitti.yaml. Currently, we only support vanilla training.
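
A small pre-flight check can catch a config left in test mode. This sketch assumes the YAML file has already been parsed into a nested dict and looks only at the two keys named above; `check_train_config` is a hypothetical helper, not part of this repo.

```python
def check_train_config(cfg):
    """Return a list of problems with the two training-related settings."""
    problems = []
    if cfg["val_data_loader"]["imageset"] != "val":
        problems.append("val_data_loader.imageset should be 'val'")
    if cfg["dataset_params"]["return_test"] is not False:
        problems.append("dataset_params.return_test should be False")
    return problems

# Dict mirroring only the two keys mentioned above; a real config has more.
cfg = {
    "val_data_loader": {"imageset": "test"},
    "dataset_params": {"return_test": True},
}
issues_before = check_train_config(cfg)

cfg["val_data_loader"]["imageset"] = "val"
cfg["dataset_params"]["return_test"] = False
issues_after = check_train_config(cfg)
```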

Useful Training Tips

Finetuning.

You can finetune the model using both train and val sets as well as a smaller learning rate (1/3 or 1/4 of the original learning rate).

Model ensemble.

You can use models of different epochs as an ensemble. Different models can also be taken as an ensemble, e.g., SPVCNN and Cylinder3D.
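
One common way to build such an ensemble is to average the per-point class probabilities of the member models and take the argmax. The sketch below assumes each model yields an (N, C) array of logits for the same N points in the same order; the function names and the optional weights are illustrative.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(logits_list, weights=None):
    """Average per-point class probabilities from several models and return labels."""
    probs = np.stack([softmax(l) for l in logits_list])  # (M, N, C)
    if weights is None:
        weights = np.ones(len(logits_list))
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()                    # normalize to sum to 1
    mean_probs = np.tensordot(weights, probs, axes=1)    # weighted mean, (N, C)
    return mean_probs.argmax(axis=-1)
```

Averaging probabilities rather than hard labels lets a confident model outvote an uncertain one; the weights can reflect each model's validation mIoU.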

Semi-supervised learning.

You can follow GuidedContrast to use pseudo labels of the test set to complement the original training set. (DO NOT use it in the supervised training. It can only be used in the semi-supervised setting to prove the value of the proposed semi-supervised algorithm.)

More data augmentations.

You can use LaserMix, Instance Augmentation and PolarMix to increase the diversity of training samples.

Knowledge distillation (KD).

You can refer to CRD to apply KD to boost the performance of LiDAR segmentation models. We will release a more efficient and effective version of the PVKD algorithm soon.
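
As a simple baseline to compare against, the sketch below implements plain logit distillation with a temperature (the classic soft-label KD loss). It is neither CRD nor the PVKD algorithm; it assumes per-point teacher and student logits of shape (N, C), and the function names are illustrative.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    log_p_s = log_softmax(student_logits / temperature)
    log_p_t = log_softmax(teacher_logits / temperature)
    p_t = np.exp(log_p_t)
    kl = (p_t * (log_p_t - log_p_s)).sum(axis=-1)  # per-point KL divergence
    return float(kl.mean() * temperature ** 2)
```

In training, this term would typically be added to the usual segmentation loss with a weighting factor.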

Using more inputs.

In addition to (x, y, z), you can also use the intensity, range, azimuth, inclination and elongation as additional inputs. Remember to normalize these input signals if necessary. The tanh function is a good normalizer in some cases.
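
Range, azimuth and inclination can be derived from the coordinates themselves. The sketch below assumes the points are expressed in the sensor frame and that the intensity is non-negative; elongation is a sensor-provided channel and is not computed here. `build_point_features` is an illustrative name.

```python
import numpy as np

def build_point_features(xyz, intensity):
    """Stack (x, y, z) with range, azimuth, inclination and tanh-squashed intensity."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    rng = np.linalg.norm(xyz, axis=1)             # distance to the sensor origin
    azimuth = np.arctan2(y, x)                    # angle in the x-y plane, in [-pi, pi]
    inclination = np.arctan2(z, np.hypot(x, y))   # elevation angle, in [-pi/2, pi/2]
    intensity_n = np.tanh(intensity)              # squashes non-negative values into [0, 1]
    return np.column_stack([xyz, rng, azimuth, inclination, intensity_n])
```

Whether the range and angles need further scaling depends on the dataset and the network; treat the choice above as a starting point.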

Increasing the model size.

You can either increase the width (more channels) or the depth (more layers) of the model to boost the performance.

Test time augmentation (TTA).

You can use more augmentations (flipping, rotation, scaling, translation) in TTA to boost the performance. A proper combination of them is vital to the final performance.
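
Because these augmentations keep the point order, per-point predictions from the augmented copies can be averaged directly. The sketch below is not the logic of test_cyl_sem_tta.py; it assumes a hypothetical `predict_proba` callable that maps (N, 3) coordinates to (N, C) class probabilities.

```python
import numpy as np

def rotate_z(xyz, angle):
    """Rotate points around the z axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return xyz @ rot.T

def flip_x(xyz):
    """Negate the x coordinate."""
    return xyz * np.array([-1.0, 1.0, 1.0])

def flip_y(xyz):
    """Negate the y coordinate."""
    return xyz * np.array([1.0, -1.0, 1.0])

def tta_predict(predict_proba, xyz, transforms):
    """Average per-point class probabilities over augmented copies of one scan."""
    probs = [predict_proba(t(xyz)) for t in transforms]
    return np.mean(probs, axis=0).argmax(axis=1)

# Example transform set: identity, two flips and a 90-degree rotation.
transforms = [lambda p: p, flip_x, flip_y, lambda p: rotate_z(p, np.pi / 2)]
```

Which transforms to include is best decided on the validation set, since not every combination helps.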

Performance

Abbreviations:

cyl: Cylinder3D, sem: SemanticKITTI, nusc: nuScenes, ms: multi-scan task, tta: test-time augmentation

1.5x: channel expansion ratio, 72_4: performance (72.4% mIoU), 64x512: resolution of the range image
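
For reference, mIoU is the per-class intersection-over-union averaged over classes. The sketch below computes it from a confusion matrix; it is a generic illustration rather than the benchmark's official evaluation code, and the `ignore_label` handling is an assumption about how unlabeled points are excluded.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_label=None):
    """Mean IoU over the classes that occur in the predictions or the ground truth."""
    valid = np.ones(target.shape, dtype=bool)
    if ignore_label is not None:
        valid = target != ignore_label          # drop points without a usable label
    idx = target[valid] * num_classes + pred[valid]
    conf = np.bincount(idx, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)  # rows: target, columns: prediction
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0
    if ignore_label is not None:
        present[ignore_label] = False           # never average over the ignored class
    return float((tp[present] / union[present]).mean())
```

If no class is present at all, the result is NaN, so callers may want to guard against empty inputs.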

SemanticKITTI test set (single-scan):

Model	Reported	Reproduced	Gain	Weight
SPVNAS	66.4%	71.4%	5.0%	--
Cylinder3D_1.5x	--	72.4%	--	cyl_sem_1.5x_72_4.pt
Cylinder3D	68.9%	71.8%	2.9%	cyl_sem_1.0x_71_8.pt
Cylinder3D_0.5x	71.2%	71.4%	0.2%	cyl_sem_0.5x_71_4.pt
CENet_1.0x	64.7%	67.6%	2.9%	CENet_64x512_67_6

SemanticKITTI test set (multi-scan):

Model	Reported	Reproduced	Gain	Weight
Cylinder3D	52.5%	--	--	--
Cylinder3D_0.5x	58.2%	58.4%	0.2%	cyl_sem_ms_0.5x_58_4.pt

Waymo test set:

Model	Reported	Reproduced	Gain	Weight
Cylinder3D	71.18%	71.18%	--	--
Cylinder3D_0.5x	--	--	--	--

nuScenes val set:

Model	Reported	Reproduced	Gain	Weight
Cylinder3D	76.1%	--	--	--
Cylinder3D_0.5x	76.0%	76.15%	0.15%	cyl_nusc_0.5x_76_15.pt

Citation

If you use the code, please consider citing the following publications:

@inproceedings{pvkd,
    title     = {Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation},
    author    = {Hou, Yuenan and Zhu, Xinge and Ma, Yuexin and Loy, Chen Change and Li, Yikang},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
    pages     = {8479--8488},
    year      = {2022},
}

@inproceedings{cylinder3d,
    title={Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation},
    author={Zhu, Xinge and Zhou, Hui and Wang, Tai and Hong, Fangzhou and Ma, Yuexin and Li, Wei and Li, Hongsheng and Lin, Dahua},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
    pages={9939--9948},
    year={2021}
}

@article{cylinder3d-tpami,
    title={Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception},
    author={Zhu, Xinge and Zhou, Hui and Wang, Tai and Hong, Fangzhou and Li, Wei and Ma, Yuexin and Li, Hongsheng and Yang, Ruigang and Lin, Dahua},
    journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
    year={2021},
    publisher={IEEE}
}

Acknowledgements

This repo is built upon the awesome Cylinder3D.