# Focal Sparse Convolutional Networks for 3D Object Detection (CVPR 2022, Oral)
This is the official implementation of Focals Conv (CVPR 2022), a new sparse convolution design for 3D object detection that is applicable to both LiDAR-only and multi-modal settings. For more details, please refer to:
Focal Sparse Convolutional Networks for 3D Object Detection [Paper] <br /> Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia<br />
<p align="center"> <img src="docs/imgs/FocalSparseConv23D.png" width="100%"> </p> <p align="center"> <img src="docs/imgs/FocalSparseConv_Pipeline.png" width="100%"> </p>

## News
- [2023-01-05] The CUDA version of Focals Conv is released in spconv-plus, together with some other sparse operators. An example of using it can be found here.
- [2022-08-24] The code and example for test-time augmentations have been released here.
- [2022-07-05] The code for Focals Conv has been merged into the official codebase OpenPCDet.
- [2022-06-21] Another 3D backbone network design, LargeKernel3D, is presented [Paper | Github].
## Experimental results

### KITTI dataset
| | Car@R11 | Car@R40 | download |
|---|---|---|---|
| PV-RCNN + Focals Conv | 83.91 | 85.20 | Google / Baidu (key: m15b) |
| PV-RCNN + Focals Conv (multimodal) | 84.58 | 85.34 | Google / Baidu (key: ie6n) |
| Voxel R-CNN (Car) + Focals Conv (multimodal) | 85.68 | 86.00 | Google / Baidu (key: tnw9) |
### nuScenes dataset
| | mAP | NDS | download |
|---|---|---|---|
| CenterPoint + Focals Conv (multi-modal) | 63.86 | 69.41 | Google / Baidu (key: 01jh) |
| CenterPoint + Focals Conv (multi-modal) - 1/4 data | 62.15 | 67.45 | Google / Baidu (key: 6qsc) |
Visualization of voxel distribution of Focals Conv on KITTI val dataset:
<p align="center"> <img src="docs/imgs/Sparsity_comparison_3pairs.png" width="100%"> </p>

## Getting Started

### Installation
a. Clone this repository
```bash
git clone https://github.com/dvlab-research/FocalsConv && cd FocalsConv
```
b. Install the environment
Follow the installation documents of the OpenPCDet and CenterPoint codebases respectively, depending on which one you use.
*Note that spconv 2.x is highly recommended instead of spconv 1.x.
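As a quick sanity check of the environment (a sketch assuming spconv 2.x was installed from the official pip wheels, e.g. spconv-cu116), you can verify which spconv version is importable:

```bash
# Print the installed spconv version; it should start with "2."
python -c "import spconv; print(spconv.__version__)"
# spconv 2.x exposes its PyTorch modules under spconv.pytorch
python -c "import spconv.pytorch as spconv; print('spconv 2.x PyTorch API available')"
```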
c. Prepare the datasets.
Download and organize the official KITTI and Waymo datasets following the documentation in OpenPCDet, and the nuScenes dataset following the CenterPoint codebase; a sketch of the corresponding info-generation commands is given after the notes below.
*Note that for nuScenes dataset, we use image-level gt-sampling (copy-paste) in the multi-modal training. Please download this dbinfos_train_10sweeps_withvelo.pkl to replace the original one. (Google | Baidu (key: b466))
*Note that for the nuScenes dataset, we conduct ablation studies on a 1/4 training data split. Please download infos_train_mini_1_4_10sweeps_withvelo_filter_True.pkl if you need it for training. (Google | Baidu (key: 769e))
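For reference, the data-info generation commands in those codebases look roughly like the following (a sketch; the data root paths are assumptions, so please follow the official getting-started documents of each codebase for the authoritative versions):

```bash
# OpenPCDet: generate KITTI data infos (run from the OpenPCDet root)
python -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml

# CenterPoint: generate nuScenes infos with 10 sweeps (run from the CenterPoint root)
python tools/create_data.py nuscenes_data_prep --root_path=data/nuScenes --version="v1.0-trainval" --nsweeps=10
```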
d. Download pre-trained models.
If you want to directly evaluate the trained models we provide, please download them first.
If you want to train by yourself, for multi-modal settings, please first download the ResNet pre-trained model, torchvision-res50-deeplabv3.
### Evaluation
We provide the trained weight files so you can run the evaluation directly with them. You can also use models you trained yourself.
For models in OpenPCDet,
```bash
NUM_GPUS=8
cd tools
bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/voxel_rcnn_car_focal_multimodal.yaml --ckpt path/to/voxelrcnn_focal_multimodal.pth
bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/pv_rcnn_focal_multimodal.yaml --ckpt path/to/pvrcnn_focal_multimodal.pth
bash scripts/dist_test.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/pv_rcnn_focal_lidar.yaml --ckpt path/to/pvrcnn_focal_lidar.pth
```
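If only a single GPU is available, OpenPCDet's plain test script can be used instead (a sketch assuming the standard tools/test.py entry point; adjust the batch size to your hardware):

```bash
# Single-GPU evaluation with the standard OpenPCDet test script (run from tools/)
python test.py --cfg_file cfgs/kitti_models/pv_rcnn_focal_lidar.yaml --batch_size 4 --ckpt path/to/pvrcnn_focal_lidar.pth
```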
For models in CenterPoint,
```bash
CONFIG="nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal_multimodal"
python -m torch.distributed.launch --nproc_per_node=${NUM_GPUS} ./tools/dist_test.py configs/nusc/voxelnet/$CONFIG.py --work_dir ./work_dirs/$CONFIG --checkpoint centerpoint_focal_multimodal.pth
```
### Training
For configs in OpenPCDet,
```bash
bash scripts/dist_train.sh ${NUM_GPUS} --cfg_file cfgs/kitti_models/CONFIG.yaml
```
For configs in CenterPoint,
```bash
python -m torch.distributed.launch --nproc_per_node=${NUM_GPUS} ./tools/train.py configs/nusc/voxelnet/$CONFIG.py --work_dir ./work_dirs/$CONFIG
```
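For example, to train the multi-modal models evaluated above (config names taken from the evaluation commands; the GPU counts match the note below and should be adjusted to your setup):

```bash
# OpenPCDet: multi-modal PV-RCNN + Focals Conv on KITTI, 8 GPUs (run from tools/)
bash scripts/dist_train.sh 8 --cfg_file cfgs/kitti_models/pv_rcnn_focal_multimodal.yaml

# CenterPoint: multi-modal CenterPoint + Focals Conv on nuScenes, 4 GPUs
CONFIG="nusc_centerpoint_voxelnet_0075voxel_fix_bn_z_focal_multimodal"
python -m torch.distributed.launch --nproc_per_node=4 ./tools/train.py configs/nusc/voxelnet/$CONFIG.py --work_dir ./work_dirs/$CONFIG
```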
- Note that we use 8 GPUs to train OpenPCDet models and 4 GPUs to train CenterPoint models.
- Note that for the model size counting of the multi-modal models, please refer to this issue.
## Citation
If you find this project useful in your research, please consider citing:
```
@inproceedings{focalsconv-chen,
  title={Focal Sparse Convolutional Networks for 3D Object Detection},
  author={Chen, Yukang and Li, Yanwei and Zhang, Xiangyu and Sun, Jian and Jia, Jiaya},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2022}
}
```
## Acknowledgement
- This work is built upon OpenPCDet and CenterPoint. Please refer to the official GitHub repositories, OpenPCDet and CenterPoint, for more information.
- This README follows the style of IA-SSD.
## License
This project is released under the Apache 2.0 license.