# Virtual Sparse Convolution for Multimodal 3D Object Detection
This is the official code release of VirConv (Virtual Sparse Convolution for 3D Object Detection). This code is mainly based on OpenPCDet, with some code from TED, CasA, PENet and SFD.
## Detection Framework
- VirConv-L: a lightweight multimodal 3D detector based on Virtual Sparse Convolution.
- VirConv-T: an improved multimodal 3D detector based on Virtual Sparse Convolution and a transformed refinement scheme.
- VirConv-S: a semi-supervised VirConv-T based on pseudo labels and fine-tuning.
The detection frameworks are shown below.
## Model Zoo
We release three models: VirConv-L, VirConv-T and VirConv-S.
- VirConv-L and VirConv-T are trained on the train split (3,712 samples) of the KITTI dataset.
- VirConv-S is trained on the train split (3,712 samples) plus the unlabeled odometry split (semi split, 10,888 samples) of the KITTI dataset.
- The results are the 3D AP (R40) of the Car class on the KITTI val set; the R40 metric is sketched below.
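For readers unfamiliar with the metric: AP (R40) averages the interpolated precision over 40 equally spaced recall positions (0.025, 0.050, ..., 1.0) instead of the older 11-point scheme. A minimal sketch, assuming you already have a precision/recall curve (the official KITTI evaluation computes the curve from raw detections itself):

```python
import numpy as np

def ap_r40(recall, precision):
    """KITTI-style AP over 40 recall positions.

    recall, precision: 1-D NumPy arrays describing a PR curve (recall ascending).
    Sketch only: the official KITTI devkit builds the curve from detections.
    """
    ap = 0.0
    for r in np.linspace(0.025, 1.0, 40):  # recall positions 1/40 ... 40/40
        mask = recall >= r
        # interpolated precision: best precision at any recall >= r
        ap += (precision[mask].max() if mask.any() else 0.0) / 40.0
    return ap
```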
Important notes:
- The input voxel discard has been changed to input point discard for faster voxelization (a minimal sketch of the idea follows these notes).
- The convergence of VirConv-T is somewhat unstable (AP roughly in [89.5, 90.3]). If you cannot reach a similar AP, please try training multiple times. We recommend VirConv-S, which reaches 90.5+ AP easily.
- These models are not suitable for directly reporting results on the KITTI test set. Please train the models on all (or 80% of) the training data and choose a good score threshold to achieve the desired performance.
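The point discard in the first note drops a share of the input (real and virtual) points before voxelization, so discarded points never pay the voxelization cost. A minimal sketch of a uniform random variant, assuming NumPy point arrays; the repository's actual keep ratio and sampling scheme (the paper describes a bin-based stochastic discard) may differ:

```python
import numpy as np

def random_point_discard(points, keep_ratio=0.5, rng=None):
    """Randomly keep a fraction of input points before voxelization.

    points: (N, C) array, e.g. x, y, z, intensity (+ RGB for virtual points).
    keep_ratio is illustrative; see the model configs for the real setting.
    """
    rng = rng or np.random.default_rng()
    n_keep = int(len(points) * keep_ratio)
    keep_idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[keep_idx]
```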
Trained multiple times on 8x V100 GPUs; the best run is reported:

| Environment | Detector | GPU memory (train) | Easy | Mod. | Hard | Download |
|---|---|---|---|---|---|---|
| Spconv1.2 | VirConv-L | ~7 GB | 93.08 | 88.51 | 86.69 | google / baidu (05u2) / 51M |
| Spconv1.2 | VirConv-T | ~13 GB | 94.58 | 89.87 | 87.78 | google / baidu (or81) / 55M |
| Spconv1.2 | VirConv-S | ~13 GB | 95.67 | 91.09 | 89.09 | google / baidu (ak74) / 62M |
Trained multiple times on 8x V100 GPUs; the best run is reported:

| Environment | Detector | GPU memory (train) | Easy | Mod. | Hard | Download |
|---|---|---|---|---|---|---|
| Spconv2.1 | VirConv-L | ~7 GB | 93.18 | 88.23 | 85.48 | google / baidu (k2dp) / 51M |
| Spconv2.1 | VirConv-T | ~13 GB | 94.91 | 90.36 | 88.10 | google / baidu (a4r4) / 56M |
| Spconv2.1 | VirConv-S | ~13 GB | 95.76 | 90.91 | 88.61 | google / baidu (j3mi) / 56M |
## Getting Started
```
conda create -n spconv2 python=3.9
conda activate spconv2
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install numpy==1.19.5 protobuf==3.19.4 scikit-image==0.19.2 waymo-open-dataset-tf-2-5-0 nuscenes-devkit==1.0.5 spconv-cu111 numba scipy pyyaml easydict fire tqdm shapely matplotlib opencv-python addict pyquaternion awscli open3d pandas future pybind11 tensorboardX tensorboard Cython prefetch-generator
```
## Dependency
Our released implementation was tested with:
- Ubuntu 18.04
- Python 3.6.9
- PyTorch 1.8.1
- Numba 0.53.1
- Spconv 1.2.1
- NVIDIA CUDA 11.1
- 8x Tesla V100 GPUs
We also tested it with:
- Ubuntu 18.04
- Python 3.9.13
- PyTorch 1.8.1
- Numba 0.53.1
- Spconv 2.1.22 # pip install spconv-cu111
- NVIDIA CUDA 11.1
- 8x Tesla V100 GPUs
We also tested it with:
- Ubuntu 18.04
- Python 3.9.13
- PyTorch 1.8.1
- Numba 0.53.1
- Spconv 2.1.22 # pip install spconv-cu111
- NVIDIA CUDA 11.1
- 2x 3090 GPUs
## Prepare dataset
You must create the additional `semi` dataset and `velodyne_depth` data to run our multimodal and semi-supervised detectors.
- You can download all the preprocessed data from baidu (japc) [74 GB], or partial data (not including `semi`, due to disk space limits) from google (13 GB).
- Or you can generate the dataset yourself as follows:
Please download the official KITTI 3D object detection dataset and the KITTI odometry dataset, and organize the downloaded files as follows (the road planes, which are optional for data augmentation during training, can be downloaded from [road plane]):
```
VirConv
├── data
│   ├── odometry
│   │   │── 00
│   │   │── 01
│   │   │   │── image_2
│   │   │   │── velodyne
│   │   │   │── calib.txt
│   │   │── ...
│   │   │── 21
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├── calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   │── testing
│   │   │   ├── calib & velodyne & image_2
├── pcdet
├── tools
```
(1) Create the `semi` dataset from the odometry dataset:
```
cd tools
python3 creat_semi_dataset.py ../data/odometry ../data/kitti/semi
```
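For a rough idea of what this step produces: it reorganizes odometry frames into KITTI-object-style `velodyne`, `image_2` and per-frame `calib` files under `semi`. A hypothetical sketch of that reorganization (the sampling stride, naming and calibration handling in the real `creat_semi_dataset.py` may differ):

```python
import shutil
from pathlib import Path

def build_semi_split(odometry_root, semi_root, stride=2):
    """Copy odometry frames into a KITTI-object-style folder layout.

    Hypothetical sketch: the real creat_semi_dataset.py may sample frames
    differently and converts the per-sequence calib.txt into proper
    per-frame KITTI-object calibration files.
    """
    odometry_root, semi_root = Path(odometry_root), Path(semi_root)
    for sub in ("velodyne", "image_2", "calib"):
        (semi_root / sub).mkdir(parents=True, exist_ok=True)
    frame_id = 0
    for seq in sorted(p for p in odometry_root.iterdir() if p.is_dir()):
        calib = (seq / "calib.txt").read_text()  # one calib per sequence
        for lidar in sorted((seq / "velodyne").glob("*.bin"))[::stride]:
            name = f"{frame_id:06d}"
            shutil.copy(lidar, semi_root / "velodyne" / f"{name}.bin")
            shutil.copy(seq / "image_2" / f"{lidar.stem}.png",
                        semi_root / "image_2" / f"{name}.png")
            (semi_root / "calib" / f"{name}.txt").write_text(calib)
            frame_id += 1
```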
(2) Download the pseudo labels generated by VirConv-T from here (detections from the last 10 checkpoints are fused by WBF, and low-quality detections are filtered out with a 0.9 score threshold) and put them into `kitti/semi`.
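For reference, such score filtering is a one-pass text filter: detector-produced KITTI-format label lines carry a trailing confidence column, and lines below the threshold are dropped. A minimal sketch assuming that format (the function name and paths are illustrative; the released pseudo labels are already filtered, and the WBF fusion itself is not shown):

```python
from pathlib import Path

def filter_pseudo_labels(label_dir, out_dir, score_thresh=0.9):
    """Drop pseudo-label boxes whose trailing confidence score is low.

    Illustrative sketch: assumes KITTI-style label .txt files whose last
    column is the detection score.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for label_file in Path(label_dir).glob("*.txt"):
        kept = [line for line in label_file.read_text().splitlines()
                if line.strip() and float(line.split()[-1]) >= score_thresh]
        (out_dir / label_file.name).write_text("\n".join(kept) + "\n")
```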
(3) Download the PENet depth completion model from google (500M) or baidu (gp68), and put it into `tools/PENet`.
(4) Then run the following commands to generate RGB virtual points:
```
cd tools/PENet
python3 main.py --detpath ../../data/kitti/training
python3 main.py --detpath ../../data/kitti/testing
python3 main.py --detpath ../../data/kitti/semi
```
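For intuition, each RGB virtual point is a completed depth pixel lifted into 3D through the camera intrinsics and paired with its image color; the results are stored under `velodyne_depth`. A minimal pinhole back-projection sketch in the camera frame, with hypothetical intrinsics `fx, fy, cx, cy` (the real pipeline additionally maps the points into the LiDAR frame using the KITTI calibration files):

```python
import numpy as np

def depth_to_virtual_points(depth, rgb, fx, fy, cx, cy):
    """Back-project a dense depth map into colored 3D points (camera frame).

    depth: (H, W) completed depth in meters (e.g. from PENet);
    rgb: (H, W, 3) aligned image. Sketch only: the real pipeline also
    transforms the points into the LiDAR coordinate frame.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    valid = z > 0  # keep only pixels with a valid depth
    xyz = np.stack([x[valid], y[valid], z[valid]], axis=-1)
    colors = rgb[valid].astype(np.float32) / 255.0
    return np.concatenate([xyz, colors], axis=1)  # (N, 6): x, y, z, r, g, b
```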
(5) After that, run the following commands to create the dataset infos:
```
python3 -m pcdet.datasets.kitti.kitti_dataset_mm create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
python3 -m pcdet.datasets.kitti.kitti_datasetsemi create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
```
The final data structure should be:
```
VirConv
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├── calib & velodyne & label_2 & image_2 & (optional: planes) & velodyne_depth
│   │   │── testing
│   │   │   ├── calib & velodyne & image_2 & velodyne_depth
│   │   │── semi (optional)
│   │   │   ├── calib & velodyne & label_2 (pseudo labels) & image_2 & velodyne_depth
│   │   │── gt_database_mm
│   │   │── gt_databasesemi
│   │   │── kitti_dbinfos_trainsemi.pkl
│   │   │── kitti_dbinfos_train_mm.pkl
│   │   │── kitti_infos_test.pkl
│   │   │── kitti_infos_train.pkl
│   │   │── kitti_infos_trainsemi.pkl
│   │   │── kitti_infos_trainval.pkl
│   │   │── kitti_infos_val.pkl
├── pcdet
├── tools
```
## Setup
```
cd VirConv
python setup.py develop
```
## Training
To train VirConv-L or VirConv-T:
Single-GPU training:
```
cd tools
python3 train.py --cfg_file ${CONFIG_FILE}
```
For example, to train the VirConv-L model:
```
cd tools
python3 train.py --cfg_file cfgs/models/kitti/VirConv-L.yaml
```
Multi-GPU training: modify the GPU number in `dist_train.sh` and run:
```
cd tools
sh dist_train.sh
```
The training logs are saved to `log.txt`. You can run `cat log.txt` to view the training progress.
To train VirConv-S, you should first train a VirConv-T:
```
cd tools
python3 train.py --cfg_file cfgs/models/kitti/VirConv-T.yaml
```
Then train VirConv-S:
```
cd tools
python3 train.py --cfg_file cfgs/models/kitti/VirConv-S.yaml --pretrained_model ../output/models/kitti/VirConv-T/default/ckpt/checkpoint_epoch_40.pth
```
## Evaluation
```
cd tools
python3 test.py --cfg_file ${CONFIG_FILE} --batch_size ${BATCH_SIZE} --ckpt ${CKPT}
```
For example, to test the VirConv-S model:
```
cd tools
python3 test.py --cfg_file cfgs/models/kitti/VirConv-S.yaml --ckpt VirConv-S.pth
```
Multi-GPU testing: modify the GPU number in `dist_test.sh` and run:
```
sh dist_test.sh
```
The test logs are saved to `log-test.txt`. You can run `cat log-test.txt` to view the test results.
## License
This code is released under the Apache 2.0 license.
## Acknowledgement
This project is built on OpenPCDet; parts of the code are from TED, CasA, PENet and SFD.
## Citation
```
@inproceedings{VirConv,
  title={Virtual Sparse Convolution for Multimodal 3D Object Detection},
  author={Wu, Hai and Wen, Chenglu and Shi, Shaoshuai and Wang, Cheng},
  booktitle={CVPR},
  year={2023}
}
```