3DTrans: An Open-source Codebase for Continuous Learning towards Autonomous Driving Tasks
3DTrans includes Transfer Learning Techniques and Scalable Pre-training Techniques for tackling the continuous learning problem in autonomous driving, as follows.
- We implement Transfer Learning Techniques consisting of four functions:
  - Unsupervised Domain Adaptation (UDA) for 3D Point Clouds
  - Active Domain Adaptation (ADA) for 3D Point Clouds
  - Semi-Supervised Domain Adaptation (SSDA) for 3D Point Clouds
  - Multi-domain Dataset Fusion (MDF) for 3D Point Clouds
- We implement Scalable Pre-training, which continuously improves downstream performance as more pre-training data are fed into the pre-training network:
  - AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset
  - SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving
Overview
- News
- Installation for 3DTrans
- Getting Started
- Transfer Learning Techniques@3DTrans
- Scalable Pre-training Techniques@3DTrans
- Visualization Tools for 3DTrans
- 3DTrans Framework Introduction
- Acknowledgements
- Citation
News :fire:
- We have released all code of AD-PT here, including: 1) pre-training and fine-tuning methods, 2) labeled and pseudo-labeled data, and 3) pre-trained checkpoints for fine-tuning. Please see AD-PT for more technical details (updated on Sep. 2023).
- SPOT shows that occupancy prediction is a promising pre-training method for general and scalable 3D representation learning; see Figure 1 of the SPOT paper for the encouraging experimental results (updated on Sep. 2023).
- We have released the Reconstruction-Simulation Dataset obtained using the ReSimAD method (updated on Sep. 2023).
- We have released the AD-PT pre-trained checkpoints; see AD-PT pre-trained checkpoints for download (updated on Aug. 2023).
- Based on 3DTrans, we achieved significant performance gains on a series of downstream perception benchmarks, including Waymo, nuScenes, and KITTI, under different baseline models such as PV-RCNN++, SECOND, CenterPoint, and PV-RCNN (updated on Jun. 2023).
- Our 3DTrans supports Semi-Supervised Domain Adaptation (SSDA) for 3D Object Detection (updated on Nov. 2022).
- Our 3DTrans supports Active Domain Adaptation (ADA) for 3D Object Detection, achieving a good trade-off between high performance and annotation cost (updated on Oct. 2022).
- Our 3DTrans supports several typical transfer learning techniques (such as TQS, CLUE, SN, ST3D, Pseudo-labeling, SESS, and Mean-Teacher) for autonomous-driving-related model adaptation and transfer.
- Our 3DTrans supports Multi-domain Dataset Fusion (MDF) for 3D Object Detection, enabling existing 3D models to effectively learn from multiple off-the-shelf 3D datasets (updated on Sep. 2022).
- Our 3DTrans supports Unsupervised Domain Adaptation (UDA) for 3D Object Detection, i.e., deploying a well-trained source model to an unlabeled target domain (updated on July 2022).
- We calculate the distribution of object sizes for each public AD dataset in object-size statistics.
We hope this repository will inspire research on the generalization of 3D models, as it pushes the limits of perceptual performance. :tokyo_tower:
<!-- ### :muscle: TODO List :muscle:
- [ ] For the ADA module, add a sequence-level data selection policy (to meet the requirements of the practical annotation process).
- [x] Provide experimental findings for AD-related 3D pre-training (**our ongoing research**, which currently achieves promising pre-training results on downstream tasks by exploiting large-scale unlabeled data from the ONCE dataset using `3DTrans`). -->
Installation for 3DTrans
You may refer to INSTALL.md for the installation of 3DTrans.
Getting Started
- Please refer to Readme for Datasets to prepare the datasets and convert the data into the 3DTrans format. Besides, 3DTrans supports reading and writing data from Ceph Petrel-OSS; please refer to Readme for Datasets for more details.
- Please refer to Readme for UDA for the problem definition of UDA and for performing the UDA adaptation process.
- Please refer to Readme for ADA for the problem definition of ADA and for performing the ADA adaptation process.
- Please refer to Readme for SSDA for the problem definition of SSDA and for performing the SSDA adaptation process.
- Please refer to Readme for MDF for the problem definition of MDF and for performing the MDF joint-training process.
- Please refer to Readme for ReSimAD for the ReSimAD implementation.
- Please refer to Readme for AD-PT Pre-training to start the journey of 3D perception pre-training using AD-PT.
- Please refer to Readme for PointContrast Pre-training for 3D perception pre-training using PointContrast.
Model Zoo
We cannot provide the Waymo-related pre-trained models due to the Waymo Dataset License Agreement, but you can achieve similar performance by training with the corresponding configs.
Domain Transfer Results
UDA Results
Here, we report the cross-dataset (Waymo-to-KITTI) adaptation results using the BEV/3D AP performance as the evaluation metric. Please refer to Readme for UDA for experimental results of more cross-domain settings.
- All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
- For Waymo dataset training, we train the model using 20% data.
- The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
- Pre-SN means that the SN (statistical normalization) operation is applied during the stage of pre-training the source-only model (see the SN sketch after the table below).
- Post-SN means that the SN (statistical normalization) operation is applied during the adaptation stage.
Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download
---|---|---|---|---
PointPillar | ~7.1 hours | Source-only with SN | 74.98 / 49.31 | -
PointPillar | ~0.6 hours | Pre-SN | 81.71 / 57.11 | model-57M
PV-RCNN | ~23 hours | Source-only with SN | 69.92 / 60.17 | -
PV-RCNN | ~23 hours | Source-only | 74.42 / 40.35 | -
PV-RCNN | ~3.5 hours | Pre-SN | 84.00 / 74.57 | model-156M
PV-RCNN | ~1 hour | Post-SN | 84.94 / 75.20 | model-156M
Voxel R-CNN | ~16 hours | Source-only with SN | 75.83 / 55.50 | -
Voxel R-CNN | ~16 hours | Source-only | 64.88 / 19.90 | -
Voxel R-CNN | ~2.5 hours | Pre-SN | 82.56 / 67.32 | model-201M
Voxel R-CNN | ~2.2 hours | Post-SN | 85.44 / 76.78 | model-201M
PV-RCNN++ | ~20 hours | Source-only with SN | 67.22 / 56.50 | -
PV-RCNN++ | ~20 hours | Source-only | 67.68 / 20.82 | -
PV-RCNN++ | ~2.2 hours | Post-SN | 86.86 / 79.86 | model-193M
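The Pre-SN / Post-SN entries above rely on SN (statistical normalization), which rescales ground-truth boxes, and the points inside them, toward the target domain's average object size. Below is a minimal NumPy sketch of the idea; it is an illustration only, not the 3DTrans implementation, and the box format and mean-size inputs are assumptions.

```python
import numpy as np

def statistical_normalization(gt_boxes, points, source_mean_size, target_mean_size):
    """Illustrative SN: shift each box's (dx, dy, dz) by the gap between the
    source and target mean object sizes, and stretch the points inside it.

    Assumed box format: (N, 7) [x, y, z, dx, dy, dz, heading], z at box center.
    """
    delta = np.asarray(target_mean_size, float) - np.asarray(source_mean_size, float)
    new_boxes, new_points = gt_boxes.copy(), points.copy()
    for i, (cx, cy, cz, dx, dy, dz, yaw) in enumerate(gt_boxes):
        center, size = np.array([cx, cy, cz]), np.array([dx, dy, dz])
        c, s = np.cos(yaw), np.sin(yaw)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # R(yaw)
        local = (points[:, :3] - center) @ rot           # world -> box frame
        inside = np.all(np.abs(local) <= size / 2.0, axis=1)
        scale = (size + delta) / size                    # per-axis stretch factor
        new_points[inside, :3] = (local[inside] * scale) @ rot.T + center
        new_boxes[i, 3:6] = size + delta
    return new_boxes, new_points
```

In practice, the per-class mean sizes come from dataset statistics such as the object-size statistics linked above.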
ADA Results
Here, we report the Waymo-to-KITTI adaptation results using the BEV/3D AP performance as the evaluation metric. Please refer to Readme for ADA for experimental results of more cross-domain settings.
- All LiDAR-based models are trained with 4 NVIDIA A100 GPUs and are available for download.
- For Waymo dataset training, we train the model using 20% data.
- The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download
---|---|---|---|---
PV-RCNN | ~23h@4 A100 | Source-only | 67.95 / 27.65 | -
PV-RCNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 87.12 / 78.03 | Model-58M
PV-RCNN | ~10h@2 A100 | Bi3D (5% annotation budget) | 89.53 / 81.32 | Model-58M
PV-RCNN | ~1.5h@2 A100 | TQS | 82.00 / 72.04 | Model-58M
PV-RCNN | ~1.5h@2 A100 | CLUE | 82.13 / 73.14 | Model-50M
PV-RCNN | ~10h@2 A100 | Bi3D+ST3D | 87.83 / 81.23 | Model-58M
Voxel R-CNN | ~16h@4 A100 | Source-only | 64.87 / 19.90 | -
Voxel R-CNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 88.09 / 79.14 | Model-72M
Voxel R-CNN | ~6h@2 A100 | Bi3D (5% annotation budget) | 90.18 / 81.34 | Model-72M
Voxel R-CNN | ~1.5h@2 A100 | TQS | 78.26 / 67.11 | Model-72M
Voxel R-CNN | ~1.5h@2 A100 | CLUE | 81.93 / 70.89 | Model-72M
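The annotation budgets above (e.g., 1% or 5% of target frames) are spent on the frames the active-learning policy judges most informative. The sketch below is a generic uncertainty-based frame selection under a fixed budget; it is not the Bi3D, TQS, or CLUE criterion, and the entropy-based score is an assumption for illustration only.

```python
import numpy as np

def select_frames_for_annotation(frame_scores, budget_ratio=0.01):
    """Pick the target-domain frames whose detections the model is least sure about.

    frame_scores: dict mapping frame_id -> list of detection confidences produced
                  by the source-pretrained detector on the unlabeled target domain.
    budget_ratio: fraction of target frames we can afford to annotate.
    """
    def frame_uncertainty(confs):
        if len(confs) == 0:
            return 1.0  # no detections at all: treat the frame as maximally uncertain
        p = np.clip(np.asarray(confs), 1e-6, 1 - 1e-6)
        entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))  # binary entropy per box
        return float(entropy.mean())

    ranked = sorted(frame_scores,
                    key=lambda fid: frame_uncertainty(frame_scores[fid]),
                    reverse=True)
    budget = max(1, int(len(ranked) * budget_ratio))
    return ranked[:budget]  # send these frames to the annotators

# Example: 1% budget over mocked per-frame detection confidences
rng = np.random.default_rng(0)
scores = {f"frame_{i:05d}": rng.uniform(0.1, 0.9, rng.integers(0, 20)).tolist()
          for i in range(1000)}
print(select_frames_for_annotation(scores, budget_ratio=0.01))
```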
SSDA Results
We report the target-domain results for Waymo-to-nuScenes adaptation using the BEV/3D AP performance as the evaluation metric, and for Waymo-to-ONCE adaptation using the ONCE evaluation metric. Please refer to Readme for SSDA for experimental results of more cross-domain settings.
- The domain adaptation time is measured with 4 NVIDIA A100 GPUs and PyTorch 1.8.1.
- For Waymo dataset training, we train the model using 20% data.
- second_5%_FT denotes that we fine-tune the Second model using 5% of the nuScenes training data.
- second_5%_SESS denotes that we adapt our baseline model with the SESS (Self-Ensembling Semi-Supervised) method (see the EMA sketch after the table below).
- second_5%_PS denotes that we fine-tune the source-only model on nuScenes using 5% labeled data and perform pseudo-labeling on the remaining 95% unlabeled nuScenes data (see the pseudo-labeling sketch after the Waymo-to-ONCE results).
Model | Training time | Adaptation | Car@R40 (BEV / 3D) | Download
---|---|---|---|---
Second | ~11 hours | source-only (Waymo) | 27.85 / 16.43 | -
Second | ~0.4 hours | second_5%_FT | 45.95 / 26.98 | model-61M
Second | ~1.8 hours | second_5%_SESS | 47.77 / 28.74 | model-61M
Second | ~1.7 hours | second_5%_PS | 47.72 / 29.37 | model-61M
PV-RCNN | ~24 hours | source-only (Waymo) | 40.31 / 23.32 | -
PV-RCNN | ~1.0 hours | pvrcnn_5%_FT | 49.58 / 34.86 | model-150M
PV-RCNN | ~5.5 hours | pvrcnn_5%_SESS | 49.92 / 35.28 | model-150M
PV-RCNN | ~5.4 hours | pvrcnn_5%_PS | 49.84 / 35.07 | model-150M
PV-RCNN++ | ~16 hours | source-only (Waymo) | 31.96 / 19.81 | -
PV-RCNN++ | ~1.2 hours | pvplus_5%_FT | 49.94 / 34.28 | model-185M
PV-RCNN++ | ~4.2 hours | pvplus_5%_SESS | 51.14 / 35.25 | model-185M
PV-RCNN++ | ~3.6 hours | pvplus_5%_PS | 50.84 / 35.39 | model-185M
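The *_SESS rows above use a teacher-student scheme in which the teacher is an exponential moving average (EMA) of the student. A minimal PyTorch sketch of that EMA update is given below; the full SESS consistency losses are omitted, and the toy networks are stand-ins for the actual 3D detectors.

```python
import torch

@torch.no_grad()
def update_ema_teacher(teacher, student, ema_decay=0.999):
    """Mean-Teacher-style update: teacher weights drift slowly toward the student.

    Called once per training iteration, after the student's optimizer step.
    Assumes teacher and student share the same architecture.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(ema_decay).add_(s_param, alpha=1.0 - ema_decay)
    for t_buf, s_buf in zip(teacher.buffers(), student.buffers()):
        t_buf.copy_(s_buf)  # e.g. BatchNorm running statistics

# Toy usage with stand-in networks (the real models are 3D detectors)
student = torch.nn.Linear(8, 2)
teacher = torch.nn.Linear(8, 2)
teacher.load_state_dict(student.state_dict())  # start from the same weights
for step in range(10):
    # ... student forward / backward / optimizer.step() would go here ...
    update_ema_teacher(teacher, student, ema_decay=0.999)
```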
- For Waymo-to-ONCE adaptation, we employ 8 NVIDIA A100 GPUs for model training.
- PS denotes that we pseudo-label the unlabeled ONCE and re-train the model on pseudo-labeled data.
- SESS denotes that we utilize the SESS method to adapt the baseline.
- For ONCE, the IoU thresholds for evaluation are 0.7, 0.3, and 0.5 for Vehicle, Pedestrian, and Cyclist, respectively.
Model | Training ONCE Data | Methods | Vehicle@AP | Pedestrian@AP | Cyclist@AP | Download
---|---|---|---|---|---|---
CenterPoint | Labeled (4K) | Train from scratch | 74.93 | 46.21 | 67.36 | model-96M
CenterPoint_Pede | Labeled (4K) | PS | - | 49.14 | - | model-96M
PV-RCNN++ | Labeled (4K) | Train from scratch | 79.78 | 35.91 | 63.18 | model-188M
PV-RCNN++ | Small Dataset (100K) | SESS | 80.02 | 46.24 | 66.41 | model-188M
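The *_PS entries (and the PS row above) use pseudo-labels: confident predictions on unlabeled frames are kept as training targets for another round of training. A minimal sketch of that confidence filtering is shown below; the prediction dictionary layout and the per-class thresholds are assumptions for illustration, not the values used in 3DTrans.

```python
import numpy as np

def generate_pseudo_labels(predictions, score_thresholds=None):
    """Keep only confident detections as pseudo ground-truth for re-training.

    predictions: list of per-frame dicts with 'boxes' (N, 7), 'labels' (N,),
                 and 'scores' (N,) as NumPy arrays (layout assumed for this sketch).
    score_thresholds: per-class confidence cut-offs; the values below are
                      illustrative only.
    """
    if score_thresholds is None:
        score_thresholds = {1: 0.70, 2: 0.50, 3: 0.50}  # e.g. Vehicle / Pedestrian / Cyclist
    pseudo_labels = []
    for pred in predictions:
        thresholds = np.array([score_thresholds.get(int(c), 0.70) for c in pred['labels']])
        keep = pred['scores'] >= thresholds
        pseudo_labels.append({'boxes': pred['boxes'][keep],
                              'labels': pred['labels'][keep]})
    return pseudo_labels
```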
MDF Results
Here, we report the Waymo-and-nuScenes consolidation results. The models are jointly trained on the Waymo and nuScenes datasets, and evaluated on Waymo using LEVEL_2 mAP/mAPH and on nuScenes using BEV/3D AP. Please refer to Readme for MDF for more results.
- All LiDAR-based models are trained with 8 NVIDIA A100 GPUs and are available for download.
- The multi-domain dataset fusion (MDF) training time is measured with 8 NVIDIA A100 GPUs and PyTorch 1.8.1.
- For Waymo dataset training, we train the model using 20% training data for saving training time.
- PV-RCNN-nuScenes denotes the PV-RCNN model trained only on the nuScenes dataset, PV-RCNN-DM denotes the model trained on the merged Waymo and nuScenes datasets (Direct Merging), and PV-RCNN-DT denotes domain attention-aware multi-dataset training (a simplified sketch of the domain-attention idea is given after the result tables).
Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
---|---|---|---|---|---|---|---|
PV-RCNN-nuScenes | only nuScenes | 35.59 / 35.21 | 3.95 / 2.55 | 0.94 / 0.92 | 57.78 / 41.10 | 24.52 / 18.56 | 10.24 / 8.25 |
PV-RCNN-Waymo | only Waymo | 66.49 / 66.01 | 64.09 / 58.06 | 62.09 / 61.02 | 32.99 / 17.55 | 3.34 / 1.94 | 0.02 / 0.01 |
PV-RCNN-DM | Direct Merging | 57.82 / 57.40 | 48.24 / 42.81 | 54.63 / 53.64 | 48.67 / 30.43 | 12.66 / 8.12 | 1.67 / 1.04 |
PV-RCNN-Uni3D | Uni3D | 66.98 / 66.50 | 65.70 / 59.14 | 61.49 / 60.43 | 60.77 / 42.66 | 27.44 / 21.85 | 13.50 / 11.87 |
PV-RCNN-DT | Domain Attention | 67.27 / 66.77 | 65.86 / 59.38 | 61.38 / 60.34 | 60.83 / 43.03 | 27.46 / 22.06 | 13.82 / 11.52 |
Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
---|---|---|---|---|---|---|---|
Voxel-RCNN-nuScenes | only nuScenes | 31.89 / 31.65 | 3.74 / 2.57 | 2.41 / 2.37 | 53.63 / 39.05 | 22.48 / 17.85 | 10.86 / 9.70 |
Voxel-RCNN-Waymo | only Waymo | 67.05 / 66.41 | 66.75 / 60.83 | 63.13 / 62.15 | 34.10 / 17.31 | 2.99 / 1.69 | 0.05 / 0.01 |
Voxel-RCNN-DM | Direct Merging | 58.26 / 57.87 | 52.72 / 47.11 | 50.26 / 49.50 | 51.40 / 31.68 | 15.04 / 9.99 | 5.40 / 3.87 |
Voxel-RCNN-Uni3D | Uni3D | 66.76 / 66.29 | 66.62 / 60.51 | 63.36 / 62.42 | 60.18 / 42.23 | 30.08 / 24.37 | 14.60 / 12.32 |
Voxel-RCNN-DT | Domain Attention | 66.96 / 66.50 | 68.23 / 62.00 | 62.57 / 61.64 | 60.42 / 42.81 | 30.49 / 24.92 | 15.91 / 13.35 |
Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
---|---|---|---|---|---|---|---|
PV-RCNN++ DM | Direct Merging | 63.79 / 63.38 | 55.03 / 49.75 | 59.88 / 58.99 | 50.91 / 31.46 | 17.07 / 12.15 | 3.10 / 2.20 |
PV-RCNN++-Uni3D | Uni3D | 68.55 / 68.08 | 69.83 / 63.60 | 64.90 / 63.91 | 62.51 / 44.16 | 33.82 / 27.18 | 22.48 / 19.30 |
PV-RCNN++-DT | Domain Attention | 68.51 / 68.05 | 69.81 / 63.58 | 64.39 / 63.43 | 62.33 / 44.16 | 33.44 / 26.94 | 21.64 / 18.52 |
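For intuition, Direct Merging simply concatenates the datasets, whereas Domain Attention conditions part of the shared network on which dataset each sample comes from. The module below is a simplified dataset-conditioned channel-attention sketch in PyTorch; it illustrates the idea only and is not the Uni3D/3DTrans implementation.

```python
import torch
import torch.nn as nn

class DatasetConditionedChannelAttention(nn.Module):
    """SE-style channel gating whose excitation is conditioned on a dataset id
    (e.g. 0 = Waymo, 1 = nuScenes), so shared BEV features can be re-weighted
    per dataset. Simplified illustration only.
    """
    def __init__(self, channels, num_datasets=2, reduction=8):
        super().__init__()
        self.dataset_embed = nn.Embedding(num_datasets, channels)
        self.excite = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, bev_features, dataset_id):
        # bev_features: (B, C, H, W) feature map; dataset_id: (B,) long tensor
        squeezed = bev_features.mean(dim=(2, 3))                      # (B, C) global context
        conditioned = torch.cat([squeezed, self.dataset_embed(dataset_id)], dim=1)
        gate = self.excite(conditioned).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return bev_features * gate

# Toy usage: same backbone features, different dataset ids
feats = torch.randn(2, 64, 16, 16)
attn = DatasetConditionedChannelAttention(channels=64, num_datasets=2)
print(attn(feats, dataset_id=torch.tensor([0, 1])).shape)  # torch.Size([2, 64, 16, 16])
```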
3D Pre-training Results
AD-PT Results on Waymo
<!-- Based on our research progress on the cross-domain adaptation of multiple autonomous driving datasets, we can utilize the **multi-source datasets** for performing the pre-training task. Here, we present several unsupervised and self-supervised pre-training implementations (including [PointContrast](https://arxiv.org/abs/2007.10985)). -->
AD-PT demonstrates strong generalization ability on 3D point clouds. We first pre-train the 3D and 2D backbones with AD-PT on the ONCE dataset (from 100K to 1M frames), and then fine-tune the model on different downstream datasets. Here, we report the fine-tuning results on Waymo.
Model | Data amount | Overall | Vehicle | Pedestrian | Cyclist
---|---|---|---|---|---
SECOND (From scratch) | 3% | 52.00 / 37.70 | 58.11 / 57.44 | 51.34 / 27.38 | 46.57 / 28.28
SECOND (AD-PT) | 3% | 55.41 / 51.78 | 60.53 / 59.93 | 54.91 / 45.78 | 50.79 / 49.65
SECOND (From scratch) | 20% | 60.62 / 56.86 | 64.26 / 63.73 | 59.72 / 50.38 | 57.87 / 56.48
SECOND (AD-PT) | 20% | 61.26 / 57.69 | 64.54 / 64.00 | 60.25 / 51.21 | 59.00 / 57.86
CenterPoint (From scratch) | 3% | 59.00 / 56.29 | 57.12 / 56.57 | 58.66 / 52.44 | 61.24 / 59.89
CenterPoint (AD-PT) | 3% | 61.21 / 58.46 | 60.35 / 59.79 | 60.57 / 54.02 | 62.73 / 61.57
CenterPoint (From scratch) | 20% | 66.47 / 64.01 | 64.91 / 64.42 | 66.03 / 60.34 | 68.49 / 67.28
CenterPoint (AD-PT) | 20% | 67.17 / 64.65 | 65.33 / 64.83 | 67.16 / 61.20 | 69.39 / 68.25
PV-RCNN++ (From scratch) | 3% | 63.81 / 61.10 | 64.42 / 63.93 | 64.33 / 57.79 | 62.69 / 61.59
PV-RCNN++ (AD-PT) | 3% | 68.33 / 65.69 | 68.17 / 67.70 | 68.82 / 62.39 | 68.00 / 67.00
PV-RCNN++ (From scratch) | 20% | 69.97 / 67.58 | 69.18 / 68.75 | 70.88 / 65.21 | 69.84 / 68.77
PV-RCNN++ (AD-PT) | 20% | 71.55 / 69.23 | 70.62 / 70.19 | 72.36 / 66.82 | 71.69 / 70.70
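Fine-tuning from an AD-PT checkpoint essentially means loading the pre-trained backbone weights into the detector and training on the labeled subset. The sketch below shows a typical partial-loading step; the 'model_state' key and backbone prefixes are assumptions for illustration and may differ from the released checkpoints.

```python
import torch

def load_pretrained_backbone(model, ckpt_path,
                             prefix_whitelist=('backbone_3d', 'backbone_2d')):
    """Copy matching backbone weights from a pre-trained checkpoint into `model`,
    leaving detection heads randomly initialized for fine-tuning.

    The 'model_state' key and the backbone prefixes are assumptions for this sketch.
    """
    checkpoint = torch.load(ckpt_path, map_location='cpu')
    pretrained = checkpoint.get('model_state', checkpoint)
    model_state = model.state_dict()
    loadable = {k: v for k, v in pretrained.items()
                if k.startswith(prefix_whitelist)
                and k in model_state
                and v.shape == model_state[k].shape}
    model_state.update(loadable)
    model.load_state_dict(model_state)
    print(f'Loaded {len(loadable)} / {len(model_state)} tensors from {ckpt_path}')
    return model
```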
ReSimAD
ReSimAD Implementation
Here, we provide the download link of our reconstruction-simulation dataset generated by ReSimAD, consisting of nuScenes-like, KITTI-like, ONCE-like, and Waymo-like datasets of target-domain-like simulated points.
Specifically, please refer to ReSimAD reconstruction for the point-based reconstructed meshes, and to PCSim for the technical details of simulating target-domain-like points from the reconstructed meshes. For the perception module, please refer to PV-RCNN and PV-RCNN++ for model training and evaluation.
We report the zero-shot cross-dataset (Waymo-to-nuScenes) adaptation results using the BEV/3D AP performance as the evaluation metric for a fair comparison. Please refer to ReSimAD for more details.
Methods | Training time | Adaptation | Car@R40 (BEV / 3D) | Ckpt
---|---|---|---|---
PV-RCNN | ~23 hours | Source-only | 31.02 / 17.75 | Not available (Waymo License)
PV-RCNN | ~8 hours | ST3D | 36.42 / 22.99 | -
PV-RCNN | ~8 hours | ReSimAD | 37.85 / 21.33 | ReSimAD_ckpt
PV-RCNN++ | ~20 hours | Source-only | 29.93 / 18.77 | Not available (Waymo License)
PV-RCNN++ | ~2.2 hours | ST3D | 34.68 / 17.17 | -
PV-RCNN++ | ~8 hours | ReSimAD | 40.73 / 23.72 | ReSimAD_ckpt
Visualization Tools for 3DTrans
- Our 3DTrans supports the sequence-level visualization function Quick Sequence Demo, which continuously displays the prediction results and ground truths of a selected scene.
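For a quick single-frame look without the full Quick Sequence Demo, a bird's-eye-view plot is often enough. The snippet below is a generic matplotlib sketch, not the visualization tool shipped with 3DTrans; the [x, y, z, dx, dy, dz, heading] box format is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_bev(points, boxes, ax=None):
    """Scatter LiDAR points in bird's-eye view and overlay box outlines."""
    if ax is None:
        ax = plt.gca()
    ax.scatter(points[:, 0], points[:, 1], s=0.2, c='gray')
    for cx, cy, _, dx, dy, _, yaw in boxes:
        # box corners in the box frame, then rotated/translated into the world frame
        corners = np.array([[dx, dy], [dx, -dy], [-dx, -dy], [-dx, dy]]) / 2.0
        rot = np.array([[np.cos(yaw), -np.sin(yaw)], [np.sin(yaw), np.cos(yaw)]])
        corners = corners @ rot.T + np.array([cx, cy])
        ax.add_patch(plt.Polygon(corners, fill=False, edgecolor='red'))
    ax.set_aspect('equal')
    return ax

# Toy frame: random points plus one box
plot_bev(np.random.randn(2000, 3) * 10,
         boxes=np.array([[5.0, 2.0, 0.0, 4.5, 1.9, 1.6, 0.3]]))
plt.show()
```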
Acknowledgements
- Our code is heavily based on OpenPCDet v0.5.2. Thanks to the OpenPCDet Development Team for their awesome codebase.
- Team homepage with member information and profiles: Project Link
Technical Papers
@inproceedings{zhang2023uni3d,
title={Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection},
author={Zhang, Bo and Yuan, Jiakang and Shi, Botian and Chen, Tao and Li, Yikang and Qiao, Yu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={9253--9262},
year={2023}
}
@inproceedings{yuan2023bi3d,
title={Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection},
author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={15599--15608},
year={2023}
}
@inproceedings{yuan2023AD-PT,
title={AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset},
author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
booktitle={Advances in Neural Information Processing Systems},
year={2023}
}
@inproceedings{huang2023sug,
title={SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification},
author={Huang, Siyuan and Zhang, Bo and Shi, Botian and Gao, Peng and Li, Yikang and Li, Hongsheng},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
year={2023}
}
@inproceedings{zhang2023resimad,
title={ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation},
author={Zhang, Bo and Cai, Xinyu and Yuan, Jiakang and Yang, Donglin and Guo, Jianfei and Xia, Renqiu and Shi, Botian and Dou, Min and Chen, Tao and Liu, Si and others},
booktitle={International Conference on Learning Representations},
year={2024}
}
@article{yan2023spot,
title={SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving},
author={Yan, Xiangchao and Chen, Runjian and Zhang, Bo and Yuan, Jiakang and Cai, Xinyu and Shi, Botian and Shao, Wenqi and Yan, Junchi and Luo, Ping and Qiao, Yu},
journal={arXiv preprint arXiv:2309.10527},
year={2023}
}