
3DTrans: An Open-source Codebase for Continuous Learning towards Autonomous Driving Task

3DTrans provides Transfer Learning Techniques and Scalable Pre-training Techniques for tackling the continuous-learning problem in autonomous driving, as follows.

  1. We implement the Transfer Learning Techniques, consisting of four functions: Unsupervised Domain Adaptation (UDA), Active Domain Adaptation (ADA), Semi-Supervised Domain Adaptation (SSDA), and Multi-dataset Domain Fusion (MDF).
  2. We implement the Scalable Pre-training, which continuously enhances model performance on downstream tasks as more pre-training data is fed into our pre-training network.

Overview

News :fire:


We expect this repository to inspire research on 3D model generalization, as it pushes the limits of perceptual performance. :tokyo_tower:


Installation for 3DTrans

You may refer to INSTALL.md for the installation of 3DTrans.

Getting Started

<details> <summary>Getting Started for ALL Settings</summary> </details>

Model Zoo

We cannot provide the Waymo-related pre-trained models due to the Waymo Dataset License Agreement, but you can easily achieve similar performance by training with the corresponding configs.

Domain Transfer Results

<details> <summary>UDA Results</summary>

Here, we report the cross-dataset (Waymo-to-KITTI) adaptation results using the BEV/3D AP performance as the evaluation metric. Please refer to Readme for UDA for experimental results of more cross-domain settings.

| Model | Training time | Adaptation | Car@R40 | Download |
|-------|---------------|------------|---------|----------|
| PointPillar | ~7.1 hours | Source-only with SN | 74.98 / 49.31 | - |
| PointPillar | ~0.6 hours | Pre-SN | 81.71 / 57.11 | model-57M |
| PV-RCNN | ~23 hours | Source-only with SN | 69.92 / 60.17 | - |
| PV-RCNN | ~23 hours | Source-only | 74.42 / 40.35 | - |
| PV-RCNN | ~3.5 hours | Pre-SN | 84.00 / 74.57 | model-156M |
| PV-RCNN | ~1 hour | Post-SN | 84.94 / 75.20 | model-156M |
| Voxel R-CNN | ~16 hours | Source-only with SN | 75.83 / 55.50 | - |
| Voxel R-CNN | ~16 hours | Source-only | 64.88 / 19.90 | - |
| Voxel R-CNN | ~2.5 hours | Pre-SN | 82.56 / 67.32 | model-201M |
| Voxel R-CNN | ~2.2 hours | Post-SN | 85.44 / 76.78 | model-201M |
| PV-RCNN++ | ~20 hours | Source-only with SN | 67.22 / 56.50 | - |
| PV-RCNN++ | ~20 hours | Source-only | 67.68 / 20.82 | - |
| PV-RCNN++ | ~2.2 hours | Post-SN | 86.86 / 79.86 | model-193M |
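
The Pre-SN / Post-SN entries above rely on statistical normalization (SN), which offsets source-domain box sizes toward the target-domain size statistics (the rows differ in the stage at which the normalization is applied). Below is a minimal sketch of that idea, assuming boxes stored as (x, y, z, l, w, h, yaw) arrays; the mean sizes and function name are illustrative placeholders, not the 3DTrans API, and the real implementation also rescales the points inside each box.

```python
import numpy as np

# Placeholder per-class mean object sizes (l, w, h) in meters; real values
# would be computed from the source (e.g. Waymo) and target (e.g. KITTI)
# annotation statistics.
SOURCE_MEAN_SIZE = np.array([4.80, 2.10, 1.75])
TARGET_MEAN_SIZE = np.array([3.89, 1.62, 1.53])


def normalize_box_sizes(gt_boxes: np.ndarray) -> np.ndarray:
    """Shift source-domain box sizes toward the target-domain statistics.

    gt_boxes: (N, 7) array of [x, y, z, l, w, h, yaw].
    The (l, w, h) dimensions are offset by the difference of per-class
    mean sizes, which is the core of SN-style size normalization.
    """
    boxes = gt_boxes.copy()
    boxes[:, 3:6] += TARGET_MEAN_SIZE - SOURCE_MEAN_SIZE
    return boxes


if __name__ == "__main__":
    boxes = np.array([[10.0, 2.0, -1.0, 4.7, 2.0, 1.8, 0.1]])
    print(normalize_box_sizes(boxes))
```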
</details> <details> <summary>ADA Results</summary>

Here, we report the Waymo-to-KITTI adaptation results using the BEV/3D AP performance. Please refer to Readme for ADA for experimental results of more cross-domain settings.

| Model | Training time | Adaptation | Car@R40 | Download |
|-------|---------------|------------|---------|----------|
| PV-RCNN | ~23h@4 A100 | Source Only | 67.95 / 27.65 | - |
| PV-RCNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 87.12 / 78.03 | Model-58M |
| PV-RCNN | ~10h@2 A100 | Bi3D (5% annotation budget) | 89.53 / 81.32 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | TQS | 82.00 / 72.04 | Model-58M |
| PV-RCNN | ~1.5h@2 A100 | CLUE | 82.13 / 73.14 | Model-50M |
| PV-RCNN | ~10h@2 A100 | Bi3D+ST3D | 87.83 / 81.23 | Model-58M |
| Voxel R-CNN | ~16h@4 A100 | Source Only | 64.87 / 19.90 | - |
| Voxel R-CNN | ~1.5h@2 A100 | Bi3D (1% annotation budget) | 88.09 / 79.14 | Model-72M |
| Voxel R-CNN | ~6h@2 A100 | Bi3D (5% annotation budget) | 90.18 / 81.34 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | TQS | 78.26 / 67.11 | Model-72M |
| Voxel R-CNN | ~1.5h@2 A100 | CLUE | 81.93 / 70.89 | Model-72M |
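
The ADA methods above (Bi3D, TQS, CLUE) annotate only a small budget of target frames selected by an acquisition criterion. The sketch below shows just the generic budget-constrained selection step; `score_fn` is a placeholder for the method-specific criterion (e.g. Bi3D's domainness and uncertainty scores), which is not reproduced here.

```python
from typing import Callable, List, Sequence


def select_frames_for_annotation(
    frame_ids: Sequence[str],
    score_fn: Callable[[str], float],
    budget_ratio: float = 0.01,
) -> List[str]:
    """Pick the top `budget_ratio` fraction of target frames to send for labeling."""
    budget = max(1, int(len(frame_ids) * budget_ratio))
    ranked = sorted(frame_ids, key=score_fn, reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    frames = [f"frame_{i:04d}" for i in range(1000)]
    # Placeholder score: pretend higher-index frames are more informative.
    picked = select_frames_for_annotation(frames, score_fn=lambda f: float(f[-4:]))
    print(len(picked), picked[:3])
```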
</details> <details> <summary>SSDA Results</summary>

We report the target domain results on Waymo-to-nuScenes adaptation using the BEV/3D AP performance as the evaluation metric, and Waymo-to-ONCE adaptation using ONCE evaluation metric. Please refer to Readme for SSDA for experimental results of more cross-domain settings.

| Model | Training time | Adaptation | Car@R40 | Download |
|-------|---------------|------------|---------|----------|
| Second | ~11 hours | source-only (Waymo) | 27.85 / 16.43 | - |
| Second | ~0.4 hours | second_5%_FT | 45.95 / 26.98 | model-61M |
| Second | ~1.8 hours | second_5%_SESS | 47.77 / 28.74 | model-61M |
| Second | ~1.7 hours | second_5%_PS | 47.72 / 29.37 | model-61M |
| PV-RCNN | ~24 hours | source-only (Waymo) | 40.31 / 23.32 | - |
| PV-RCNN | ~1.0 hours | pvrcnn_5%_FT | 49.58 / 34.86 | model-150M |
| PV-RCNN | ~5.5 hours | pvrcnn_5%_SESS | 49.92 / 35.28 | model-150M |
| PV-RCNN | ~5.4 hours | pvrcnn_5%_PS | 49.84 / 35.07 | model-150M |
| PV-RCNN++ | ~16 hours | source-only (Waymo) | 31.96 / 19.81 | - |
| PV-RCNN++ | ~1.2 hours | pvplus_5%_FT | 49.94 / 34.28 | model-185M |
| PV-RCNN++ | ~4.2 hours | pvplus_5%_SESS | 51.14 / 35.25 | model-185M |
| PV-RCNN++ | ~3.6 hours | pvplus_5%_PS | 50.84 / 35.39 | model-185M |

| Model | Training ONCE Data | Methods | Vehicle@AP | Pedestrian@AP | Cyclist@AP | Download |
|-------|--------------------|---------|------------|---------------|------------|----------|
| Centerpoint | Labeled (4K) | Train from scratch | 74.93 | 46.21 | 67.36 | model-96M |
| Centerpoint_Pede | Labeled (4K) | PS | - | 49.14 | - | model-96M |
| PV-RCNN++ | Labeled (4K) | Train from scratch | 79.78 | 35.91 | 63.18 | model-188M |
| PV-RCNN++ | Small Dataset (100K) | SESS | 80.02 | 46.24 | 66.41 | model-188M |
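
The `_PS` rows above fine-tune with pseudo-labels generated on the unlabeled target frames. A minimal sketch of confidence-thresholded pseudo-label generation follows; it assumes a `model` callable returning (boxes, scores) per frame, and omits the NMS, class-wise thresholds, and iterative self-training rounds used in practice.

```python
from typing import Callable, List, Tuple

import numpy as np


def generate_pseudo_labels(
    model: Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]],
    target_frames: List[np.ndarray],
    score_thresh: float = 0.6,
) -> List[np.ndarray]:
    """Keep high-confidence predictions on unlabeled target frames as labels.

    `model` is assumed to map an (N, 4) point cloud to (boxes, scores),
    where boxes is (M, 7) and scores is (M,).
    """
    pseudo_labels = []
    for points in target_frames:
        boxes, scores = model(points)
        pseudo_labels.append(boxes[scores >= score_thresh])
    return pseudo_labels
```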
</details> <details> <summary>MDF Results</summary>

Here, we report the Waymo-and-nuScenes consolidation results. The models are jointly trained on the Waymo and nuScenes datasets, and evaluated on Waymo using mAP/mAPH (LEVEL_2) and on nuScenes using BEV/3D AP. Please refer to Readme for MDF for more results.

| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|----------|-------------|---------------|------------------|---------------|--------------|---------------------|------------------|
| PV-RCNN-nuScenes | only nuScenes | 35.59 / 35.21 | 3.95 / 2.55 | 0.94 / 0.92 | 57.78 / 41.10 | 24.52 / 18.56 | 10.24 / 8.25 |
| PV-RCNN-Waymo | only Waymo | 66.49 / 66.01 | 64.09 / 58.06 | 62.09 / 61.02 | 32.99 / 17.55 | 3.34 / 1.94 | 0.02 / 0.01 |
| PV-RCNN-DM | Direct Merging | 57.82 / 57.40 | 48.24 / 42.81 | 54.63 / 53.64 | 48.67 / 30.43 | 12.66 / 8.12 | 1.67 / 1.04 |
| PV-RCNN-Uni3D | Uni3D | 66.98 / 66.50 | 65.70 / 59.14 | 61.49 / 60.43 | 60.77 / 42.66 | 27.44 / 21.85 | 13.50 / 11.87 |
| PV-RCNN-DT | Domain Attention | 67.27 / 66.77 | 65.86 / 59.38 | 61.38 / 60.34 | 60.83 / 43.03 | 27.46 / 22.06 | 13.82 / 11.52 |

| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|----------|-------------|---------------|------------------|---------------|--------------|---------------------|------------------|
| Voxel-RCNN-nuScenes | only nuScenes | 31.89 / 31.65 | 3.74 / 2.57 | 2.41 / 2.37 | 53.63 / 39.05 | 22.48 / 17.85 | 10.86 / 9.70 |
| Voxel-RCNN-Waymo | only Waymo | 67.05 / 66.41 | 66.75 / 60.83 | 63.13 / 62.15 | 34.10 / 17.31 | 2.99 / 1.69 | 0.05 / 0.01 |
| Voxel-RCNN-DM | Direct Merging | 58.26 / 57.87 | 52.72 / 47.11 | 50.26 / 49.50 | 51.40 / 31.68 | 15.04 / 9.99 | 5.40 / 3.87 |
| Voxel-RCNN-Uni3D | Uni3D | 66.76 / 66.29 | 66.62 / 60.51 | 63.36 / 62.42 | 60.18 / 42.23 | 30.08 / 24.37 | 14.60 / 12.32 |
| Voxel-RCNN-DT | Domain Attention | 66.96 / 66.50 | 68.23 / 62.00 | 62.57 / 61.64 | 60.42 / 42.81 | 30.49 / 24.92 | 15.91 / 13.35 |

| Baseline | MDF Methods | Waymo@Vehicle | Waymo@Pedestrian | Waymo@Cyclist | nuScenes@Car | nuScenes@Pedestrian | nuScenes@Cyclist |
|----------|-------------|---------------|------------------|---------------|--------------|---------------------|------------------|
| PV-RCNN++-DM | Direct Merging | 63.79 / 63.38 | 55.03 / 49.75 | 59.88 / 58.99 | 50.91 / 31.46 | 17.07 / 12.15 | 3.10 / 2.20 |
| PV-RCNN++-Uni3D | Uni3D | 68.55 / 68.08 | 69.83 / 63.60 | 64.90 / 63.91 | 62.51 / 44.16 | 33.82 / 27.18 | 22.48 / 19.30 |
| PV-RCNN++-DT | Domain Attention | 68.51 / 68.05 | 69.81 / 63.58 | 64.39 / 63.43 | 62.33 / 44.16 | 33.44 / 26.94 | 21.64 / 18.52 |
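
The "Direct Merging" baseline above simply concatenates the Waymo and nuScenes training sets, while Uni3D and Domain Attention additionally condition parts of the network on the dataset each sample comes from. A minimal PyTorch sketch of the merged-loader side of this is shown below; `WaymoDataset` and `NuScenesDataset` are hypothetical placeholders, and the point-cloud range, voxel size, and class names of the two datasets would have to be unified beforehand.

```python
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class TaggedDataset(Dataset):
    """Wrap a dataset and tag every sample with its source-dataset id, so that
    dataset-specific layers or a domain-attention module can branch on it."""

    def __init__(self, base: Dataset, tag: int):
        self.base = base
        self.tag = tag

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        sample = self.base[idx]          # assumed to be a dict per sample
        sample["dataset_tag"] = self.tag
        return sample


# Hypothetical usage with placeholder dataset classes:
# merged = ConcatDataset([
#     TaggedDataset(WaymoDataset(split="train"), tag=0),
#     TaggedDataset(NuScenesDataset(split="train"), tag=1),
# ])
# loader = DataLoader(merged, batch_size=4, shuffle=True)
```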
</details>

3D Pre-training Results

<details> <summary>AD-PT Results on Waymo</summary>

AD-PT demonstrates strong generalization ability on 3D point clouds. We first pre-train the 3D and 2D backbones with AD-PT on the ONCE dataset (from 100K to 1M frames), and then fine-tune the model on different datasets. Here, we report the fine-tuning results on Waymo.

| Model | Data amount | Overall | Vehicle | Pedestrian | Cyclist |
|-------|-------------|---------|---------|------------|---------|
| SECOND (From scratch) | 3% | 52.00 / 37.70 | 58.11 / 57.44 | 51.34 / 27.38 | 46.57 / 28.28 |
| SECOND (AD-PT) | 3% | 55.41 / 51.78 | 60.53 / 59.93 | 54.91 / 45.78 | 50.79 / 49.65 |
| SECOND (From scratch) | 20% | 60.62 / 56.86 | 64.26 / 63.73 | 59.72 / 50.38 | 57.87 / 56.48 |
| SECOND (AD-PT) | 20% | 61.26 / 57.69 | 64.54 / 64.00 | 60.25 / 51.21 | 59.00 / 57.86 |
| CenterPoint (From scratch) | 3% | 59.00 / 56.29 | 57.12 / 56.57 | 58.66 / 52.44 | 61.24 / 59.89 |
| CenterPoint (AD-PT) | 3% | 61.21 / 58.46 | 60.35 / 59.79 | 60.57 / 54.02 | 62.73 / 61.57 |
| CenterPoint (From scratch) | 20% | 66.47 / 64.01 | 64.91 / 64.42 | 66.03 / 60.34 | 68.49 / 67.28 |
| CenterPoint (AD-PT) | 20% | 67.17 / 64.65 | 65.33 / 64.83 | 67.16 / 61.20 | 69.39 / 68.25 |
| PV-RCNN++ (From scratch) | 3% | 63.81 / 61.10 | 64.42 / 63.93 | 64.33 / 57.79 | 62.69 / 61.59 |
| PV-RCNN++ (AD-PT) | 3% | 68.33 / 65.69 | 68.17 / 67.70 | 68.82 / 62.39 | 68.00 / 67.00 |
| PV-RCNN++ (From scratch) | 20% | 69.97 / 67.58 | 69.18 / 68.75 | 70.88 / 65.21 | 69.84 / 68.77 |
| PV-RCNN++ (AD-PT) | 20% | 71.55 / 69.23 | 70.62 / 70.19 | 72.36 / 66.82 | 71.69 / 70.70 |
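
Fine-tuning from an AD-PT checkpoint amounts to initializing the detector's backbones from the pre-trained weights and leaving the detection heads at their random initialization. A minimal PyTorch sketch of that loading step follows; the checkpoint key `model_state` is an assumption, not a guaranteed 3DTrans checkpoint layout.

```python
import torch


def load_pretrained_backbone(detector: torch.nn.Module, ckpt_path: str) -> None:
    """Copy shape-matching tensors from a pre-trained checkpoint into the detector."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("model_state", ckpt)  # tolerate either checkpoint layout
    model_state = detector.state_dict()
    matched = {
        k: v for k, v in state.items()
        if k in model_state and v.shape == model_state[k].shape
    }
    model_state.update(matched)
    detector.load_state_dict(model_state)
    print(f"loaded {len(matched)} / {len(model_state)} tensors from {ckpt_path}")
```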
</details>

ReSimAD

<details> <summary>ReSimAD Implementation</summary>

Here, we provide the download link for our reconstruction-simulation dataset produced by ReSimAD, consisting of nuScenes-like, KITTI-like, ONCE-like, and Waymo-like datasets with target-domain-like simulated points.

Specifically, please refer to ReSimAD reconstruction for the point-based reconstruction meshes, and to PCSim for the technical details of simulating the target-domain-like points based on the reconstructed meshes. For the perception module, please refer to PV-RCNN and PV-RCNN++ for model training and evaluation.
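
As a rough illustration of the simulation step, the target-domain-like points are obtained by casting rays with the target sensor's beam pattern against the reconstructed meshes. The sketch below only builds an idealized set of ray directions for a hypothetical 32-beam spinning LiDAR; the actual beam tables and the mesh ray-casting are handled by PCSim and are not reproduced here.

```python
import numpy as np


def lidar_ray_directions(num_beams: int = 32,
                         fov_up_deg: float = 10.0,
                         fov_down_deg: float = -30.0,
                         azimuth_step_deg: float = 0.2) -> np.ndarray:
    """Unit ray directions for an idealized spinning LiDAR.

    The vertical field of view is split evenly into `num_beams` elevation
    angles; real sensors (e.g. a nuScenes-like 32-beam LiDAR) use non-uniform
    beam spacing, so these values are only illustrative.
    """
    elev = np.deg2rad(np.linspace(fov_down_deg, fov_up_deg, num_beams))
    azim = np.deg2rad(np.arange(0.0, 360.0, azimuth_step_deg))
    e, a = np.meshgrid(elev, azim, indexing="ij")
    dirs = np.stack([np.cos(e) * np.cos(a),
                     np.cos(e) * np.sin(a),
                     np.sin(e)], axis=-1)
    return dirs.reshape(-1, 3)  # (num_beams * num_azimuths, 3)
```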

We report the zero-shot cross-dataset (Waymo-to-nuScenes) adaptation results using the BEV/3D AP performance as the evaluation metric for a fair comparison. Please refer to ReSimAD for more details.

| Methods | Training time | Adaptation | Car@R40 | Ckpt |
|---------|---------------|------------|---------|------|
| PV-RCNN | ~23 hours | Source-only | 31.02 / 17.75 | Not Available (Waymo License) |
| PV-RCNN | ~8 hours | ST3D | 36.42 / 22.99 | - |
| PV-RCNN | ~8 hours | ReSimAD | 37.85 / 21.33 | ReSimAD_ckpt |
| PV-RCNN++ | ~20 hours | Source-only | 29.93 / 18.77 | Not Available (Waymo License) |
| PV-RCNN++ | ~2.2 hours | ST3D | 34.68 / 17.17 | - |
| PV-RCNN++ | ~8 hours | ReSimAD | 40.73 / 23.72 | ReSimAD_ckpt |
</details>

Visualization Tools for 3DTrans

<details> <summary>Visualization Demo</summary> </details>

Acknowledgements


Technical Papers

@inproceedings{zhang2023uni3d,
  title={Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection},
  author={Zhang, Bo and Yuan, Jiakang and Shi, Botian and Chen, Tao and Li, Yikang and Qiao, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9253--9262},
  year={2023}
}
@inproceedings{yuan2023bi3d,
  title={Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection},
  author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15599--15608},
  year={2023}
}
@inproceedings{yuan2023AD-PT,
  title={AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset},
  author={Yuan, Jiakang and Zhang, Bo and Yan, Xiangchao and Chen, Tao and Shi, Botian and Li, Yikang and Qiao, Yu},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}
@inproceedings{huang2023sug,
  title={SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification},
  author={Huang, Siyuan and Zhang, Bo and Shi, Botian and Gao, Peng and Li, Yikang and Li, Hongsheng},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}
@inproceedings{zhang2023resimad,
  title={ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation},
  author={Zhang, Bo and Cai, Xinyu and Yuan, Jiakang and Yang, Donglin and Guo, Jianfei and Xia, Renqiu and Shi, Botian and Dou, Min and Chen, Tao and Liu, Si and others},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
@article{yan2023spot,
  title={SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving},
  author={Yan, Xiangchao and Chen, Runjian and Zhang, Bo and Yuan, Jiakang and Cai, Xinyu and Shi, Botian and Shao, Wenqi and Yan, Junchi and Luo, Ping and Qiao, Yu},
  journal={arXiv preprint arXiv:2309.10527},
  year={2023}
}