<p align="right">English | <a href="docs/CN.md">įŽ€äŊ“中文</a></p> <p align="center"> <img src="docs/figs/logo.png" align="center" width="25%"> <h3 align="center"><strong>Benchmarking and Improving Bird's Eye View Perception Robustness</br>in Autonomous Driving</strong></h3> <p align="center"> <a href="https://scholar.google.com/citations?user=s1m55YoAAAAJ" target='_blank'>Shaoyuan Xie</a><sup>1</sup>&nbsp;&nbsp; <a href="https://scholar.google.com/citations?user=-j1j7TkAAAAJ" target='_blank'>Lingdong Kong</a><sup>2,3</sup>&nbsp;&nbsp; <a href="https://scholar.google.com/citations?user=QDXADSEAAAAJ" target='_blank'>Wenwei Zhang</a><sup>2,4</sup>&nbsp;&nbsp; <a href="https://scholar.google.com/citations?user=YUKPVCoAAAAJ" target='_blank'>Jiawei Ren</a><sup>4</sup>&nbsp;&nbsp; <a href="https://scholar.google.com/citations?user=lSDISOcAAAAJ" target='_blank'>Liang Pan</a><sup>2</sup>&nbsp;&nbsp; <a href="https://scholar.google.com/citations?user=eGD0b7IAAAAJ" target='_blank'>Kai Chen</a><sup>2</sup>&nbsp;&nbsp; <a href="https://scholar.google.com/citations?user=lc45xlcAAAAJ" target='_blank'>Ziwei Liu</a><sup>4</sup> <br> <small><sup>1</sup>University of California, Irvine&nbsp;&nbsp;</small> <small><sup>2</sup>Shanghai AI Laboratory&nbsp;&nbsp;</small> <small><sup>3</sup>National University of Singapore&nbsp;&nbsp;</small> <small><sup>4</sup>S-Lab, Nanyang Technological University</small> </p> </p> <p align="center"> <a href="https://arxiv.org/abs/2405.17426" target='_blank'> <img src="https://img.shields.io/badge/Paper-%F0%9F%93%83-blue"> </a> <a href="https://daniel-xsy.github.io/robobev/" target='_blank'> <img src="https://img.shields.io/badge/Project-%F0%9F%94%97-lightblue"> </a> <a href="https://daniel-xsy.github.io/robobev/" target='_blank'> <img src="https://img.shields.io/badge/Demo-%F0%9F%8E%AC-yellow"> </a> <a href="docs/CN.md" target='_blank'> <img src="https://img.shields.io/badge/%E4%B8%AD%E8%AF%91%E7%89%88-%F0%9F%90%BC-lightyellow"> </a> <a href="" target='_blank'> <img src="https://visitor-badge.laobi.icu/badge?page_id=Daniel-xsy.RoboBEV&left_color=gray&right_color=red"> </a> </p>

About

RoboBEV is the first robustness evaluation benchmark tailored for camera-based bird's eye view (BEV) perception under natural data corruptions and domain shifts, both of which are likely to occur in real-world deployments.

[Common Corruption] - We investigate eight data corruption types that are likely to appear in driving scenarios, grouped into four categories: <sup>1</sup>sensor failure, <sup>2</sup>motion & data processing, <sup>3</sup>lighting conditions, and <sup>4</sup>weather conditions (the sketch after this list maps each corruption to its category).

[Domain Shift] - We benchmark the adaptation performance of BEV models across three settings: <sup>1</sup>city-to-city, <sup>2</sup>day-to-night, and <sup>3</sup>dry-to-rain.
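To make the grouping above concrete, here is a minimal sketch (plain Python; the names are illustrative and not identifiers from this codebase) mapping the four corruption categories to the eight corruption types that appear as columns in the benchmark tables below:

```python
# Illustrative grouping only; keys and values are not identifiers used by RoboBEV.
CORRUPTION_TAXONOMY = {
    "sensor_failure":             ["Camera Crash", "Frame Lost"],
    "motion_and_data_processing": ["Motion Blur", "Color Quantization"],
    "lighting_conditions":        ["Brightness", "Low Light"],
    "weather_conditions":         ["Fog", "Snow"],
}
```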

| FRONT_LEFT | FRONT | FRONT_RIGHT | FRONT_LEFT | FRONT | FRONT_RIGHT |
| :-: | :-: | :-: | :-: | :-: | :-: |
| <img src="docs/figs/front_left_snow.gif" width="120" height="67"> | <img src="docs/figs/front_snow.gif" width="120" height="67"> | <img src="docs/figs/front_right_snow.gif" width="120" height="67"> | <img src="docs/figs/front_left_dark.gif" width="120" height="67"> | <img src="docs/figs/front_dark.gif" width="120" height="67"> | <img src="docs/figs/front_right_dark.gif" width="120" height="67"> |
| <img src="docs/figs/back_left_snow.gif" width="120" height="67"> | <img src="docs/figs/back_snow.gif" width="120" height="67"> | <img src="docs/figs/back_right_snow.gif" width="120" height="67"> | <img src="docs/figs/back_left_dark.gif" width="120" height="67"> | <img src="docs/figs/back_dark.gif" width="120" height="67"> | <img src="docs/figs/back_right_dark.gif" width="120" height="67"> |
| BACK_LEFT | BACK | BACK_RIGHT | BACK_LEFT | BACK | BACK_RIGHT |

Visit our project page to explore more examples. :blue_car:

Updates

Outline

Installation

Kindly refer to INSTALL.md for the installation details.

Data Preparation

Our datasets are hosted by OpenDataLab.

<img src="https://raw.githubusercontent.com/opendatalab/dsdl-sdk/2ae5264a7ce1ae6116720478f8fa9e59556bed41/resources/opendatalab.svg" width="32%"/><br> OpenDataLab is a pioneering open data platform for the large AI model era, making datasets accessible. By using OpenDataLab, researchers can obtain free formatted datasets in various fields.

Kindly refer to DATA_PREPARE.md for the details to prepare the nuScenes and nuScenes-C datasets.

Getting Started

Kindly refer to GET_STARTED.md to learn more about the usage of this codebase.

Model Zoo

<details open>
<summary>&nbsp;<b>Camera-Only BEV Detection</b></summary>
</details>

<details open>
<summary>&nbsp;<b>Camera-Only Monocular 3D Detection</b></summary>
</details>

<details open>
<summary>&nbsp;<b>LiDAR-Camera Fusion BEV Detection</b></summary>
</details>

<details open>
<summary>&nbsp;<b>Camera-Only BEV Map Segmentation</b></summary>
</details>

<details open>
<summary>&nbsp;<b>Multi-Camera Depth Estimation</b></summary>
</details>

<details open>
<summary>&nbsp;<b>Multi-Camera Semantic Occupancy Prediction</b></summary>
</details>

Robustness Benchmark

:triangular_ruler: Metrics: The nuScenes Detection Score (NDS) is consistently used as the main indicator for evaluating model performance in our benchmark. The following two metrics are adopted to compare models' robustness:
- mCE (the lower the better): the mean Corruption Error (in percentage) of a candidate model relative to the baseline model, averaged over all corruption types.
- mRR (the higher the better): the mean Resilience Rate (in percentage) of a candidate model, i.e., the average ratio of its performance under each corruption to its performance on the clean set.
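For reference, the two metrics can be written as below. This is a sketch that is consistent with the numbers in the tables that follow (e.g., DETR3D's eight per-corruption NDS scores average to 0.2989, and 0.2989 / 0.4224 ≈ 70.77%, matching its reported mRR); the :star: model serves as the baseline in the CE denominator:

$$
\mathrm{CE}_i = \frac{1 - \mathrm{NDS}_i}{1 - \mathrm{NDS}_i^{\mathrm{baseline}}} \times 100\%, \qquad
\mathrm{mCE} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{CE}_i, \qquad
\mathrm{mRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{NDS}_i}{\mathrm{NDS}_{\mathrm{clean}}} \times 100\%
$$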

:gear: Notation: Symbol <sup>:star:</sup> denotes the baseline model adopted in mCE calculation. For more detailed experimental results, please refer to RESULTS.md.

BEV Detection

| Model | mCE (%) $\downarrow$ | mRR (%) $\uparrow$ | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| DETR3D<sup>:star:</sup> | 100.00 | 70.77 | 0.4224 | 0.2859 | 0.2604 | 0.3177 | 0.2661 | 0.4002 | 0.2786 | 0.3912 | 0.1913 |
| DETR3D<sub>CBGS</sub> | 99.21 | 70.02 | 0.4341 | 0.2991 | 0.2685 | 0.3235 | 0.2542 | 0.4154 | 0.2766 | 0.4020 | 0.1925 |
| BEVFormer<sub>Small</sub> | 101.23 | 59.07 | 0.4787 | 0.2771 | 0.2459 | 0.3275 | 0.2570 | 0.3741 | 0.2413 | 0.3583 | 0.1809 |
| BEVFormer<sub>Base</sub> | 97.97 | 60.40 | 0.5174 | 0.3154 | 0.3017 | 0.3509 | 0.2695 | 0.4184 | 0.2515 | 0.4069 | 0.1857 |
| PETR<sub>R50-p4</sub> | 111.01 | 61.26 | 0.3665 | 0.2320 | 0.2166 | 0.2472 | 0.2299 | 0.2841 | 0.1571 | 0.2876 | 0.1417 |
| PETR<sub>VoV-p4</sub> | 100.69 | 65.03 | 0.4550 | 0.2924 | 0.2792 | 0.2968 | 0.2490 | 0.3858 | 0.2305 | 0.3703 | 0.2632 |
| ORA3D | 99.17 | 68.63 | 0.4436 | 0.3055 | 0.2750 | 0.3360 | 0.2647 | 0.4075 | 0.2613 | 0.3959 | 0.1898 |
| BEVDet<sub>R50</sub> | 115.12 | 51.83 | 0.3770 | 0.2486 | 0.1924 | 0.2408 | 0.2061 | 0.2565 | 0.1102 | 0.2461 | 0.0625 |
| BEVDet<sub>R101</sub> | 113.68 | 53.12 | 0.3877 | 0.2622 | 0.2065 | 0.2546 | 0.2265 | 0.2554 | 0.1118 | 0.2495 | 0.0810 |
| BEVDet<sub>R101-pt</sub> | 112.80 | 56.35 | 0.3780 | 0.2442 | 0.1962 | 0.3041 | 0.2590 | 0.2599 | 0.1398 | 0.2073 | 0.0939 |
| BEVDet<sub>SwinT</sub> | 116.48 | 46.26 | 0.4037 | 0.2609 | 0.2115 | 0.2278 | 0.2128 | 0.2191 | 0.0490 | 0.2450 | 0.0680 |
| BEVDepth<sub>R50</sub> | 110.02 | 56.82 | 0.4058 | 0.2638 | 0.2141 | 0.2751 | 0.2513 | 0.2879 | 0.1757 | 0.2903 | 0.0863 |
| BEVerse<sub>SwinT</sub> | 110.67 | 48.60 | 0.4665 | 0.3181 | 0.3037 | 0.2600 | 0.2647 | 0.2656 | 0.0593 | 0.2781 | 0.0644 |
| BEVerse<sub>SwinS</sub> | 117.82 | 49.57 | 0.4951 | 0.3364 | 0.2485 | 0.2807 | 0.2632 | 0.3394 | 0.1118 | 0.2849 | 0.0985 |
| PolarFormer<sub>R101</sub> | 96.06 | 70.88 | 0.4602 | 0.3133 | 0.2808 | 0.3509 | 0.3221 | 0.4304 | 0.2554 | 0.4262 | 0.2304 |
| PolarFormer<sub>VoV</sub> | 98.75 | 67.51 | 0.4558 | 0.3135 | 0.2811 | 0.3076 | 0.2344 | 0.4280 | 0.2441 | 0.4061 | 0.2468 |
| SRCN3D<sub>R101</sub> | 99.67 | 70.23 | 0.4286 | 0.2947 | 0.2681 | 0.3318 | 0.2609 | 0.4074 | 0.2590 | 0.3940 | 0.1920 |
| SRCN3D<sub>VoV</sub> | 102.04 | 67.95 | 0.4205 | 0.2875 | 0.2579 | 0.2827 | 0.2143 | 0.3886 | 0.2274 | 0.3774 | 0.2499 |
| Sparse4D<sub>R101</sub> | 100.01 | 55.04 | 0.5438 | 0.2873 | 0.2611 | 0.3310 | 0.2514 | 0.3984 | 0.2510 | 0.3884 | 0.2259 |
| SOLOFusion<sub>short</sub> | 108.68 | 61.45 | 0.3907 | 0.2541 | 0.2195 | 0.2804 | 0.2603 | 0.2966 | 0.2033 | 0.2998 | 0.1066 |
| SOLOFusion<sub>long</sub> | 97.99 | 64.42 | 0.4850 | 0.3159 | 0.2490 | 0.3598 | 0.3460 | 0.4002 | 0.2814 | 0.3991 | 0.1480 |
| SOLOFusion<sub>fusion</sub> | 92.86 | 64.53 | 0.5381 | 0.3806 | 0.3464 | 0.4058 | 0.3642 | 0.4329 | 0.2626 | 0.4480 | 0.1376 |
| FCOS3D<sub>finetune</sub> | 107.82 | 62.09 | 0.3949 | 0.2849 | 0.2479 | 0.2574 | 0.2570 | 0.3218 | 0.1468 | 0.3321 | 0.1136 |
| BEVFusion<sub>Cam</sub> | 109.02 | 57.81 | 0.4121 | 0.2777 | 0.2255 | 0.2763 | 0.2788 | 0.2902 | 0.1076 | 0.3041 | 0.1461 |
| BEVFusion<sub>LiDAR</sub> | - | - | 0.6928 | - | - | - | - | - | - | - | - |
| BEVFusion<sub>C+L</sub> | 43.80 | 97.41 | 0.7138 | 0.6963 | 0.6931 | 0.7044 | 0.6977 | 0.7018 | 0.6787 | - | - |
| TransFusion | - | - | 0.6887 | 0.6843 | 0.6447 | 0.6819 | 0.6749 | 0.6843 | 0.6663 | - | - |
| AutoAlignV2 | - | - | 0.6139 | 0.5849 | 0.5832 | 0.6006 | 0.5901 | 0.6076 | 0.5770 | - | - |
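As a quick sanity check on how the summary columns relate to the per-corruption NDS values, here is a minimal, self-contained Python sketch (not part of this codebase; the numbers are copied from the DETR3D and BEVFusion<sub>Cam</sub> rows above, and the formulas follow the sketch in the Metrics paragraph):

```python
# Minimal sketch: recompute mRR and mCE from per-corruption NDS scores.
# The baseline for mCE is DETR3D (marked with a star in the table above).

BASELINE_NDS = {  # DETR3D row
    "clean": 0.4224,
    "corrupt": [0.2859, 0.2604, 0.3177, 0.2661, 0.4002, 0.2786, 0.3912, 0.1913],
}

MODEL_NDS = {  # BEVFusion (camera-only) row
    "clean": 0.4121,
    "corrupt": [0.2777, 0.2255, 0.2763, 0.2788, 0.2902, 0.1076, 0.3041, 0.1461],
}


def mean_resilience_rate(nds: dict) -> float:
    """mRR: average ratio of corrupted NDS to clean NDS, in percent."""
    return 100.0 * sum(nds["corrupt"]) / (len(nds["corrupt"]) * nds["clean"])


def mean_corruption_error(nds: dict, baseline: dict) -> float:
    """mCE: average corruption error relative to the baseline model, in percent."""
    errors = [(1.0 - m) / (1.0 - b) for m, b in zip(nds["corrupt"], baseline["corrupt"])]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    print(f"mRR: {mean_resilience_rate(MODEL_NDS):.2f}%")
    # ~57.82% (table: 57.81; small differences come from rounding of the table values)
    print(f"mCE: {mean_corruption_error(MODEL_NDS, BASELINE_NDS):.2f}%")
    # ~109.02% (table: 109.02)
```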

Multi-Camera Depth Estimation

| Model | Metric | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| SurroundDepth | Abs Rel | 0.280 | 0.485 | 0.497 | 0.334 | 0.338 | 0.339 | 0.354 | 0.320 | 0.423 |

Multi-Camera Semantic Occupancy Prediction

| Model | Metric | Clean | Cam Crash | Frame Lost | Color Quant | Motion Blur | Bright | Low Light | Fog | Snow |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| TPVFormer | mIoU vox | 52.06 | 27.39 | 22.85 | 38.16 | 38.64 | 49.00 | 37.38 | 46.69 | 19.39 |
| SurroundOcc | SC mIoU | 20.30 | 11.60 | 10.00 | 14.03 | 12.41 | 19.18 | 12.15 | 18.42 | 7.39 |
<p align="center"> <img src="docs/figs/stats.png"> </p>

BEV Model Calibration

| Model | Pretrain | Temporal | Depth | CBGS | Backbone | Encoder<sub>BEV</sub> | Input Size | mCE (%) | mRR (%) | NDS |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| DETR3D | ✓ | ✗ | ✗ | ✗ | ResNet | Attention | 1600×900 | 100.00 | 70.77 | 0.4224 |
| DETR3D<sub>CBGS</sub> | ✓ | ✗ | ✗ | ✓ | ResNet | Attention | 1600×900 | 99.21 | 70.02 | 0.4341 |
| BEVFormer<sub>Small</sub> | ✓ | ✓ | ✗ | ✗ | ResNet | Attention | 1280×720 | 101.23 | 59.07 | 0.4787 |
| BEVFormer<sub>Base</sub> | ✓ | ✓ | ✗ | ✗ | ResNet | Attention | 1600×900 | 97.97 | 60.40 | 0.5174 |
| PETR<sub>R50-p4</sub> | ✗ | ✗ | ✗ | ✗ | ResNet | Attention | 1408×512 | 111.01 | 61.26 | 0.3665 |
| PETR<sub>VoV-p4</sub> | ✓ | ✗ | ✗ | ✗ | VoVNet<sub>V2</sub> | Attention | 1600×900 | 100.69 | 65.03 | 0.4550 |
| ORA3D | ✓ | ✗ | ✗ | ✗ | ResNet | Attention | 1600×900 | 99.17 | 68.63 | 0.4436 |
| PolarFormer<sub>R101</sub> | ✓ | ✗ | ✗ | ✗ | ResNet | Attention | 1600×900 | 96.06 | 70.88 | 0.4602 |
| PolarFormer<sub>VoV</sub> | ✓ | ✗ | ✗ | ✗ | VoVNet<sub>V2</sub> | Attention | 1600×900 | 98.75 | 67.51 | 0.4558 |
| SRCN3D<sub>R101</sub> | ✓ | ✗ | ✗ | ✗ | ResNet | CNN+Attn. | 1600×900 | 99.67 | 70.23 | 0.4286 |
| SRCN3D<sub>VoV</sub> | ✓ | ✗ | ✗ | ✗ | VoVNet<sub>V2</sub> | CNN+Attn. | 1600×900 | 102.04 | 67.95 | 0.4205 |
| Sparse4D<sub>R101</sub> | ✓ | ✓ | ✗ | ✗ | ResNet | CNN+Attn. | 1600×900 | 100.01 | 55.04 | 0.5438 |
| BEVDet<sub>R50</sub> | ✗ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 115.12 | 51.83 | 0.3770 |
| BEVDet<sub>R101</sub> | ✗ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 113.68 | 53.12 | 0.3877 |
| BEVDet<sub>R101-pt</sub> | ✓ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 112.80 | 56.35 | 0.3780 |
| BEVDet<sub>SwinT</sub> | ✗ | ✗ | ✓ | ✓ | Swin | CNN | 704×256 | 116.48 | 46.26 | 0.4037 |
| BEVDepth<sub>R50</sub> | ✗ | ✗ | ✓ | ✓ | ResNet | CNN | 704×256 | 110.02 | 56.82 | 0.4058 |
| BEVerse<sub>SwinT</sub> | ✗ | ✗ | ✓ | ✓ | Swin | CNN | 704×256 | 137.25 | 28.24 | 0.1603 |
| BEVerse<sub>SwinT</sub> | ✗ | ✓ | ✓ | ✓ | Swin | CNN | 704×256 | 110.67 | 48.60 | 0.4665 |
| BEVerse<sub>SwinS</sub> | ✗ | ✗ | ✓ | ✓ | Swin | CNN | 1408×512 | 132.13 | 29.54 | 0.2682 |
| BEVerse<sub>SwinS</sub> | ✗ | ✓ | ✓ | ✓ | Swin | CNN | 1408×512 | 117.82 | 49.57 | 0.4951 |
| SOLOFusion<sub>short</sub> | ✗ | ✓ | ✓ | ✗ | ResNet | CNN | 704×256 | 108.68 | 61.45 | 0.3907 |
| SOLOFusion<sub>long</sub> | ✗ | ✓ | ✓ | ✗ | ResNet | CNN | 704×256 | 97.99 | 64.42 | 0.4850 |
| SOLOFusion<sub>fusion</sub> | ✗ | ✓ | ✓ | ✓ | ResNet | CNN | 704×256 | 92.86 | 64.53 | 0.5381 |

Note: Pretrain denotes models initialized from the FCOS3D checkpoint. Temporal indicates whether temporal information is used. Depth denotes models with an explicit depth estimation branch. CBGS indicates models trained with the class-balanced group-sampling strategy.

Create Corruption Set

You can create your own "RoboBEV" corruption sets! Follow the instructions listed in CREATE.md.
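The actual corruption definitions and severity levels live in CREATE.md and the accompanying scripts. As a rough illustration of the idea only (not the repository's implementation), the sketch below applies toy brightness and low-light perturbations to a single camera image with NumPy and Pillow; the input path is a placeholder, and the remaining corruption types and the nuScenes file layout are simplified away.

```python
# Illustrative only: toy brightness / low-light corruptions applied to one image.
# The real nuScenes-C generation pipeline (see CREATE.md) is more involved.
import numpy as np
from PIL import Image


def corrupt_brightness(img: np.ndarray, severity: int = 3) -> np.ndarray:
    """Brighten the image; higher severity adds a larger offset."""
    offset = [0.1, 0.2, 0.3, 0.4, 0.5][severity - 1]
    out = img.astype(np.float32) / 255.0 + offset
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)


def corrupt_low_light(img: np.ndarray, severity: int = 3) -> np.ndarray:
    """Darken the image with a gamma curve; higher severity is darker."""
    gamma = [1.5, 2.0, 2.5, 3.0, 3.5][severity - 1]
    out = (img.astype(np.float32) / 255.0) ** gamma
    return (out * 255).astype(np.uint8)


if __name__ == "__main__":
    # Placeholder path: point this at any nuScenes camera frame.
    img = np.array(Image.open("CAM_FRONT_sample.jpg").convert("RGB"))
    Image.fromarray(corrupt_brightness(img)).save("CAM_FRONT_bright.jpg")
    Image.fromarray(corrupt_low_light(img)).save("CAM_FRONT_dark.jpg")
```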

TODO List

Citation

If you find this work helpful, please kindly consider citing the following:

@article{xie2024benchmarking,
    title = {Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving},
    author = {Xie, Shaoyuan and Kong, Lingdong and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
    journal = {arXiv preprint arXiv:2405.17426}, 
    year = {2024}
}
@article{xie2023robobev,
    title = {RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions},
    author = {Xie, Shaoyuan and Kong, Lingdong and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
    journal = {arXiv preprint arXiv:2304.06719}, 
    year = {2023}
}
@misc{xie2023robobev_codebase,
    title = {The RoboBEV Benchmark for Robust Bird's Eye View Detection under Common Corruption and Domain Shift},
    author = {Xie, Shaoyuan and Kong, Lingdong and Zhang, Wenwei and Ren, Jiawei and Pan, Liang and Chen, Kai and Liu, Ziwei},
    howpublished = {\url{https://github.com/Daniel-xsy/RoboBEV}},
    year = {2023}
}

License

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png" /></a> <br /> This work is under the <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>, while some specific operations in this codebase might be with other licenses. Please refer to LICENSE.md for a more careful check, if you are using our code for commercial matters.

Acknowledgements

This work is developed based on the MMDetection3D codebase.

<img src="https://github.com/open-mmlab/mmdetection3d/blob/main/resources/mmdet3d-logo.png" width="30%"/><br> MMDetection3D is an open source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection. It is a part of the OpenMMLab project developed by MMLab.

:heart: We thank Jiangmiao Pang and Tai Wang for their insightful discussions and feedback. We thank the OpenDataLab platform for hosting our datasets.