<p align="center"> <a href="https://arxiv.org/abs/2405.05173"> <img width="765" alt="image" src="assets/title.png"> </a> </p>
<p align="center">
  <a href="https://scholar.google.com.hk/citations?user=kpMGaNIAAAAJ&hl=zh-CN"><strong>Huaiyuan Xu</strong></a> ·
  <a href="https://scholar.google.com/citations?user=kqU2NJIAAAAJ&hl=zh-CN"><strong>Junliang Chen</strong></a> ·
  <strong>Shiyu Meng</strong> ·
  <a href="https://scholar.google.com/citations?user=MAG909MAAAAJ&hl=en"><strong>Yi Wang</strong></a> ·
  <a href="https://scholar.google.com/citations?user=MYREIH0AAAAJ&hl=zh-CN"><strong>Lap-Pui Chau<sup>*</sup></strong></a>
</p>
<p align="center">
  <a href='https://arxiv.org/abs/2405.05173'> <img src='https://img.shields.io/badge/arXiv-PDF-green?style=flat&logo=arXiv&logoColor=green' alt='arXiv PDF'> </a>
</p>

We research 3D Occupancy Perception for Autonomous Driving
This work focuses on dense 3D perception in autonomous driving, encompassing LiDAR-centric, vision-centric, and multi-modal occupancy perception, and discusses the information fusion techniques used in this field. We believe this is the most comprehensive survey to date on 3D occupancy perception. Please stay tuned!
This is an active repository; you can watch it to follow the latest advances. If you find it useful, please kindly star this repo.
✨ You are welcome to contribute your work on any topic related to 3D occupancy for autonomous driving (involving not only perception, but also applications)!
If you discover any missing work or have any suggestions, please feel free to submit a pull request or contact us. We will promptly add the missing papers to this repository.
✨ Highlights
[1] A systematic survey of the latest research on 3D occupancy perception in the field of autonomous driving.
[2] The survey provides a taxonomy of 3D occupancy perception and elaborates on core methodological issues, including network pipelines, multi-source information fusion, and effective network training (a minimal fusion sketch follows this list).
[3] The survey presents evaluations of 3D occupancy perception and offers detailed performance comparisons. Furthermore, current limitations and future research directions are discussed.
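As a concrete illustration of the multi-source fusion in highlight [2], the sketch below shows one common pattern: concatenating camera and LiDAR features that have already been projected into a shared BEV space, then fusing them with a convolution. This is a minimal sketch of the general idea, not the method of any particular surveyed paper; the module name and all channel sizes are hypothetical, and many surveyed works use richer schemes such as attention-based or adaptive fusion.

```python
import torch
import torch.nn as nn

class ConcatBEVFusion(nn.Module):
    """Concatenation-based fusion of camera and LiDAR features that have
    already been lifted into the same BEV grid (hypothetical channel sizes)."""

    def __init__(self, cam_channels=80, lidar_channels=80, out_channels=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, out_channels,
                      kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch, channels, H, W) on the same BEV grid.
        return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))

fused = ConcatBEVFusion()(torch.randn(2, 80, 200, 200),
                          torch.randn(2, 80, 200, 200))
print(fused.shape)  # torch.Size([2, 128, 200, 200])
```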
🔥 News
- [2024-09-03] This survey was accepted by Information Fusion (impact factor: 14.7).
- [2024-07-21] More representative works and benchmarking comparisons have been incorporated, bringing the total to 192 literature references.
- [2024-05-18] More figures have been added to the survey, and the occupancy-based applications have been reorganized.
- [2024-05-08] The first version of the survey is available on arXiv. We curated this repository.
Introduction
3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems and is attracting significant attention from both industry and academia. Like traditional bird's-eye-view (BEV) perception, 3D occupancy perception takes multi-source input and requires information fusion; the difference is that it captures the vertical structures that 2D BEV ignores. In this survey, we review the most recent works on 3D occupancy perception and provide in-depth analyses of methodologies with various input modalities. Specifically, we summarize general network pipelines, highlight information fusion techniques, and discuss effective network training. We evaluate and analyze the occupancy perception performance of state-of-the-art methods on the most popular datasets. Furthermore, challenges and future research directions are discussed. We hope this paper will inspire the community and encourage more research on 3D occupancy perception.
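To make the representation concrete: a 3D occupancy grid stores an occupancy state or semantic label for every voxel around the ego vehicle, whereas a BEV map keeps a single cell per ground-plane location. The toy NumPy sketch below is ours, not from the survey; the grid size and class ids are hypothetical. It shows the kind of information a BEV collapse discards, e.g., free space beneath an overhanging canopy.

```python
import numpy as np

# Hypothetical ego-centric voxel grid: 200 x 200 cells in x/y, 16 in z,
# each voxel storing a semantic class id (0 = free space).
FREE, ROAD, CAR, VEGETATION = 0, 1, 2, 3
grid = np.full((200, 200, 16), FREE, dtype=np.uint8)

grid[:, :, 0] = ROAD                    # ground plane
grid[90:100, 95:105, 1:4] = CAR         # a car standing on the road
grid[40:60, 40:60, 6:10] = VEGETATION   # canopy overhanging free space

# A BEV map keeps one label per (x, y) pillar, discarding the z axis.
# Here: the label of the highest occupied voxel in each pillar (a crude
# rule; learned BEV encoders are far richer, but the loss of vertical
# structure is the same).
top = (grid != FREE)[:, :, ::-1].argmax(axis=2)  # highest occupied voxel
bev = np.take_along_axis(grid[:, :, ::-1], top[..., None], axis=2)[..., 0]

# In BEV the canopy hides the free space and road beneath it; the 3D
# occupancy grid preserves both.
print(bev[50, 50], grid[50, 50, 0])  # VEGETATION (3) vs. ROAD (1)
```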
<p align='center'> <img src="assets/autonomous driving vehicle system.png" width="500px"> </p>
<p align='center'> <img src="assets/a brief history.png" width="1000px"> </p>

Summary of Contents
- Introduction
- Summary of Contents
- Methods: A Survey
- 3D Occupancy Datasets
- Occupancy-based Applications
- Cite The Survey
- Contact
Methods: A Survey
LiDAR-Centric Occupancy Perception
Vision-Centric Occupancy Perception
Radar-Centric Occupancy Perception
Year | Venue | Paper Title | Link |
---|---|---|---|
2024 | NeurIPS | RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar | - |
Multi-Modal Occupancy Perception
3D Occupancy Datasets
Dataset | Year | Venue | Modality | # of Classes | Flow | Link |
---|---|---|---|---|---|---|
OpenScene | 2024 | CVPR 2024 Challenge | Camera | - | ✔️ | Intro. |
Cam4DOcc | 2024 | CVPR | Camera+LiDAR | 2 | ✔️ | Intro. |
Occ3D | 2024 | NeurIPS | Camera | 14 (Occ3D-Waymo), 16 (Occ3D-nuScenes) | ❌ | Intro. |
OpenOcc | 2023 | ICCV | Camera | 16 | ❌ | Intro. |
OpenOccupancy | 2023 | ICCV | Camera+LiDAR | 16 | ❌ | Intro. |
SurroundOcc | 2023 | ICCV | Camera | 16 | ❌ | Intro. |
OCFBench | 2023 | arXiv | LiDAR | - (OCFBench-Lyft), 17 (OCFBench-Argoverse), 25 (OCFBench-ApolloScape), 16 (OCFBench-nuScenes) | ❌ | Intro. |
SSCBench | 2023 | arXiv | Camera | 19 (SSCBench-KITTI-360), 16 (SSCBench-nuScenes), 14 (SSCBench-Waymo) | ❌ | Intro. |
SemanticKITTI | 2019 | ICCV | Camera+LiDAR | 19 (Semantic Scene Completion task) | ❌ | Intro. |
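Most of these benchmarks score semantic occupancy with per-class voxel-wise intersection-over-union and its mean over classes (mIoU), evaluated only on observed voxels. Below is a minimal sketch of that metric, assuming integer label volumes and a hypothetical ignore label of 255 for unobserved space; per-benchmark details (class lists, visibility masks) differ.

```python
import numpy as np

def occupancy_miou(pred, gt, num_classes, ignore_index=255):
    """Voxel-wise mean IoU between predicted and ground-truth label volumes.

    pred, gt: integer arrays of identical shape, e.g. (X, Y, Z).
    Voxels labeled `ignore_index` in gt (unobserved space) are excluded.
    """
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both prediction and GT
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy example on a tiny 4 x 4 x 2 volume with 3 classes.
rng = np.random.default_rng(0)
gt = rng.integers(0, 3, size=(4, 4, 2))
pred = gt.copy()
pred[0, 0, 0] = (gt[0, 0, 0] + 1) % 3  # one wrong voxel lowers the mIoU
print(occupancy_miou(pred, gt, num_classes=3))
```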
Occupancy-based Applications
Segmentation
Specific Task | Year | Venue | Paper Title | Link |
---|---|---|---|---|
3D Panoptic Segmentation | 2024 | CVPR | PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation | Code |
BEV Segmentation | 2024 | CVPRW | OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks | Code |
Detection
Specific Task | Year | Venue | Paper Title | Link |
---|---|---|---|---|
3D Object Detection | 2024 | NeurIPS | Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection | Code |
3D Object Detection | 2024 | CVPR | Learning Occupancy for Monocular 3D Object Detection | Code |
3D Object Detection | 2024 | AAAI | SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection | Code |
3D Object Detection | 2024 | arXiv | UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height | - |
Dynamic Perception
Specific Task | Year | Venue | Paper Title | Link |
---|---|---|---|---|
3D Flow Prediction | 2024 | CVPR | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | Code |
3D Flow Prediction | 2024 | arXiv | Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction | Project Page |
Generation
Specific Task | Year | Venue | Paper Title | Link |
---|---|---|---|---|
Scene Generation | 2024 | ECCV | Pyramid Diffusion for Fine 3D Large Scene Generation (Oral) | Code |
Scene Generation | 2024 | CVPR | SemCity: Semantic Scene Generation with Triplane Diffusion | Code |
Scene Generation | 2024 | arXiv | OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation | - |
Scene Generation | 2024 | arXiv | UniScene: Unified Occupancy-centric Driving Scene Generation | Project Page |
Scene Generation | 2024 | arXiv | InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models | Project Page |
Scene Generation | 2024 | arXiv | SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs | Project Page |
Navigation
Specific Task | Year | Venue | Paper Title | Link |
---|---|---|---|---|
Navigation for Air-Ground Robots | 2024 | RA-L | HE-Nav: A High-Performance and Efficient Navigation System for Aerial-Ground Robots in Cluttered Environments | Project Page |
Navigation for Air-Ground Robots | 2024 | ICRA | AGRNav: Efficient and Energy-Saving Autonomous Navigation for Air-Ground Robots in Occlusion-Prone Environments | Code |
Navigation for Air-Ground Robots | 2024 | arXiv | OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model | Project Page |
World Models
Unified Autonomous Driving Algorithm Framework
Specific Tasks | Year | Venue | Paper Title | Link |
---|---|---|---|---|
Occupancy Prediction, 3D Object Detection, Online Mapping, Multi-object Tracking, Motion Prediction, Motion Planning | 2024 | CVPR | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | - |
Occupancy Prediction, 3D Object Detection | 2024 | RA-L | UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving | Code |
Occupancy Prediction, 3D Object Detection, HD map reconstruction | 2024 | arXiv | GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Code |
Occupancy Forecasting, Motion Planning | 2024 | arXiv | Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving | - |
Occupancy Prediction, 3D Object Detection, BEV segmentation, Motion Planning | 2023 | ICCV | Scene as Occupancy | Code |
Cite The Survey
If you find our survey and repository useful for your research project, please consider citing our paper:
@misc{xu2024survey,
title={A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective},
author={Huaiyuan Xu and Junliang Chen and Shiyu Meng and Yi Wang and Lap-Pui Chau},
year={2024},
eprint={2405.05173},
archivePrefix={arXiv}
}
Contact
If you have any questions, please feel free to get in touch:
lap-pui.chau@polyu.edu.hk
huaiyuan.xu@polyu.edu.hk
If you are interested in joining us as a Ph.D. student researching computer vision and machine learning, please feel free to contact Professor Chau:
lap-pui.chau@polyu.edu.hk