
<p align="center"> <a href="https://arxiv.org/abs/2405.05173"> <img width="765" alt="image" src="assets/title.png"> </a> </p> <p align="center"> <a href="https://scholar.google.com.hk/citations?user=kpMGaNIAAAAJ&hl=zh-CN"><strong>Huaiyuan Xu</strong></a> . <a href="https://scholar.google.com/citations?user=kqU2NJIAAAAJ&hl=zh-CN"><strong>Junliang Chen</strong></a> . <strong>Shiyu Meng</strong> . <a href="https://scholar.google.com/citations?user=MAG909MAAAAJ&hl=en"><strong>Yi Wang</strong></a> . <a href="https://scholar.google.com/citations?user=MYREIH0AAAAJ&hl=zh-CN"><strong>Lap-Pui Chau<sup>*</sup></strong></a> </p> <p align="center"> <a href='https://arxiv.org/abs/2405.05173'> <img src='https://img.shields.io/badge/arXiv-PDF-green?style=flat&logo=arXiv&logoColor=green' alt='arXiv PDF'> </a> </p>

We research 3D Occupancy Perception for Autonomous Driving

This work focuses on 3D dense perception in autonomous driving, encompassing LiDAR-Centric Occupancy Perception, Vision-Centric Occupancy Perception, and Multi-Modal Occupancy Perception. Information fusion techniques for this field are also discussed. We believe this is the most comprehensive survey to date on 3D occupancy perception. Please stay tuned! 😉😉😉

This is an actively maintained repository; you can watch it to follow the latest advances. If you find it useful, please kindly star this repo.

✨ You are welcome to share your work on any topic related to 3D occupancy for autonomous driving (not only perception, but also applications)!

If you discover any missing work or have any suggestions, please feel free to submit a pull request or contact us. We will promptly add the missing papers to this repository.

✨Highlight

[1] A systematic survey of the latest research on 3D occupancy perception in the field of autonomous driving.

[2] The survey provides a taxonomy of 3D occupancy perception and elaborates on core methodological issues, including network pipelines, multi-source information fusion, and effective network training.

[3] The survey presents evaluations for 3D occupancy perception and offers detailed performance comparisons. Furthermore, current limitations and future research directions are discussed.

🔥 News

Introduction

3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems and is attracting significant attention from both industry and academia. Similar to traditional bird's-eye view (BEV) perception, 3D occupancy perception involves multi-source inputs and requires information fusion. The difference, however, is that it captures the vertical structures that 2D BEV ignores. In this survey, we review the most recent works on 3D occupancy perception and provide in-depth analyses of methodologies with various input modalities. Specifically, we summarize general network pipelines, highlight information fusion techniques, and discuss effective network training. We evaluate and analyze the occupancy perception performance of the state of the art on the most popular datasets. Furthermore, challenges and future research directions are discussed. We hope this paper will inspire the community and encourage more research work on 3D occupancy perception.
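To make the contrast with BEV concrete, here is a toy sketch (not from the survey; the array shapes and labels are purely illustrative): a 3D semantic occupancy grid retains stacked structures along the vertical axis, while a BEV map keeps only one value per (x, y) column.

```python
import numpy as np

# Toy semantic occupancy grid: one label per voxel, 0 = empty.
X, Y, Z = 8, 8, 4
occ = np.zeros((X, Y, Z), dtype=np.int64)
occ[2, 3, 0] = 1  # e.g. a road-surface voxel
occ[2, 3, 2] = 2  # e.g. an overhanging branch above the same (x, y) cell

# A naive BEV map keeps only the topmost non-empty label per column,
# so anything underneath a taller structure is lost.
bev = np.zeros((X, Y), dtype=np.int64)
for x in range(X):
    for y in range(Y):
        nz = np.nonzero(occ[x, y])[0]
        if nz.size:
            bev[x, y] = occ[x, y, nz[-1]]

print(occ[2, 3].tolist())  # [1, 0, 2, 0] -- both structures kept in 3D
print(bev[2, 3])           # 2 -- the road voxel under the branch vanishes
```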

<p align='center'> <img src="assets/autonomous driving vehicle system.png" width="500px"> </p> <p align='center'> <img src="assets/a brief history.png" width="1000px"> </p>

Summary of Contents

Methods: A Survey

LiDAR-Centric Occupancy Perception

| Year | Venue | Paper Title | Link |
|------|-------|-------------|------|
| 2024 | NeurIPS | TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight | Code |
| 2024 | CVPR | PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (Best paper award candidate) | Project Page |
| 2024 | IROS | LiDAR-based 4D Occupancy Completion and Forecasting | Project Page |
| 2024 | arXiv | OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity | Project Page |
| 2024 | arXiv | DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models | - |
| 2024 | arXiv | MergeOcc: Bridge the Domain Gap between Different LiDARs for Robust Occupancy Prediction | - |
| 2023 | T-IV | Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders | Code |
| 2023 | arXiv | PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction | Code |
| 2021 | T-PAMI | Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data | - |
| 2021 | AAAI | Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion | Code |
| 2020 | CoRL | S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds | - |
| 2020 | 3DV | LMSCNet: Lightweight Multiscale 3D Semantic Completion | Code |

Vision-Centric Occupancy Perception

| Year | Venue | Paper Title | Link |
|------|-------|-------------|------|
| 2024 | NeurIPS | Context and Geometry Aware Voxel Transformer for Semantic Scene Completion (Spotlight paper) | Code |
| 2024 | NeurIPS | OPUS: Occupancy Prediction Using a Sparse Set | Code |
| 2024 | ECCV | ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers | Code |
| 2024 | ECCV | CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction | Code |
| 2024 | ECCV | VEON: Vocabulary-Enhanced Occupancy Prediction | Code |
| 2024 | ECCV | Fully Sparse 3D Occupancy Prediction | Code |
| 2024 | ECCV | GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction | Project Page |
| 2024 | ECCV | Occupancy as Set of Points | Code |
| 2024 | ECCV | Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion | Code |
| 2024 | CVPR | LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction | - |
| 2024 | CVPR | Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion | - |
| 2024 | CVPR | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | Code |
| 2024 | CVPR | SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction | Project Page |
| 2024 | CVPR | SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction | Project Page |
| 2024 | CVPR | PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation | Code |
| 2024 | CVPR | Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation | Code |
| 2024 | CVPR | COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction | Code |
| 2024 | CVPR | Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles | Project Page |
| 2024 | CVPR | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | Code |
| 2024 | CVPR | Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation | Project Page |
| 2024 | CVPR | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | - |
| 2024 | T-IP | Camera-based 3D Semantic Scene Completion with Sparse Guidance Network | Code |
| 2024 | CoRL | Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction | Project Page |
| 2024 | IJCAI | Label-efficient Semantic Scene Completion with Scribble Annotations | Code |
| 2024 | IJCAI | Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion | Code |
| 2024 | ICRA | The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition | Project Page |
| 2024 | ICRA | RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision | Code |
| 2024 | ICRA | MonoOcc: Digging into Monocular Semantic Occupancy Prediction | Code |
| 2024 | ICRA | FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird’s-Eye View and Perspective View | - |
| 2024 | AAAI | Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving | Code |
| 2024 | AAAI | One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception | - |
| 2024 | RA-L | HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction | - |
| 2024 | RA-L | UniScene: Multi-Camera Unified Pre-Training via 3D Scene Reconstruction | Code |
| 2024 | AAIML | SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints | Project Page |
| 2024 | 3DV | PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving | - |
| 2024 | IROS | SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views | Code |
| 2024 | arXiv | ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera | - |
| 2024 | arXiv | ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning | - |
| 2024 | arXiv | Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance | - |
| 2024 | arXiv | Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction | Code |
| 2024 | arXiv | AdaOcc: Adaptive-Resolution Occupancy Prediction | - |
| 2024 | arXiv | GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting | Project Page |
| 2024 | arXiv | MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering | Code |
| 2024 | arXiv | VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction | - |
| 2024 | arXiv | UniVision: A Unified Framework for Vision-Centric 3D Perception | Code |
| 2024 | arXiv | LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering | - |
| 2024 | arXiv | Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement | - |
| 2024 | arXiv | α-SSC: Uncertainty-Aware Camera-based 3D Semantic Scene Completion | - |
| 2024 | arXiv | Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center | Code |
| 2024 | arXiv | BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network | Code |
| 2024 | arXiv | GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision | - |
| 2024 | arXiv | OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow | - |
| 2024 | arXiv | OccFiner: Offboard Occupancy Refinement with Hybrid Propagation | - |
| 2024 | arXiv | InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction | Code |
| 2024 | arXiv | Unified Spatio-Temporal Tri-Perspective View Representation for 3D Semantic Occupancy Prediction | Project Page |
| 2023 | CVPR | VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion | Code |
| 2023 | CVPR | Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction | Project Page |
| 2023 | NeurIPS | POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images | Project Page |
| 2023 | NeurIPS | Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving | Project Page |
| 2023 | ICCV | SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving | Project Page |
| 2023 | ICCV | Scene as Occupancy | Code |
| 2023 | ICCV | OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction | Code |
| 2023 | ICCV | NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space | Code |
| 2023 | T-IV | 3DOPFormer: 3D Occupancy Perception from Multi-Camera Images with Directional and Distance Enhancement | Code |
| 2023 | arXiv | OccupancyDETR: Using DETR for Mixed Dense-sparse 3D Occupancy Prediction | - |
| 2023 | arXiv | OVO: Open-Vocabulary Occupancy | Code |
| 2023 | arXiv | OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries | Code |
| 2023 | arXiv | OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments | Project Page |
| 2023 | arXiv | OccDepth: A Depth-Aware Method for 3D Semantic Scene Completion | Code |
| 2023 | arXiv | FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin | Code |
| 2023 | arXiv | FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation | Code |
| 2023 | arXiv | DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion | - |
| 2023 | arXiv | A Simple Framework for 3D Occupancy Estimation in Autonomous Driving | Code |
| 2023 | arXiv | UniWorld: Autonomous Driving Pre-training via World Models | Code |
| 2022 | CVPR | MonoScene: Monocular 3D Semantic Scene Completion | Project Page |

Radar-Centric Occupancy Perception

| Year | Venue | Paper Title | Link |
|------|-------|-------------|------|
| 2024 | NeurIPS | RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar | - |

Multi-Modal Occupancy Perception

| Year | Venue | Paper Title | Link |
|------|-------|-------------|------|
| 2024 | ECCV | OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving | Project Page |
| 2024 | RA-L | Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction | Project Page |
| 2024 | arXiv | DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction | Code |
| 2024 | arXiv | OccMamba: Semantic Occupancy Prediction with State Space Models | - |
| 2024 | arXiv | LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera | Project Page |
| 2024 | arXiv | OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction | - |
| 2024 | arXiv | EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network | Code |
| 2024 | arXiv | Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution | - |
| 2024 | arXiv | OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction | - |
| 2024 | arXiv | Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception | - |
| 2023 | ICCV | OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception | Code |

3D Occupancy Datasets

| Dataset | Year | Venue | Modality | # of Classes | Flow | Link |
|---------|------|-------|----------|--------------|------|------|
| OpenScene | 2024 | CVPR 2024 Challenge | Camera | - | ✔️ | Intro. |
| Cam4DOcc | 2024 | CVPR | Camera+LiDAR | 2 | ✔️ | Intro. |
| Occ3D | 2024 | NeurIPS | Camera | 14 (Occ3D-Waymo), 16 (Occ3D-nuScenes) | | Intro. |
| OpenOcc | 2023 | ICCV | Camera | 16 | | Intro. |
| OpenOccupancy | 2023 | ICCV | Camera+LiDAR | 16 | | Intro. |
| SurroundOcc | 2023 | ICCV | Camera | 16 | | Intro. |
| OCFBench | 2023 | arXiv | LiDAR | - (OCFBench-Lyft), 17 (OCFBench-Argoverse), 25 (OCFBench-ApolloScape), 16 (OCFBench-nuScenes) | | Intro. |
| SSCBench | 2023 | arXiv | Camera | 19 (SSCBench-KITTI-360), 16 (SSCBench-nuScenes), 14 (SSCBench-Waymo) | | Intro. |
| SemanticKITTI | 2019 | ICCV | Camera+LiDAR | 19 (Semantic Scene Completion task) | | Intro. |
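The benchmarks above typically score semantic occupancy with voxel-wise IoU/mIoU over the class list. Below is a minimal illustrative sketch of such a metric; conventions for the empty class and ignore labels vary per benchmark, so this is not any dataset's official evaluator.

```python
import numpy as np

def occupancy_miou(pred, gt, num_classes, ignore_index=255):
    """Voxel-wise mean IoU over semantic classes.

    Class 0 (empty) is excluded here; real benchmarks differ in how they
    treat empty space and unobserved voxels -- this is only a sketch.
    """
    valid = gt != ignore_index          # drop unlabeled / unobserved voxels
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(1, num_classes):     # skip class 0 (empty)
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy 4x4x2 voxel grids with 3 classes (0 = empty).
gt = np.zeros((4, 4, 2), dtype=np.int64)
gt[0, :, 0] = 1
gt[1, :, 0] = 2
pred = gt.copy()
pred[0, 0, 0] = 2                       # one mislabeled voxel
print(round(occupancy_miou(pred, gt, num_classes=3), 3))  # -> 0.775
```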

Occupancy-based Applications

Segmentation

| Specific Task | Year | Venue | Paper Title | Link |
|---------------|------|-------|-------------|------|
| 3D Panoptic Segmentation | 2024 | CVPR | PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation | Code |
| BEV Segmentation | 2024 | CVPRW | OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks | Code |

Detection

| Specific Task | Year | Venue | Paper Title | Link |
|---------------|------|-------|-------------|------|
| 3D Object Detection | 2024 | CVPR | Learning Occupancy for Monocular 3D Object Detection | Code |
| 3D Object Detection | 2024 | AAAI | SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection | Code |
| 3D Object Detection | 2024 | arXiv | UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height | - |

Dynamic Perception

| Specific Task | Year | Venue | Paper Title | Link |
|---------------|------|-------|-------------|------|
| 3D Flow Prediction | 2024 | CVPR | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | Code |
| 3D Flow Prediction | 2024 | arXiv | Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction | Project Page |

Generation

| Specific Task | Year | Venue | Paper Title | Link |
|---------------|------|-------|-------------|------|
| Scene Generation | 2024 | ECCV | Pyramid Diffusion for Fine 3D Large Scene Generation (Oral paper) | Code |
| Scene Generation | 2024 | CVPR | SemCity: Semantic Scene Generation with Triplane Diffusion | Code |
| Scene Generation | 2024 | arXiv | SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs | Project Page |

Navigation

| Specific Task | Year | Venue | Paper Title | Link |
|---------------|------|-------|-------------|------|
| Navigation for Air-Ground Robots | 2024 | RA-L | HE-Nav: A High-Performance and Efficient Navigation System for Aerial-Ground Robots in Cluttered Environments | Project Page |
| Navigation for Air-Ground Robots | 2024 | ICRA | AGRNav: Efficient and Energy-Saving Autonomous Navigation for Air-Ground Robots in Occlusion-Prone Environments | Code |
| Navigation for Air-Ground Robots | 2024 | arXiv | OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model | Project Page |

World Models

| Specific Task | Year | Venue | Paper Title | Link |
|---------------|------|-------|-------------|------|
| 4D Occupancy Forecasting and Motion Planning | 2024 | ECCV | OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving | Project Page |
| 4D Occupancy Forecasting | 2024 | CVPR | UnO: Unsupervised Occupancy Fields for Perception and Forecasting (Oral paper) | Project Page |
| 4D Representation Learning Framework | 2024 | CVPR | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | - |
| 4D Occupancy Forecasting | 2024 | CVPR | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | Code |
| 4D Occupancy Forecasting | 2024 | AAAI | Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence | Project Page |
| 4D Occupancy Forecasting and Generation | 2024 | arXiv | DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model | Project Page |
| 4D Occupancy Forecasting | 2024 | arXiv | FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving | - |
| 4D Occupancy Forecasting and Motion Planning | 2024 | arXiv | RenderWorld: World Model with Self-Supervised 3D Label | - |
| 4D Occupancy Forecasting, Motion Planning, and Reasoning | 2024 | arXiv | OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | - |
| 4D Occupancy Forecasting and Generation | 2024 | arXiv | Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving | - |
| 4D Occupancy Generation | 2024 | arXiv | OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving | Project Page |
| 4D Occupancy Forecasting | 2023 | CVPR | Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting | Project Page |

Unified Autonomous Driving Algorithm Framework

| Specific Tasks | Year | Venue | Paper Title | Link |
|----------------|------|-------|-------------|------|
| Occupancy Prediction, 3D Object Detection, Online Mapping, Multi-object Tracking, Motion Prediction, Motion Planning | 2024 | CVPR | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | - |
| Occupancy Prediction, 3D Object Detection | 2024 | RA-L | UniScene: Multi-Camera Unified Pre-training via 3D Scene Reconstruction for Autonomous Driving | Code |
| Occupancy Forecasting, Motion Planning | 2024 | arXiv | Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving | - |
| Occupancy Prediction, 3D Object Detection, BEV Segmentation, Motion Planning | 2023 | ICCV | Scene as Occupancy | Code |

Cite The Survey

If you find our survey and repository useful for your research project, please consider citing our paper:

```bibtex
@misc{xu2024survey,
      title={A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective},
      author={Huaiyuan Xu and Junliang Chen and Shiyu Meng and Yi Wang and Lap-Pui Chau},
      year={2024},
      eprint={2405.05173},
      archivePrefix={arXiv}
}
```

Contact

If you have any questions, please feel free to get in touch:

lap-pui.chau@polyu.edu.hk
huaiyuan.xu@polyu.edu.hk

If you are interested in joining us as a Ph.D. student researching computer vision and machine learning, please feel free to contact Professor Chau:

lap-pui.chau@polyu.edu.hk