Awesome

CVPR 2024 Papers Autonomous Driving

This repository is continuously updated. We prioritize including articles that have already been submitted to arXiv.

We kindly invite you to our platform, Auto Driving Heart, for paper interpretation and sharing. If you would like to promote your paper, please feel free to contact us.

1) End-to-End Autonomous Driving

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Paper: https://arxiv.org/pdf/2312.03031.pdf
Code: https://github.com/NVlabs/BEV-Planner

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

Paper: https://arxiv.org/pdf/2312.17655.pdf
Code: https://github.com/OpenDriveLab/ViDAR

PlanKD: Compressing End-to-End Motion Planner for Autonomous Driving

Paper: https://arxiv.org/pdf/2403.01238.pdf
Code: https://github.com/tulerfeng/PlanKD

VLP: Vision Language Planning for Autonomous Driving

Paper: https://arxiv.org/abs/2401.05577

2) LLM Agent

ChatSim: Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration

Paper: https://arxiv.org/pdf/2402.05746.pdf
Code: https://github.com/yifanlu0227/ChatSim

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Paper: https://arxiv.org/pdf/2312.07488.pdf
Code: https://github.com/opendilab/LMDrive

MAPLM: A Real-World Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding

Code: https://github.com/LLVM-AD/MAPLM

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Paper: https://arxiv.org/pdf/2403.01849.pdf
Code: https://github.com/TreeLLi/APT

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

Paper: https://arxiv.org/pdf/2403.02781

RegionGPT: Towards Region Understanding Vision Language Model

Paper: https://arxiv.org/pdf/2403.02330

Towards Learning a Generalist Model for Embodied Navigation

Paper: https://arxiv.org/pdf/2312.02010.pdf
Code: https://github.com/zd11024/NaviLLM

3) SSC: Semantic Scene Completion

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

Paper: https://arxiv.org/pdf/2306.15670.pdf
Code: https://github.com/hustvl/Symphonies

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

Paper: https://arxiv.org/pdf/2312.02158.pdf
Code: https://github.com/astra-vision/PaSCo

SemCity: Semantic Scene Generation with Triplane Diffusion

Paper: https://arxiv.org/pdf/2403.07773.pdf
Code: https://github.com/zoomin-lee/SemCity

4) OCC: Occupancy Prediction

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

Paper: https://arxiv.org/pdf/2311.12754.pdf
Code: https://github.com/huang-yh/SelfOcc

Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

Paper: https://arxiv.org/pdf/2311.17663.pdf
Code: https://github.com/haomo-ai/Cam4DOcc

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

Paper: https://arxiv.org/pdf/2306.10013.pdf
Code: https://github.com/Robertwyq/PanoOcc

5) World Model

Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving

Paper: https://arxiv.org/pdf/2311.17918.pdf
Code: https://github.com/BraveGroup/Drive-WM

6) Lane Detection

Lane2Seq: Towards Unified Lane Detection via Sequence Generation

Paper: https://arxiv.org/abs/2402.17172

7) Pre-training

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Paper: https://arxiv.org/pdf/2310.08370.pdf
Code: https://github.com/Nightmare-n/UniPAD

8) AIGC: AI-Generated Content

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

Paper: https://arxiv.org/pdf/2311.16813.pdf
Code: https://github.com/wenyuqing/panacea

SemCity: Semantic Scene Generation with Triplane Diffusion

Paper: https://arxiv.org/pdf/2403.07773.pdf
Code: https://github.com/zoomin-lee/SemCity

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

Paper: https://arxiv.org/pdf/2312.02136.pdf
Code: https://github.com/zqh0253/BerfScene

9) 3D Object Detection

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

Paper: https://arxiv.org/pdf/2312.08371.pdf
Code: https://github.com/KuanchihHuang/PTT

SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects

Paper: https://arxiv.org/pdf/2403.20318
Code: https://github.com/abhi1kumar/SeaBird

VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection

Code: https://github.com/skmhrk1209/VSRD

CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection

Code: https://github.com/zhnxjtu/CaKDP

CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images

Paper: https://arxiv.org/abs/2403.04198
Code: https://github.com/SerCharles/CN-RMA

UniMODE: Unified Monocular 3D Object Detection

Paper: https://arxiv.org/abs/2402.18573

Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Paper: https://arxiv.org/abs/2403.06093
Code: https://github.com/nullmax-vision/QAF2D

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

Paper: https://arxiv.org/abs/2403.05817
Code: https://github.com/zhanggang001/HEDNet

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

Paper: https://arxiv.org/pdf/2403.05061

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

Paper: https://arxiv.org/pdf/2403.15241.pdf
Code: https://github.com/yinjunbo/IS-Fusion

RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection

Paper: https://arxiv.org/pdf/2403.16440.pdf
Code: https://github.com/VDIGPKU/RCBEVDet

MonoCD: Monocular 3D Object Detection with Complementary Depths

Paper:
Code: https://github.com/dragonfly606/MonoCD

10) Stereo Matching

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

Code: https://github.com/ZYangChen/MoCha-Stereo

Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching

Paper: https://arxiv.org/abs/2402.19270
Code: https://github.com/DFSDDDDD1199/ICGNet

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Paper: https://arxiv.org/abs/2403.00486
Code: https://github.com/Windsrain/Selective-Stereo

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching

Paper: https://arxiv.org/pdf/2306.15612.pdf
Code: https://github.com/xxxupeng/ADL

Neural Markov Random Field for Stereo Matching

Paper: https://arxiv.org/pdf/2403.11193.pdf
Code: https://github.com/aeolusguan/NMRF

11) Cooperative Perception

RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

Code: https://github.com/ryhnhao/RCooper

12) SLAM

SNI-SLAM: Semantic Neural Implicit SLAM

Paper: https://arxiv.org/pdf/2311.11016.pdf

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Paper: https://arxiv.org/abs/2402.19231
Code: https://github.com/Lu-Feng/CricaVPR

Implicit Event-RGBD Neural SLAM

Paper: https://arxiv.org/pdf/2311.11013.pdf
Code: https://github.com/DelinQu/EN-SLAM

13) Scene Flow Estimation

DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement

Paper: https://arxiv.org/pdf/2311.17456.pdf
Code: https://github.com/IRMVLab/DifFlow3D

3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto Labeling

Paper: https://arxiv.org/pdf/2402.18146.pdf
Code: https://github.com/jiangchaokang/3DSFLabelling

Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency

Paper: https://arxiv.org/pdf/2312.08879.pdf
Code: https://github.com/vacany/sac-flow

14) Point Cloud

Point Transformer V3: Simpler, Faster, Stronger

Paper: https://arxiv.org/pdf/2312.10035.pdf
Code: https://github.com/Pointcept/PointTransformerV3

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

Paper: https://arxiv.org/pdf/2403.00592.pdf
Code: https://github.com/ZhaochongAn/COSeg

PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation

Code: https://github.com/JinfengX/PointCloudPDF

Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle

Paper:
Code: https://github.com/jihun1998/AO

GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds

Paper:
Code: https://github.com/GLiDR-CVPR2024/GLiDR

15) Efficient Network

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

Paper: https://arxiv.org/pdf/2401.06197.pdf

RepViT: Revisiting Mobile CNN From ViT Perspective

Paper: https://arxiv.org/pdf/2307.09283.pdf
Code: https://github.com/THU-MIG/RepViT

16) Segmentation

OMG-Seg: Is One Model Good Enough For All Segmentation?

Paper: https://arxiv.org/pdf/2401.10229.pdf
Code: https://github.com/lxtGH/OMG-Seg

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Paper: https://arxiv.org/pdf/2312.04265.pdf
Code: https://github.com/w1oves/Rein

SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation

Paper: https://arxiv.org/abs/2311.15707

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Paper: https://arxiv.org/abs/2311.15537

Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning

Paper: https://arxiv.org/abs/2403.06122

17) Millimeter-Wave Radar

DART: Doppler-Aided Radar Tomography

Code: https://github.com/thetianshuhuang/dart

RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation

Code: https://github.com/yuvalHG/RadSimReal

18) NeRF and Gaussian Splatting

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

Paper: https://arxiv.org/pdf/2312.07920.pdf
Code: https://github.com/VDIGPKU/DrivingGaussian

Dynamic LiDAR Re-simulation using Compositional Neural Fields

NARUTO: Neural Active Reconstruction from Uncertain Target Observations

Paper: https://arxiv.org/abs/2402.18771

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

Paper: https://arxiv.org/abs/2403.06912

19) MOT: Multi-object Tracking

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT

DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking

Paper: https://arxiv.org/abs/2403.02767

20) Multi-label Atomic Activity Recognition

Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes

Paper: https://arxiv.org/pdf/2311.17948.pdf
Code: https://github.com/HCIS-Lab/Action-slot

21) Motion Prediction

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Code: https://github.com/opendilab/SmartRefine

22) Trajectory Prediction

Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory

Paper: https://arxiv.org/pdf/2403.10052.pdf
Code: https://github.com/daeheepark/T4P

Producing and Leveraging Online Map Uncertainty in Trajectory Prediction

23) Depth Estimation

AFNet: Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

Paper: https://arxiv.org/pdf/2403.07535.pdf
Code: https://github.com/Junda24/AFNet

24) Event Camera

Seeing Motion at Nighttime with an Event Camera

Paper:
Code: https://github.com/Liu-haoyue/NER-Net?tab=readme-ov-file

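Autonomous Driving Learning Community

The Auto Driving Heart Knowledge Planet is the first learning and discussion community in China organized around the autonomous driving technology stack (and also the largest in China). It is a place for publishing and learning about cutting-edge technology. We have compiled learning roadmaps for almost every subfield: autonomous driving perception (BEV, multimodal perception, occupancy, millimeter-wave radar and vision perception, lane detection, 3D perception, object tracking, multimodal, multi-sensor fusion, Transformers, and more), localization and mapping (online HD maps, HD maps, SLAM), multi-sensor calibration (nearly 20 schemes covering camera, LiDAR, radar, IMU, and others), NeRF, vision-language models, world models, planning and control, trajectory prediction, domain-specific technical solutions, and AI model deployment.

In addition, the community has set up internal referral channels with dozens of autonomous driving companies, so resumes reach them directly. Members can ask questions and discuss freely; many algorithm engineers and master's and PhD students are active daily and help solve problems. The aim is to gather the expertise of industry veterans to help everyone with learning and job hunting. The community's weekly activity ranks within the top 50, and it puts great emphasis on encouraging participation and discussion. You are welcome to join and grow with us!

Join link: Auto Driving Heart Knowledge Planet | China's first full-stack autonomous driving learning community, with nearly 30 learning roadmaps covering perception, fusion, planning, calibration, prediction, and more

Autonomous Driving Courses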