CVPR 2024 Papers Autonomous Driving

This repository is continuously updated. We prioritize including articles that have already been submitted to arXiv.

We kindly invite you to our platform, Auto Driving Heart, for paper interpretation and sharing. If you would like to promote your paper, please feel free to contact me.

1) End-to-End Autonomous Driving

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Visual Point Cloud Forecasting enables Scalable Autonomous Driving

PlanKD: Compressing End-to-End Motion Planner for Autonomous Driving

VLP: Vision Language Planning for Autonomous Driving

2) LLM Agent

ChatSim: Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

MAPLM: A Real-World Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

RegionGPT: Towards Region Understanding Vision Language Model

Towards Learning a Generalist Model for Embodied Navigation

3) SSC: Semantic Scene Completion

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

SemCity: Semantic Scene Generation with Triplane Diffusion

4) OCC: Occupancy Prediction

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

5) World Model

Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving

6) Lane Detection

Lane2Seq: Towards Unified Lane Detection via Sequence Generation

7) Pre-training

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

8) AIGC: AI-Generated Content

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

SemCity: Semantic Scene Generation with Triplane Diffusion

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

9) 3D Object Detection

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects

VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection

CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection

CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images

UniMODE: Unified Monocular 3D Object Detection

Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection

MonoCD: Monocular 3D Object Detection with Complementary Depths

10) Stereo Matching

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching

Neural Markov Random Field for Stereo Matching

11) Cooperative Perception

RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception

12) SLAM

SNI-SLAM: Semantic Neural Implicit SLAM

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Implicit Event-RGBD Neural SLAM

13) Scene Flow Estimation

DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement

3DSFLabeling: Boosting 3D Scene Flow Estimation by Pseudo Auto Labeling

Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency

14) Point Cloud

Point Transformer V3: Simpler, Faster, Stronger

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation

Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle

GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds

15) Efficient Network

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

RepViT: Revisiting Mobile CNN From ViT Perspective

16) Segmentation

OMG-Seg: Is One Model Good Enough For All Segmentation?

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning

17) Millimeter-Wave Radar

DART: Doppler-Aided Radar Tomography

RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation

18) NeRF and Gaussian Splatting

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

Dynamic LiDAR Re-simulation using Compositional Neural Fields

NARUTO: Neural Active Reconstruction from Uncertain Target Observations

DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization

19) MOT: Multi-Object Tracking

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking

20) Multi-label Atomic Activity Recognition

Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes

21) Motion Prediction

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

22) Trajectory Prediction

Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory

Producing and Leveraging Online Map Uncertainty in Trajectory Prediction

23) Depth Estimation

AFNet: Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

24) Event Camera

Seeing Motion at Nighttime with an Event Camera

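Autonomous Driving Learning Community

The Auto Driving Heart Knowledge Planet (自动驾驶之心知识星球) is China's first learning and discussion community organized around the autonomous driving technology stack, and also the largest in China. It is a place for publishing and learning about cutting-edge technology. We have compiled learning roadmaps for almost every sub-field: autonomous driving perception (BEV, multimodal perception, occupancy, millimeter-wave radar and vision perception, lane detection, 3D perception, object tracking, multi-sensor fusion, Transformer, and more), localization and mapping (online HD maps, HD maps, SLAM), multi-sensor calibration (nearly 20 schemes covering camera, LiDAR, radar, IMU, and others), NeRF, vision-language models, world models, planning and control, trajectory prediction, domain technical solutions, and AI model deployment.

Beyond that, we have set up internal referral channels with dozens of autonomous driving companies, so résumés go straight to them. Members can ask questions and discuss freely; many algorithm engineers and master's and PhD students are active daily and help solve problems. Our original intent is to pool the knowledge of industry experts and help everyone with study and job hunting. The community's weekly activity ranks within the top 50, and we put a lot of effort into encouraging participation and discussion. You are welcome to join and grow with us!

Join link: Auto Driving Heart Knowledge Planet | China's first full-stack autonomous driving learning community, with about 30 learning roadmaps covering perception, fusion, planning, calibration, prediction, and more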

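Autonomous Driving Courses

1) Perception Algorithms

China's first full-stack tutorial series on BEV perception

Full-stack tutorial on multimodal fusion for 3D object detection

China's first tutorial on Transformer-based segmentation and detection plus large vision models

Occupancy full-stack tutorial: from beginner to expert

China's first tutorial on lane perception for mass production

Point cloud 3D object detection: theory and practice tutorial

Full-stack tutorial on monocular 3D and monocular BEV

China's first theory-and-practice tutorial on millimeter-wave and 4D millimeter-wave radar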

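2) Multi-Sensor Calibration and Fusion

Full-stack tutorial on multi-sensor fusion and object tracking

Full-stack systematic tutorial on multi-sensor calibration (about 20 hands-on online and offline schemes for camera, LiDAR, radar, and IMU)

Full-stack tutorial on millimeter-wave radar and vision fusion perception (deep learning plus traditional methods)

3) Model Deployment

Full-stack tutorial on TensorRT-based deployment code for four model types (CNN, Transformer, detection, BEV) plus CUDA acceleration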

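4) Planning, Control, and Prediction

Planning and control theory and practice tutorial (master PNC algorithms from 0 to 1)

Trajectory prediction theory and practice tutorial (China's first trajectory prediction series)

Trajectory prediction paper-reading tutorial (analyzing the field through its papers)

5) NeRF and Autonomous Driving

China's first paper-reading tutorial on NeRF and autonomous driving

6) Large Models

China's first paper-reading tutorial on large models and autonomous driving applications

Paper-reading course on world models and autonomous driving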

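7) Localization and Mapping

Paper-reading tutorial on online HD maps and autonomous driving

8) Autonomous Driving Simulation

Simulation, indispensable for autonomous driving: hands-on Carla-Autoware co-simulation

9) Bundled Column Series

Full-stack tutorial on multi-sensor fusion, perception, and calibration

Full-stack tutorial on multi-sensor calibration, fusion perception, and model deployment

Full-stack tutorial on perception algorithms and model deployment

Full-stack autonomous driving algorithm engineer series

Multimodal fusion perception bundle

Full-stack autonomous driving bundle tutorial

Planning and control plus trajectory prediction bundle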