Awesome
CVPR 2024 Papers Autonomous Driving
This repository is continuously updated. We prioritize including articles that have already been submitted to arXiv.
We kindly invite you to our platform, Auto Driving Heart, for paper interpretation and sharing. If you would like to promote your paper, please feel free to contact me.
1) End-to-End Autonomous Driving
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
PlanKD: Compressing End-to-End Motion Planner for Autonomous Driving
VLP: Vision Language Planning for Autonomous Driving
2) LLM Agent
ChatSim: Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
MAPLM: A Real-World Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
RegionGPT: Towards Region Understanding Vision Language Model
Towards Learning a Generalist Model for Embodied Navigation
3) SSC: Semantic Scene Completion
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
SemCity: Semantic Scene Generation with Triplane Diffusion
4) OCC: Occupancy Prediction
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
5) World Model
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
6) Lane Detection
Lane2Seq: Towards Unified Lane Detection via Sequence Generation
7) Pre-training
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
8) AIGC: AI-Generated Content
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
SemCity: Semantic Scene Generation with Triplane Diffusion
- Paper:
- Code: https://github.com/zoomin-lee/SemCity
BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
9) 3D Object Detection
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images
UniMODE: Unified Monocular 3D Object Detection
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection
MonoCD: Monocular 3D Object Detection with Complementary Depths
- Paper:
- Code: https://github.com/dragonfly606/MonoCD
10) Stereo Matching
MoCha-Stereo: Motif Channel Attention Network for Stereo Matching
Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
Neural Markov Random Field for Stereo Matching
11) Cooperative Perception
RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception
12) SLAM
SNI-SLAM: Semantic Neural Implicit SLAM
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
Implicit Event-RGBD Neural SLAM
13) Scene Flow Estimation
DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement
3DSFLabeling: Boosting 3D Scene Flow Estimation by Pseudo Auto Labeling
Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency
14) Point Cloud
Point Transformer V3: Simpler, Faster, Stronger
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation
Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle
- Paper:
- Code: https://github.com/jihun1998/AO
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
- Paper:
- Code: https://github.com/GLiDR-CVPR2024/GLiDR
15) Efficient Network
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
RepViT: Revisiting Mobile CNN From ViT Perspective
16) Segmentation
OMG-Seg: Is One Model Good Enough For All Segmentation?
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
17) Millimeter-Wave Radar
DART: Doppler-Aided Radar Tomography
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
18) NeRF and Gaussian Splatting
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Dynamic LiDAR Re-simulation using Compositional Neural Fields
- Paper: https://arxiv.org/pdf/2312.05247.pdf
- Code: https://github.com/prs-eth/Dynamic-LiDAR-Resimulation
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
19) MOT: Multi-object Tracking
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking
20) Multi-label Atomic Activity Recognition
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
21) Motion Prediction
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
22) Trajectory Prediction
Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
- Paper: https://arxiv.org/pdf/2403.16439.pdf
- Code: https://github.com/alfredgu001324/MapUncertaintyPrediction
23) Depth Estimation
AFNet: Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
24) Event Camera
Seeing Motion at Nighttime with an Event Camera
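Autonomous Driving Learning Community
The Auto Driving Heart Knowledge Planet (知识星球) is the first community in China for discussion and learning organized around the autonomous driving technology stack, and also the largest in China. It is a place for publishing and learning about cutting-edge technology. We have compiled learning roadmaps for almost every subfield: autonomous driving perception (BEV, multimodal perception, Occupancy, millimeter-wave radar and vision perception, lane detection, 3D perception, object tracking, multi-sensor fusion, Transformer, etc.), localization and mapping (online HD maps, HD maps, SLAM), multi-sensor calibration (nearly 20 solutions covering Camera/Lidar/Radar/IMU, etc.), NeRF, vision-language models, world models, planning and control, trajectory prediction, domain technical solutions, AI model deployment, and more.
In addition, we have set up internal referral channels with dozens of autonomous driving companies, so résumés reach them directly. You can ask questions and discuss freely here; many algorithm engineers and master's and PhD students are active daily and help solve problems. Our original intention is to bring together the expertise of industry veterans to help everyone with learning and job hunting. The community's weekly activity consistently ranks in the top 50, and we put great emphasis on encouraging participation and discussion. You are welcome to join and grow with us!
Join link: Auto Driving Heart Knowledge Planet | The first full-stack autonomous driving learning community in China, with around 30 learning roadmaps covering perception, fusion, planning, calibration, prediction, and more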
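Autonomous Driving Courses
1) Perception Algorithms
The first tutorial in China on Transformer-based segmentation and detection plus large vision models
2) Multi-Sensor Calibration and Fusion
Full-stack systematic tutorial on multi-sensor calibration (around 20 online/offline hands-on solutions for Camera/Lidar/Radar/IMU)
3) Model Deployment
Full-stack tutorial on TensorRT-based deployment code for four model types (CNN, Transformer, detection, BEV) plus CUDA acceleration
4) Planning, Control, and Prediction
5) NeRF and Autonomous Driving
6) Large Models
7) Localization and Mapping
8) Autonomous Driving Simulation
Simulation is indispensable to autonomous driving! Hands-on Carla-Autoware co-simulation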