Awesome
<div id="top" align="center"> <p align="center"> <img src="assets/images/repo/title_v2.jpg"> </p>DriveLM: Driving with Graph Visual Question Answering
<!-- Download dataset [**HERE**](docs/data_prep_nus.md) (serves as Official source for `Autonomous Driving Challenge 2024`) -->Autonomous Driving Challenge 2024
Driving-with-Language Leaderboard.
https://github.com/OpenDriveLab/DriveLM/assets/54334254/cddea8d6-9f6e-4e7e-b926-5afb59f8dce2
<!-- > above is new demo video. demo scene token: cc8c0bf57f984915a77078b10eb33198 -->
Highlights <a name="highlight"></a>
🔥 We instantiate datasets (DriveLM-Data) built upon nuScenes and CARLA, and propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
<!-- 🔥 **The key insight** is that with our proposed suite, we obtain a suitable proxy task to mimic the human reasoning process during driving. -->
DriveLM serves as a main track in the CVPR 2024 Autonomous Driving Challenge. Everything you need for the challenge is HERE, including the baseline, test data, submission format, and evaluation pipeline!
News <a name="news"></a>
- [2024/07/16] The official DriveLM leaderboard has reopened!
- [2024/07/01] DriveLM was accepted to ECCV 2024! Congrats to the team!
- [2024/06/01] The challenge has ended! See the final leaderboard.
- [2024/03/25] The challenge test server is online and the test questions are released. Check it out!
- [2024/02/29] Challenge repo released, including the baseline, data and submission format, and evaluation pipeline. Have a look!
- [2023/08/25] DriveLM-nuScenes demo released.
- [2023/12/22] DriveLM-nuScenes full v1.0 and paper released.
Table of Contents
- Highlights
- Getting Started
- Current Endeavors and Future Directions
- TODO List
- DriveLM-Data
- License and Citation
- Other Resources
Getting Started <a name="gettingstarted"></a>
To get started with DriveLM:
<p align="right">(<a href="#top">back to top</a>)</p>
Current Endeavors and Future Directions <a name="timeline"></a>
<p align="center"> <img src="assets/images/repo/drivelm_timeline_v3.jpg"> </p>
- The advent of GPT-style multimodal models in real-world applications motivates the study of the role of language in driving.
- The dates shown reflect the arXiv submission dates.
- If there is any missing work, please reach out to us!
DriveLM attempts to address some of the challenges faced by the community.
- Lack of data: DriveLM-Data serves as a comprehensive benchmark for driving with language.
- Embodiment: GVQA provides a potential direction for embodied applications of LLMs / VLMs.
- Closed-loop: DriveLM-CARLA attempts to explore closed-loop planning with language.
TODO List <a name="newsandtodolist"></a>
- DriveLM-Data
  - DriveLM-nuScenes
  - DriveLM-CARLA
- DriveLM-Metrics
  - GPT-score
- DriveLM-Agent
  - Inference code on DriveLM-nuScenes
  - Inference code on DriveLM-CARLA
DriveLM-Data <a name="drivelmdata"></a>
DriveLM-Data covers the Perception, Prediction, Planning, Behavior, and Motion tasks, with human-written reasoning logic serving as the connection between them. We propose the task of GVQA on DriveLM-Data.
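As a rough illustration of what a graph of question-answer pairs could look like, the sketch below links QAs from different stages through explicit dependencies and orders them so that each question comes after the ones it relies on. The `QANode` class, its field names, and the example questions are hypothetical and chosen for illustration only; they are not the DriveLM-Data file format or the DriveLM-Agent implementation.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema for illustration only; not the DriveLM-Data format.


@dataclass
class QANode:
    node_id: str
    stage: str  # e.g. "perception", "prediction", "planning"
    question: str
    answer: str
    parents: list[str] = field(default_factory=list)  # ids of QAs this one depends on


def answer_order(nodes: dict[str, QANode]) -> list[str]:
    """Return node ids so that every QA comes after the QAs it depends on."""
    graph = {nid: set(node.parents) for nid, node in nodes.items()}
    return list(TopologicalSorter(graph).static_order())


def context_for(node_id: str, nodes: dict[str, QANode]) -> str:
    """Concatenate the parent QAs that could serve as context for a question."""
    parents = nodes[node_id].parents
    return " ".join(f"Q: {nodes[p].question} A: {nodes[p].answer}" for p in parents)


# A made-up three-node chain: perception -> prediction -> planning.
nodes = {
    "q1": QANode("q1", "perception", "What objects are ahead?", "A pedestrian."),
    "q2": QANode("q2", "prediction", "What will the pedestrian do?", "Cross the road.", ["q1"]),
    "q3": QANode("q3", "planning", "What should the ego vehicle do?", "Slow down.", ["q2"]),
}
order = answer_order(nodes)
```

Here the planning question `q3` would be answered last, with the prediction QA `q2` available as its context. A real graph may have several parents per node, in which case the ordering among independent nodes is not unique.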
Comparison and Stats <a name="comparison"></a>
DriveLM-Data is the first language-driving dataset facilitating the full stack of driving tasks with graph-structured logical dependencies.
<!-- <center> | Language Dataset | Base Dataset | Language Form | Perspectives | Scale | Release?| |:---------:|:-------------:|:-------------:|:------:|:--------------------------------------------:|:----------:| | [BDD-X 2018](https://github.com/JinkyuKimUCB/explainable-deep-driving) | [BDD](https://bdd-data.berkeley.edu/) | Description | Perception & Reasoning | 8M frames, 20k text strings |**:heavy_check_mark:**| | [HAD 2019](https://usa.honda-ri.com/had) | [HDD](https://usa.honda-ri.com/hdd) | Advice | Goal-oriented & stimulus-driven advice | 5,675 video clips, 45k text strings |**:heavy_check_mark:**| | [DRAMA 2022](https://usa.honda-ri.com/drama) | - | Description | Perception & Planning results | 18k frames, 100k text strings | **:heavy_check_mark:**| | [Rank2Tell 2023](https://arxiv.org/abs/2309.06597) | - | Perception & Planning results | QA + Captions | 5k frames | :x: | | [nuScenes-QA 2023](https://arxiv.org/abs/2305.14836) | [nuScenes](https://www.nuscenes.org/) | QA | Perception Result | 30k frames, 460k generated QA pairs|**:heavy_check_mark:**| | [nuPrompt 2023](https://arxiv.org/abs/2309.04379) | [nuScenes](https://www.nuscenes.org/) | Object Description | Perception Result | 30k frames, 35k semi-generated QA pairs| :x:| | **DriveLM 2023** | [nuScenes](https://www.nuscenes.org/) | **:boom: QA + Scene Description** | **:boom:Perception, Prediction and Planning with Logic** | 30k frames, 360k annotated QA pairs |**:heavy_check_mark:** | </center> --> <p align="center"> <img src="assets/images/repo/paper_data_comp.png"> </p>Links to details about GVQA task, Dataset Features, and Annotation.
<!-- More details can be found [HERE](docs/data_details.md). --> <!-- ### What is included in the DriveLM-Data? DriveLM-Data comprises two distinct components: DriveLM-nuScenes and DriveLM-CARLA. In the case of DriveLM-nuScenes, we construct our dataset based on the prevailing nuScenes dataset. As for DriveLM-CARLA, we collect data from the CARLA simulator. The most central element of DriveLM is frame-based `multi-stage` `QA`. `Perception` questions require the model to recognize objects in the scene. `Prediction` questions ask the model to predict the future status of important objects in the scene. `Planning` questions prompt the model to give reasonable planning actions and avoid dangerous ones. We also include a `Behavior` question that provides behavior templates which aggregate the information from the other question types. ### How about the annotation process? <p align="center"> <img src="assets/images/repo/paper_data.jpg"> </p> **For DriveLM-nuScenes:** 1️⃣ Keyframe selection. Given all frames in one clip, the annotator selects the keyframes that need annotation. The criterion is that those frames should involve changes in ego-vehicle movement status (lane changes, sudden stops, start after a stop, etc.). 2️⃣ Key objects selection. Given keyframes, the annotator needs to pick up key objects in the six surrounding images. The criterion is that those objects should be able to affect the action of the ego vehicle (traffic signals, pedestrians crossing the road, other vehicles that move in the direction of the ego vehicle, etc.). 3️⃣ Question and answer annotation. Given those key objects, we automatically generate questions regarding single or multiple objects about perception, prediction, and planning. More details can be found in our data. **For DriveLM-CARLA:** We collect data using CARLA 0.9.14 in the Leaderboard 2.0 framework with a privileged rule-based expert. 
We set up a series of routes in urban, residential, and rural areas and execute the expert on these routes. During this process, we collect the necessary sensor data, generate relevant QAs based on privileged information about objects and the scene, and organize the logical relationships to connect this series of QAs into a graph. -->
<p align="right">(<a href="#top">back to top</a>)</p>
License and Citation <a name="licenseandcitation"></a>
All assets and code in this repository are under the Apache 2.0 license unless specified otherwise. The language data is under CC BY-NC-SA 4.0. Other datasets (including nuScenes) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.
@article{sima2023drivelm,
title={DriveLM: Driving with Graph Visual Question Answering},
author={Sima, Chonghao and Renz, Katrin and Chitta, Kashyap and Chen, Li and Zhang, Hanxue and Xie, Chengen and Luo, Ping and Geiger, Andreas and Li, Hongyang},
journal={arXiv preprint arXiv:2312.14150},
year={2023}
}
@misc{contributors2023drivelmrepo,
title={DriveLM: Driving with Graph Visual Question Answering},
author={DriveLM contributors},
howpublished={\url{https://github.com/OpenDriveLab/DriveLM}},
year={2023}
}
<p align="right">(<a href="#top">back to top</a>)</p>
Other Resources <a name="otherresources"></a>
<a href="https://twitter.com/OpenDriveLab" target="_blank"> <img alt="Twitter Follow" src="https://img.shields.io/twitter/follow/OpenDriveLab?style=social&color=brightgreen&logo=twitter" /> </a> <!-- <a href="https://opendrivelab.com" target="_blank"> <img src="https://img.shields.io/badge/contact%40opendrivelab.com-white?style=social&logo=gmail"> </a> --> <!-- [![Page Views Count](https://badges.toozhao.com/badges/01H9CR01K73G1S0AKDMF1ABC73/blue.svg)](https://badges.toozhao.com/stats/01H9CR01K73G1S0AKDMF1ABC73 "Get your own page views count badge on badges.toozhao.com") -->OpenDriveLab
<a href="https://twitter.com/AutoVisionGroup" target="_blank"> <img alt="Twitter Follow" src="https://img.shields.io/twitter/follow/Awesome Vision Group?style=social&color=brightgreen&logo=twitter" /> </a>Autonomous Vision Group
- tuPlan garage | CARLA garage | Survey on E2EAD
- PlanT | KING | TransFuser | NEAT