<div id="top" align="center">

# ELM: Embodied Understanding of Driving Scenarios

Revive driving scene understanding by delving into the embodiment philosophy

<a href="https://arxiv.org/abs/2403.04593"><img src="https://img.shields.io/badge/arXiv-Paper-<color>"></a> <a href="https://opendrivelab.github.io/elm.github.io/"><img src="https://img.shields.io/badge/Project-Page-orange"></a> <a href="README.md"> <img alt="ELM: v1.0" src="https://img.shields.io/badge/ELM-v1.0-blueviolet"/> </a> <a href="#license-and-citation"> <img alt="License: Apache2.0" src="https://img.shields.io/badge/license-Apache%202.0-blue.svg"/> </a>

</div>

Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, and Hongyang Li

## Highlights <a name="highlights"></a>

:fire: The first embodied language model for understanding long-horizon driving scenarios in both space and time.

:star2: ELM introduces a wide spectrum of new tasks to fully leverage the capability of large language models in an embodied setting, and achieves significant improvements across various applications.


:trophy: Interpretable driving models based on language prompting will be a main track in the CVPR 2024 Autonomous Driving Challenge. Please stay tuned for further details!

## News <a name="news"></a>

## Table of Contents

  1. [Highlights](#highlights)
  2. [News](#news)
  3. [TODO List](#todo)
  4. [Installation](#installation)
  5. [Dataset](#dataset)
  6. [Training and Inference](#training)
  7. [License and Citation](#license-and-citation)
  8. [Related Resources](#resources)

## TODO List <a name="todo"></a>

## Installation <a name="installation"></a>

1. (Optional) Create and activate a conda environment:

   ```shell
   conda create -n elm python=3.8
   conda activate elm
   ```

2. Install from PyPI:

   ```shell
   pip install salesforce-lavis
   ```

3. Or, for development, build from source:

   ```shell
   git clone https://github.com/OpenDriveLab/ELM.git
   cd ELM
   pip install -e .
   ```
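As a quick sanity check that the LAVIS dependency is importable (a minimal check, not part of the official instructions):

```shell
# Confirm that the lavis package installed above can be imported.
python -c "import lavis; print('LAVIS import OK')"
```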

## Dataset <a name="dataset"></a>

**Pre-training data.** We collect driving videos from YouTube, nuScenes, Waymo, and Ego4D. Here we provide a sample 🔗 YouTube video list that we used. For privacy considerations, we are temporarily keeping the full-set data labels private. Part of the pre-training data and reference checkpoints can be found in :floppy_disk: Google Drive.

**Fine-tuning data.** The full set of question-and-answer pairs for the benchmark can be obtained through this 🔗 data link. You may need to download the corresponding image data from the official nuScenes and Ego4D channels. For a quick verification of the pipeline, we recommend downloading the DriveLM subset and organizing the data in the same format.

Please make sure to soft link the nuScenes and Ego4D datasets under the `data/xx` folder. You may need to run `tools/video_clip_processor.py` to pre-process the data first. In addition, we provide some scripts used during auto-labeling; you may use them as a reference if you want to customize the data.
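As a reference, a minimal sketch of this setup is shown below. The source paths and the `data/` subfolder names are illustrative placeholders, not a required layout; follow the folder names expected by your configs.

```shell
# Soft link the datasets under data/ (all paths below are illustrative placeholders).
mkdir -p data
ln -s /path/to/nuscenes data/nuscenes
ln -s /path/to/ego4d data/ego4d

# Pre-process the video clips before training (check the script for its expected arguments).
python tools/video_clip_processor.py
```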

## Training <a name="training"></a>

```shell
# You can modify lavis/projects/blip2/train/advqa_t5_elm.yaml before launching training.
bash scripts/train.sh
```

## Inference

Modify `advqa_t5_elm.yaml` and set `evaluate` to `True` to run in evaluation mode.

```shell
bash scripts/train.sh
```

For the evaluation of generated answers, please use the script `scripts/qa_eval.py`:

```shell
python scripts/qa_eval.py <data_root> <log_name>
```
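For instance, an invocation might look like the following; `data/` and `elm_eval_run1` are purely hypothetical stand-ins for your own data root and log name:

```shell
# Hypothetical example: replace the data root and log name with your own values.
python scripts/qa_eval.py data/ elm_eval_run1
```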

## License and Citation

All assets and code in this repository are under the Apache 2.0 license unless specified otherwise. The language data is under CC BY-NC-SA 4.0. Other datasets (including nuScenes and Ego4D) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.

```bibtex
@article{zhou2024embodied,
  title={Embodied Understanding of Driving Scenarios},
  author={Zhou, Yunsong and Huang, Linyan and Bu, Qingwen and Zeng, Jia and Li, Tianyu and Qiu, Hang and Zhu, Hongzi and Guo, Minyi and Qiao, Yu and Li, Hongyang},
  journal={arXiv preprint arXiv:2403.04593},
  year={2024}
}
```

## Related Resources <a name="resources"></a>

We acknowledge all open-source contributors to the following projects for making this work possible:

<a href="https://twitter.com/OpenDriveLab" target="_blank"> <img alt="Twitter Follow" src="https://img.shields.io/twitter/follow/OpenDriveLab?style=social&color=brightgreen&logo=twitter" /> </a>