<div align="center">
<img src="./assets/logo.png" width="20%">
<h3>Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving</h3>

Bo Jiang<sup>1</sup>, Shaoyu Chen<sup>1</sup>, Bencheng Liao<sup>1</sup>, Xingyu Zhang<sup>2</sup>, Wei Yin<sup>2</sup>, Qian Zhang<sup>2</sup>, Chang Huang<sup>2</sup>, Wenyu Liu<sup>1</sup>, Xinggang Wang<sup>1,📧</sup>
<sup>1</sup> Huazhong University of Science and Technology, <sup>2</sup> Horizon Robotics, <sup>📧</sup> corresponding author
</div>

https://github.com/user-attachments/assets/3fd172be-d78d-47ae-867d-a473a1e6ddd6
## News
- **[2024-12-08]:** We have released the code and weights of Senna-VLM, along with the training and evaluation scripts.
- **[2024-10-29]:** Senna arXiv paper released. Code/Models are coming soon. Please stay tuned! ☕️
## Highlights

- Senna is an autonomous driving system that integrates a Large Vision-Language Model with an end-to-end model to improve planning safety, robustness, and generalization.
- Senna achieves SOTA planning performance and demonstrates strong cross-scenario generalization and transferability.
## Getting Started

### Installation
```bash
git clone git@github.com:hustvl/Senna.git
cd Senna
conda create -n senna python=3.10 -y
conda activate senna
pip install -r requirements.txt
```
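To confirm the environment is usable, a quick sanity check can help. This is a minimal sketch that only verifies PyTorch and GPU visibility; it is not part of the Senna tooling:

```python
# Quick environment sanity check after installation.
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```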
### Data Preparation
We provide a script for generating QA data required for Senna training. The script uses LLaVA-v1.6-34b as the model for generating scene descriptions and planning explanations. You can run the script as follows:
```bash
sh data_tools/senna_nusc_converter.sh
```
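Each generated sample pairs multi-view camera images with question-answer text. The exact schema is defined by the converter script; the record below is a purely hypothetical illustration (all field names and paths are assumptions, not the actual format):

```python
# Hypothetical QA record illustrating the kind of data the converter produces.
# Field names and paths are assumptions; consult the script output for the real schema.
qa_sample = {
    "images": [
        "samples/CAM_FRONT/sample_0001.jpg",       # one path per camera view
        "samples/CAM_FRONT_LEFT/sample_0001.jpg",
    ],
    "question": "Describe the driving scene ahead and explain a safe plan.",
    "answer": "A pedestrian is crossing at the intersection ahead; "
              "the ego vehicle should decelerate and yield before proceeding.",
}
```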
### Weights
| Method | Model Size | Base LLM | Input Views | Tokens per Image | Download |
|---|---|---|---|---|---|
| Senna | 7B | vicuna-7b-v1.5 | 6 views | 128 | Hugging Face |
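If you prefer downloading programmatically, the checkpoint can be fetched with `huggingface_hub`. This is a minimal sketch; the `repo_id` below is an assumption, so substitute the repository behind the Hugging Face link above:

```python
# Minimal sketch for downloading the Senna-VLM checkpoint from the Hugging Face Hub.
# The repo_id is a hypothetical placeholder; use the id behind the link above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="hustvl/Senna")  # hypothetical repo id
print(f"Checkpoint downloaded to: {local_dir}")
```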
### Training
For Stage-1 Mix Pre-training:

```bash
sh train_tools/pretrain_senna_llava.sh
```

For Stage-2 Driving Fine-tuning and Stage-3 Planning Fine-tuning (full-parameter fine-tuning):

```bash
sh train_tools/train_senna_llava.sh
```

For Stage-2 Driving Fine-tuning and Stage-3 Planning Fine-tuning (LoRA fine-tuning):

```bash
sh train_tools/train_senna_llava_lora.sh
```
In our experiments, full-parameter fine-tuning outperformed LoRA fine-tuning, so we recommend it. However, if your machine has limited GPU memory (e.g., only 24GB), you may consider LoRA fine-tuning as an alternative.
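For intuition on why LoRA needs far less memory: it freezes the base LLM and trains only small low-rank adapter matrices injected into the attention projections. Below is a minimal sketch using Hugging Face `peft`; the hyperparameters and target modules are illustrative assumptions, not the values used by `train_senna_llava_lora.sh`:

```python
# Illustrative LoRA setup with peft; hyperparameters are assumptions,
# not the values used by train_senna_llava_lora.sh.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")
lora_config = LoraConfig(
    r=16,                                 # rank of the adapter matrices
    lora_alpha=32,                        # scaling factor for adapter updates
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```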
### Evaluation

You can evaluate the accuracy of Senna's meta-action planning using the script below:

```bash
sh eval_tools/senna_plan_cmd_eval_multi_img.sh
```
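For reference, meta-action planning accuracy is simply the fraction of predicted meta-actions that exactly match the ground truth. A minimal sketch of that metric follows; the action labels are hypothetical, and the script above implements the actual evaluation:

```python
# Minimal sketch of meta-action planning accuracy: fraction of exact matches.
# Action labels below are hypothetical examples for illustration.
def meta_action_accuracy(predictions: list[str], ground_truths: list[str]) -> float:
    assert len(predictions) == len(ground_truths)
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)

preds = ["ACCELERATE", "KEEP", "STOP"]  # hypothetical model outputs
gts   = ["ACCELERATE", "STOP", "STOP"]  # hypothetical ground truth
print(f"Planning accuracy: {meta_action_accuracy(preds, gts):.2%}")  # 66.67%
```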
### Visualization

By running the visualization script below, you can overlay the predicted meta-actions and front-view scene descriptions onto the front-view image and save the results to the specified path.

```bash
sh eval_tools/senna_plan_visualization.sh
```
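Conceptually, the overlay amounts to drawing the predicted meta-action and scene description text onto the front-view image. A minimal PIL sketch of that idea is shown below; the file paths and strings are placeholders, and the script above is the supported entry point:

```python
# Minimal sketch of overlaying predicted text onto a front-view image with PIL.
# Paths and strings are placeholders; use the provided script for real results.
from PIL import Image, ImageDraw

image = Image.open("front_view.jpg").convert("RGB")
draw = ImageDraw.Draw(image)
draw.text((10, 10), "Meta-action: DECELERATE", fill="red")
draw.text((10, 30), "Scene: pedestrian crossing ahead at the intersection", fill="red")
image.save("front_view_overlay.jpg")
```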
## Qualitative Results
<div align="center">
<img src="./assets/vis.png">
</div>

## Acknowledgments
Senna is built upon LLaVA. We sincerely thank the contributors for their great work!
## Citation

If you find Senna useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
```bibtex
@article{jiang2024senna,
  title={Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving},
  author={Bo Jiang and Shaoyu Chen and Bencheng Liao and Xingyu Zhang and Wei Yin and Qian Zhang and Chang Huang and Wenyu Liu and Xinggang Wang},
  year={2024},
  eprint={2410.22313},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.22313},
}
```