<div align="center">DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
</div>

Our team is actively working towards releasing the code for this project.
We appreciate your patience and understanding as we navigate the necessary processes.
Our new works, DriveDreamer4D and ReconDreamer, are released!
Project Page | Paper
Abstract
World models have demonstrated superiority in autonomous driving, particularly in the generation of multi-view driving videos. However, significant challenges remain in generating customized driving videos. In this paper, we propose DriveDreamer-2, which builds upon the framework of DriveDreamer and incorporates a Large Language Model (LLM) to generate user-defined driving videos. Specifically, an LLM interface first converts a user's query into agent trajectories. An HDMap that adheres to traffic regulations is then generated from those trajectories. Finally, we propose the Unified Multi-View Model to enhance temporal and spatial coherence in the generated driving videos. DriveDreamer-2 is the first world model to generate customized driving videos, and it can generate uncommon driving videos (e.g., vehicles abruptly cutting in) in a user-friendly manner. In addition, experimental results demonstrate that the generated videos enhance the training of driving perception methods (e.g., 3D detection and tracking). Furthermore, the video generation quality of DriveDreamer-2 surpasses other state-of-the-art methods, with FID and FVD scores of 11.2 and 55.7, representing relative improvements of 30% and 50%, respectively.
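As a quick sanity check on the numbers above, the snippet below sketches how a relative improvement on a lower-is-better metric such as FID or FVD is commonly computed. The helper names `relative_improvement` and `implied_baseline` are hypothetical and are not part of the released code. The baselines it prints are only back-computed from the rounded percentages in the abstract; they are not figures reported by the authors.

```python
def relative_improvement(baseline: float, ours: float) -> float:
    """Fractional reduction of a lower-is-better metric (e.g. FID, FVD)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - ours) / baseline


def implied_baseline(ours: float, improvement: float) -> float:
    """Baseline score that would yield the given relative improvement."""
    if not 0 <= improvement < 1:
        raise ValueError("improvement must be in [0, 1)")
    return ours / (1.0 - improvement)


if __name__ == "__main__":
    # Scores and rounded percentages are taken from the abstract; the
    # baselines are derived here and are an assumption, not reported data.
    for name, score, gain in [("FID", 11.2, 0.30), ("FVD", 55.7, 0.50)]:
        base = implied_baseline(score, gain)
        check = relative_improvement(base, score)
        print(f"{name}: score {score}, implied baseline ~{base:.1f}, "
              f"round-trip improvement {check:.0%}")
```

Because the percentages in the abstract are rounded, the implied baselines should be read as rough magnitudes only; the paper's comparison tables are the authoritative source.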
<img width="919" alt="abs" src="https://github.com/f1yfisher/DriveDreamer2/assets/39218234/e23cf401-5943-4fb3-b0ed-7d183a9df5cd"> <img width="1327" alt="abs2" src="https://github.com/f1yfisher/DriveDreamer2/assets/39218234/edc11963-0443-4e3f-8309-8955330b4815">

News
- [2024/12/18] 🚀 Inference code and model weights for video generation are released!
- [2024/12/10] 🎉 DriveDreamer-2 has been accepted to AAAI'25!
- [2024/03/11] 🚀 We release the DriveDreamer-2 project! (Key features: multi-view video generation, user-friendly with LLM)
Getting Started
Download the model weights and the preprocessing file HERE.
Demo
Results with Generated Structural Information
Daytime / rainy day / at night, a car abruptly cutting in from the right rear of the ego-car.
<div align="center">https://github.com/f1yfisher/DriveDreamer2/assets/39218234/0df78173-9dcd-42f4-8cf8-f7e16b724f82
</div>

Rainy day, a car abruptly cutting in from the left rear of the ego-car. (long video)
<div align="center">https://github.com/f1yfisher/DriveDreamer2/assets/39218234/779fa0ad-595a-47f3-a52c-1c98c30fa640
</div>

Daytime, the ego-car changes lanes to the right side. (long video)
<div align="center">https://github.com/f1yfisher/DriveDreamer2/assets/39218234/36c0f9e6-b9d1-4bd1-ab5c-f2c28eb3294c
</div>

Rainy day, a person crosses the road in front of the ego-car. (long video)
<div align="center">https://github.com/f1yfisher/DriveDreamer2/assets/39218234/92f8cd31-a1b3-4516-ad03-331cf1ba4acb
</div>

Results with nuScenes Structural Information
Daytime / rainy day / at night, ego-car drives through urban street, surrounded by a flow of vehicles on both sides.
<div align="center">https://github.com/f1yfisher/DriveDreamer2/assets/39218234/543656a4-729d-4b2c-b12d-6e75b3068669
</div>

Daytime / rainy day / at night, a bus is positioned to the left front of the ego-car, with a pedestrian near the bus.
<div align="center">https://github.com/f1yfisher/DriveDreamer2/assets/39218234/e43193ec-fb91-49ee-818c-b7a2c1a00909
</div>

Rainy day, the windshield wipers of the truck are continuously clearing the windshield.
<div align="center">https://github.com/f1yfisher/DriveDreamer2/assets/39218234/d05c2ab9-5c41-4dd3-bbd2-7a69b049b891
</div>

Rainy day, the ego-car makes a left turn at the traffic signal, with vehicles behind proceeding straight through the intersection. (long video)
<div align="center">https://github.com/f1yfisher/DriveDreamer2/assets/39218234/a766b12b-05a3-4755-858e-040c8bbf6ece
</div>

Daytime, the ego-car drives straight through the traffic light, with a truck situated to the left front and pedestrians crossing on the right side. (long video)
<div align="center">https://github.com/f1yfisher/DriveDreamer2/assets/39218234/e5f713dc-665f-49e2-8f70-3c5de101ffb4
</div>

DriveDreamer-2 Framework
<img width="1277" alt="method" src="https://github.com/f1yfisher/DriveDreamer2/assets/39218234/bbb8d658-793a-4b3c-b873-ea5332f7ec4b">

BibTeX
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{zhao2024drive,
  title={DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation},
  author={Zhao, Guosheng and Wang, Xiaofeng and Zhu, Zheng and Chen, Xinze and Huang, Guan and Bao, Xiaoyi and Wang, Xingang},
  journal={arXiv preprint arXiv:2403.06845},
  year={2024}
}