Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model
Check out our Large Driving Model Series!
Paper | Project Page
Lening Wang* $^{1,2,4}$, Wenzhao Zheng* $\dagger$ $^{2}$, Dalong Du $^{3}$ , Yunpeng Zhang $^{3}$ , Yilong Ren $^{1}$, Han Jiang $^{1}$, Zhiyong Cui $^{1}$, Haiyang Yu $^{1}$, Jie Zhou $^{2}$, Jiwen Lu $^{2}$, Shanghang Zhang $^{4}$
$^1$ Beihang University $^2$ Tsinghua University $^3$ PhiGent Robotics $^4$ Peking University
Contact: leningwang@buaa.edu.cn, wenzhao.zheng@outlook.com
* Equal contribution $\dagger$ Project leader
News
- [2024/12/9] Initial code release.
- [2024/12/9] Paper released on arXiv.
Demo
For a clearer view, please check the Project Page.
4D Driving Simulation:
Demo 1: Freeze Time.
Demo 2: Freeze View.
Demo 3: Multi-View.
Overview
Spatial-Temporal simulAtion for drivinG (Stag-1) enables controllable 4D autonomous driving simulation through spatial-temporal decoupling: it decomposes the original spatial-temporal relationships of real-world scenes so that simulation can be controlled along each axis independently. This allows adjustments such as fixing the camera viewpoint while advancing time, or translating and rotating in space while keeping time stationary. Additionally, Stag-1 maintains synchronized variation across all six panoramic views.
Our Stag-1 framework is a 4D generative model for autonomous driving simulation. It reconstructs 4D scenes from point clouds and projects them into continuous, sparse keyframes. A spatial-temporal fusion framework is then used to generate simulation scenarios. Two key design aspects guide our approach: 1) We develop a method for 4D point cloud matching and keyframe reconstruction, ensuring the accurate generation of continuous, sparse keyframes that account for both vehicle motion and the need for spatial-temporal decoupling in simulation. 2) We build a spatial-temporal fusion framework that integrates surround-view information and continuous scene projection to ensure accurate simulation generation.
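The keyframe projection step can be illustrated with a standard pinhole camera model. The sketch below is purely illustrative and is not the Stag-1 code (function and variable names are our own): "freeze time" amounts to holding the reconstructed cloud fixed while varying the camera pose, and "freeze view" amounts to holding the pose fixed while swapping in clouds from different timestamps.

```python
import numpy as np

def project_points(points_w, K, T_wc):
    """Project world-frame 3D points into an image plane (pinhole model).

    points_w: (N, 3) world coordinates
    K:        (3, 3) camera intrinsics
    T_wc:     (4, 4) world-to-camera extrinsic transform
    Returns:  (N, 2) pixel coordinates
    """
    n = points_w.shape[0]
    homog = np.hstack([points_w, np.ones((n, 1))])  # (N, 4) homogeneous coords
    cam = (T_wc @ homog.T).T[:, :3]                 # transform into camera frame
    pix = (K @ cam.T).T                             # apply intrinsics
    return pix[:, :2] / pix[:, 2:3]                 # perspective divide

# "Freeze time": keep points_w fixed, vary T_wc (camera moves, scene frozen).
# "Freeze view": keep T_wc fixed, swap in point clouds from other timestamps.
```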
Getting Started
Installation
- Create a conda environment with Python 3.8.18.
- Install all the packages in `environment.yaml`.
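The two steps above might look like the following. This is one plausible sequence, not the authors' documented commands; the environment name `stag1` is our own choice.

```shell
# Create and activate the environment (name is hypothetical).
conda create -n stag1 python=3.8.18 -y
conda activate stag1
# Install the repo's dependencies from its environment file.
conda env update -n stag1 -f environment.yaml
```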
Data Preparation
We prepare the nuScenes dataset following MagicDrive.
- Download the nuScenes dataset from the website and put it in `./data/`. You should have these files:

  ```
  data/nuscenes
  ├── maps
  ├── mini
  ├── samples
  ├── sweeps
  ├── v1.0-mini
  └── v1.0-trainval
  ```
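Before training, it can help to confirm the layout matches the tree above. The helper below is our own convenience snippet, not part of the repo:

```python
import os

# Expected top-level entries under data/nuscenes (per the tree above).
EXPECTED = ["maps", "mini", "samples", "sweeps", "v1.0-mini", "v1.0-trainval"]

def missing_entries(root):
    """Return the expected nuScenes entries that are absent under `root`."""
    return [name for name in EXPECTED
            if not os.path.exists(os.path.join(root, name))]
```

Calling `missing_entries("./data/nuscenes")` should return an empty list if the dataset is laid out correctly.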
Training / Fine-tuning
```shell
sh configs/run-our/run.sh
```
Related Projects
Our code is based on ViewCrafter and MagicDrive.
Also thanks to these excellent open-source repos: Vista and S<sup>3</sup>Gaussian.
Citation
If you find this project helpful, please consider citing the following paper:
```bibtex
@article{wang2024stag-1,
  title={Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model},
  author={Wang, Lening and Zheng, Wenzhao and Du, Dalong and Zhang, Yunpeng and Ren, Yilong and Jiang, Han and Cui, Zhiyong and Yu, Haiyang and Zhou, Jie and Lu, Jiwen and Zhang, Shanghang},
  journal={arXiv preprint arXiv:2412.05280},
  year={2024}
}
```