Awesome

Large Driving Models

While most existing methods focus on adapting driving tasks to pre-trained large language models or vision-language models (Large Models for Autonomous Driving), we design a series of Large Driving Models specifically for autonomous driving.

Large Driving Model Zoo (Literally!)

| Model | Function | Task | Core Contributor | Code | Release Date | Why the name? |
|---|---|---|---|---|---|---|
| Stereo Anything | Large Stereo Model | Stereo-based Depth Estimation | Xianda Guo | https://github.com/XiandaGuo/OpenStereo | 2024/11/22 | Stereo Anything |
| Stag-1 | Large Simulation Model | 4D Photorealistic Simulation | Lening Wang | https://github.com/wzzheng/Stag | 2024/12/9 | Spatial-Temporal simulAtion for drivinG |
| Driv3R | Large Reconstruction Model | Pose-free Dense Reconstruction | Xin Fei | https://github.com/Barrybarry-Smith/Driv3R | 2024/12/10 | DRIVing 3d Reconstruction |
| GPD-1 | Latent World Model | Closed-Loop Simulation, Planning, Scene Generation... | Zixun Xie | https://github.com/wzzheng/GPD | 2024/12/12 | Generative Pre-training for Driving |
| Doe-1 | Large World Model | End-to-End Perception, Prediction, Planning... | Zetian Xia | https://github.com/wzzheng/Doe | 2024/12/13 | Driving wOrld modEl |
| DrivingRecon | Large Gaussian Model | Feed-Forward 4D Gaussian Reconstruction | Hao Lu | https://github.com/EnVision-Research/DriveRecon | 2024/12/13 | Driving Reconstruction |
| Owl-1 | Video Generation Model | End-to-End Planning and Generation | Yuanhui Huang | https://github.com/huang-yh/Owl | 2024/12/13 | Omni World modeL |

Object-Centric Autonomous System

| Model | Scenario | Task | Core Contributor | Code | Release Date |
|---|---|---|---|---|---|
| GaussianFormer | Outdoor | Multi-View 3D Occupancy Prediction | Yuanhui Huang | https://github.com/huang-yh/GaussianFormer | 2024/5/27 |
| GaussianFormer-2 | Outdoor | Multi-View 3D Occupancy Prediction | Yuanhui Huang | https://github.com/huang-yh/GaussianFormer | 2024/12/6 |
| EmbodiedOcc | Indoor | Embodied 3D Occupancy Prediction | Yuqi Wu | https://github.com/YkiWu/EmbodiedOcc | 2024/12/6 |
| GaussianWorld | Outdoor | Streaming 3D Occupancy Prediction | Sicheng Zuo | https://github.com/zuosc19/GaussianWorld | 2024/12/16 |
| GaussianAD | Outdoor | End-to-End Autonomous Driving | Junjie Wu | https://github.com/wzzheng/GaussianAD | 2024/12/16 |

Demos

Stag-1: Feed-Forward 4D Photorealistic Simulation

Freeze Time

Freeze View

Multi-View

Driv3R: Pose-free Dense Reconstruction

DrivingRecon: Feed-Forward 4D Gaussian Reconstruction

GPD-1: All-in-One Model for Autonomous Driving Simulation

Doe-1: Closed-Loop Autonomous Driving

EmbodiedOcc: Online Embodied 3D Occupancy Prediction

Citations

If you find this project helpful, please consider citing the following papers:

### Stereo Anything
@article{guo2024stereo,
  title={Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data},
  author={Guo, Xianda and Zhang, Chenming and Zhang, Youmin and Nie, Dujun and Wang, Ruilin and Zheng, Wenzhao and Poggi, Matteo and Chen, Long},
  journal={arXiv preprint arXiv:2411.14053},
  year={2024}
}

### Stag-1
@article{stag-1,
    title={Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model},
    author={Wang, Lening and Zheng, Wenzhao and Du, Dalong and Zhang, Yunpeng and Ren, Yilong and Jiang, Han and Cui, Zhiyong and Yu, Haiyang and Zhou, Jie and Lu, Jiwen and Zhang, Shanghang},
    journal={arXiv preprint arXiv:},
    year={2024}
}

### Driv3R
@article{driv3r,
  title={Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving}, 
  author={Fei, Xin and Zheng, Wenzhao and Duan, Yueqi and Zhan, Wei and Tomizuka, Masayoshi and Keutzer, Kurt and Lu, Jiwen},
  journal={arXiv preprint arXiv:2412.06777},
  year={2024}
}

### GPD-1
@article{gpd-1,
    title={GPD-1: Generative Pre-training for Driving},
    author={Xie, Zixun and Zuo, Sicheng and Zheng, Wenzhao and Zhang, Yunpeng and Du, Dalong and Zhou, Jie and Lu, Jiwen and Zhang, Shanghang},
    journal={arXiv preprint arXiv:2412.08643},
    year={2024}
}

### Doe-1
@article{doe,
    title={Doe-1: Closed-Loop Autonomous Driving with Large World Model},
    author={Zheng, Wenzhao and Xia, Zetian and Huang, Yuanhui and Zuo, Sicheng and Zhou, Jie and Lu, Jiwen},
    journal={arXiv preprint arXiv:},
    year={2024}
}

### DrivingRecon
@article{Lu2024DrivingRecon,
    title={DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving},
    author={Lu, Hao and Xu, Tianshuo and Zheng, Wenzhao and Zhang, Yunpeng and Zhan, Wei and Du, Dalong and Tomizuka, Masayoshi and Keutzer, Kurt and Chen, Yingcong},
    journal={arXiv preprint arXiv:2412.09043},
    year={2024}
}

### Owl-1
@article{owl-1,
    title={Owl-1: Omni World Model for Consistent Long Video Generation}, 
    author={Huang, Yuanhui and Zheng, Wenzhao and Gao, Yuan and Tao, Xin and Wan, Pengfei and Zhang, Di and Zhou, Jie and Lu, Jiwen},
    journal={arXiv preprint arXiv:2412.09600},
    year={2024},
}

### GaussianFormer
@article{gaussianformer,
    title={GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction},
    author={Huang, Yuanhui and Zheng, Wenzhao and Zhang, Yunpeng and Zhou, Jie and Lu, Jiwen},
    journal={arXiv preprint arXiv:2405.17429},
    year={2024}
}

### GaussianFormer-2
@article{gaussianformer-2,
      title={GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction}, 
      author={Yuanhui Huang and Amonnut Thammatadatrakoon and Wenzhao Zheng and Yunpeng Zhang and Dalong Du and Jiwen Lu},
      journal={arXiv preprint arXiv:2412.04384},
      year={2024}
}
	
### EmbodiedOcc
@article{embodiedocc,
      title={EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding}, 
      author={Wu, Yuqi and Zheng, Wenzhao and Zuo, Sicheng and Huang, Yuanhui and Zhou, Jie and Lu, Jiwen},
      journal={arXiv preprint arXiv:2412.04380},
      year={2024}
}