Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
The official repository of Panacea and Panacea+.
[Paper] Panacea: Panoramic and Controllable Video Generation for Autonomous Driving,
Yuqing Wen<sup>1*†</sup>, Yucheng Zhao<sup>2*</sup>, Yingfei Liu<sup>2*</sup>, Fan Jia<sup>2</sup>, Yanhui Wang<sup>1</sup>, Chong Luo<sup>1</sup>, Chi Zhang<sup>3</sup>, Tiancai Wang<sup>2‡</sup>, Xiaoyan Sun<sup>1‡</sup>, Xiangyu Zhang<sup>2</sup> <br> <sup>1</sup>University of Science and Technology of China, <sup>2</sup>MEGVII Technology, <sup>3</sup>Mach Drive <br> <sup>*</sup>Equal Contribution, <sup>†</sup>This work was done during an internship at MEGVII, <sup>‡</sup>Corresponding Author.
[Paper] Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving,
Yuqing Wen<sup>1*†</sup>, Yucheng Zhao<sup>2*</sup>, Yingfei Liu<sup>2*</sup>, Binyuan Huang<sup>4*</sup>, Fan Jia<sup>2</sup>, Yanhui Wang<sup>1</sup>, Chi Zhang<sup>3</sup>, Tiancai Wang<sup>2‡</sup>, Xiaoyan Sun<sup>1‡</sup>, Xiangyu Zhang<sup>2</sup> <br> <sup>1</sup>University of Science and Technology of China, <sup>2</sup>MEGVII Technology, <sup>3</sup>Mach Drive, <sup>4</sup>Wuhan University <br> <sup>*</sup>Equal Contribution, <sup>†</sup>This work was done during an internship at MEGVII, <sup>‡</sup>Corresponding Author.
[WebPage] https://panacea-ad.github.io/
News
- Aug. 15th, 2024: We release an enhanced version of Panacea, named Panacea+, with improved performance and comprehensive validation across multiple datasets and tasks. For more details, please refer to the Panacea+ paper.
- Aug. 15th, 2024: We release the checkpoint and inference scripts for stage 2 of Panacea+; you can use them to generate multi-view video samples from BEV layout sequences.
- Apr. 18th, 2024: We release our Gen-nuScenes dataset generated by Panacea. Please check the metrics/ folder for usage.
- Apr. 18th, 2024: We release the BEV-perception evaluation code based on StreamPETR. Please check the metrics/ folder and follow metrics/README.md for detailed evaluation instructions.
Getting Started
Please follow our documentation step by step.
Environment Setup
Follow the instructions in Environment Setup.
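If you prefer a quick start, the sketch below shows a typical conda-based setup; the environment name, Python version, and requirements file here are placeholders, and the Environment Setup doc remains authoritative.

```bash
# Hypothetical quick-start sketch; see Environment Setup for the authoritative steps.
conda create -n panacea python=3.8 -y   # env name and Python version are assumptions
conda activate panacea
pip install -r requirements.txt         # assumes a requirements file in the repo root
```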
Prepare Dataset
Prepare the real dataset following the instructions in Data Preparation.
Remember to put the dataset under the path data/nuscenes.
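If you already have nuScenes downloaded elsewhere on disk, a symlink is an easy way to satisfy this layout (the source path below is a placeholder):

```bash
# Link an existing nuScenes download into the expected location.
mkdir -p data
ln -s /path/to/nuscenes data/nuscenes
```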
Download Pretrained Checkpoint
Download the stage-2 weights from panaceaplus_40k_deepspeed.ckpt.
Put the file in the checkpoints/ folder.
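For example, assuming the checkpoint was downloaded into the current directory:

```bash
# Place the stage-2 checkpoint where the inference config expects it.
mkdir -p checkpoints
mv panaceaplus_40k_deepspeed.ckpt checkpoints/
```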
Inference
--split: to specify train or val sets
--use_last_frame=true means use the last frame as conditional image.
Run the following command to inference stage 2 on the whole training/val set of nuscenes.
python -m torch.distributed.launch --nproc_per_node=8 --master_port=1238 inference.py --base configs/inference_nuscenes.yaml --ckpt checkpoints/panaceaplus_40k_deepspeed.ckpt --split train --use_last_frame true --name EXP_NAME --bs 1
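To generate on the validation set instead, only the --split flag changes, assuming the same 8-GPU setup:

```bash
python -m torch.distributed.launch --nproc_per_node=8 --master_port=1238 inference.py \
    --base configs/inference_nuscenes.yaml \
    --ckpt checkpoints/panaceaplus_40k_deepspeed.ckpt \
    --split val --use_last_frame true --name EXP_NAME --bs 1
```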
<div class="root-content" style="padding-top: 10px; width: 65%;">
<h1 class="section-name">Generating <font style="color: red;">Multi-View and Controllable</font> Videos for Autonoumous Driving</h1>
<img src="assests/pipeline.png" style="margin:auto; right: 0; left: 0; width: 90%; display: inline;">
<p class="section-content-text" style="padding-bottom: 20px;"><strong>Overview of Panacea. </strong>(a). The diffusion training process of Panacea, enabled by a diffusion encoder and decoder with the decomposed 4D attention module. (b). The decomposed 4D attention module comprises three components: intra-view attention for spatial processing within individual views, cross-view attention to engage with adjacent views, and cross-frame attention for temporal processing. (c). Controllable module for the integration of diverse signals. The image conditions are derived from a frozen VAE encoder and combined with diffused noises. The text prompts are processed through a frozen CLIP encoder, while BEV sequences are handled via ControlNet. (d). The details of BEV layout sequences, including projected bounding boxes, object depths, road maps and camera pose.</p>
<img src="assests/pipeline_inference.png" style="margin:auto; right: 0; left: 0; width: 65%; display: inline;">
<p class="section-content-text" style="padding-bottom: 20px;"><strong>The two-stage inference pipeline of Panacea.</strong> Its two-stage process begins by creating multi-view images with BEV layouts, followed by using these images, along with subsequent BEV layouts, to facilitate the generation of following frames.</p>
</div>
<div class="root-content" style="padding-top: 10px; width: 65%; padding-bottom: 10px;">
<h1 class="section-name">🎬 BEV-guided Video Generation 🎬</h1>
<table style="width: 100%;">
<tbody>
<tr class="result-row">
<td>
<img src="assests/demo1.gif">
</td>
</tr>
<tr class="result-row">
<td>
<img src="assests/demo2.gif">
</td>
</tr>
</tbody>
</table>
<p class="section-content-text"><strong>Controllable multi-view video generation. Panacea is able to generate realistic, controllable videos with good temporal and view consistensy.</strong></p>
</div>
<div class="root-content" style="padding-top: 10px;width: 65%; padding-bottom: 10px;">
<h1 class="section-name">🎞 Attribute Controllable Video Generation 🎞</h1>
<table style="width: 100%;">
<tbody>
<tr class="result-row">
<td>
<img src="assests/attribute.png">
</td>
</tr>
</tbody>
</table>
<p class="section-content-text"><strong>Video generation with variable attribute controls, such as weather, time, and scene, which allows Panacea to simulate a variety of rare driving scenarios, including extreme weather conditions such as rain and snow, thereby greatly enhancing the diversity of the data.</strong></p>
</div>
<div class="root-content" style="padding-top: 10px;width: 65%; padding-bottom: 10px;">
<h1 class="section-name">🔥 Benefiting Autonomous Driving 🔥</h1>
<table style="padding-left: 120px;width: 90%;">
<tbody>
<tr class="result-row">
<td>
<img src="assests/gain.png">
</td>
</tr>
</tbody>
</table>
<p class="section-content-text"><strong> (a). Panoramic video generation based on BEV (Bird’s-Eye-View) layout sequence facilitates the establishment of a synthetic video dataset, which enhances perceptual tasks. (b). Producing panoramic videos with conditional images and BEV layouts can effectively elevate image-only datasets to video datasets, thus enabling the advancement of video-based perception techniques.</strong></p>
</div>
<div style="background-color: white; margin-right: auto; margin-left: auto;">
<div class="root-content" style="padding-top: 10px; width: 65%; padding-bottom: 10px;">
<div>
<h1 class="section-name" style="margin-top: 30px; text-align: left; font-size: 25px;">
BibTex
</h1>
<a name="bib"></a>
<pre style="margin-top: 5px;" class="bibtex">
<code>
@inproceedings{wen2024panacea,
title={Panacea: Panoramic and controllable video generation for autonomous driving},
author={Wen, Yuqing and Zhao, Yucheng and Liu, Yingfei and Jia, Fan and Wang, Yanhui and Luo, Chong and Zhang, Chi and Wang, Tiancai and Sun, Xiaoyan and Zhang, Xiangyu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={6902--6912},
year={2024}
}
@misc{wen2024panaceapanoramiccontrollablevideo,
title={Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving},
author={Yuqing Wen and Yucheng Zhao and Yingfei Liu and Binyuan Huang and Fan Jia and Yanhui Wang and Chi Zhang and Tiancai Wang and Xiaoyan Sun and Xiangyu Zhang},
year={2024},
eprint={2408.07605},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.07605},
}
}</code></pre>
Contact

Feel free to contact us at <strong>wenyuqing AT mail.ustc.edu.cn</strong> or <strong>wangtiancai AT megvii.com</strong>.
Acknowledgement
This code builds on Stability-AI, ControlNet and StreamPETR. Thanks for open-sourcing!