💃 IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
<a href='https://yhzhai.github.io/idol/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2407.10937'><img src='https://img.shields.io/badge/Paper-arXiv-red'></a>
Yuanhao Zhai<sup>1</sup>, Kevin Lin<sup>2</sup>, Linjie Li<sup>2</sup>, Chung-Ching Lin<sup>2</sup>, Jianfeng Wang<sup>2</sup>, Zhengyuan Yang<sup>2</sup>, David Doermann<sup>1</sup>, Junsong Yuan<sup>1</sup>, Zicheng Liu<sup>3</sup>, Lijuan Wang<sup>2</sup>
<sup>1</sup>State University of New York at Buffalo  |  <sup>2</sup>Microsoft  |  <sup>3</sup>Advanced Micro Devices
European Conference on Computer Vision (ECCV) 2024
TL;DR: IDOL enables human-centric joint video-depth generation, whose outputs can be rendered into realistic 2.5D videos.
All code and checkpoints will be released soon!