<div align="center"> <img src="./assets/logo.png" width="30%"> </div>
<h1 align="center">DreamWaltz: Make a Scene with Complex 3D Animatable Avatars</h1>
<p align="center">Project Page | Paper | arXiv | Poster</p>
This repository contains the official implementation of the NeurIPS 2023 paper:
DreamWaltz: Make a Scene with Complex 3D Animatable Avatars <br>Yukun Huang<sup>1,2</sup>, Jianan Wang<sup>1</sup>, Ailing Zeng<sup>1</sup>, He Cao<sup>1</sup>, Xianbiao Qi<sup>1</sup>, Yukai Shi<sup>1</sup>, Zheng-Jun Zha<sup>2</sup>, Lei Zhang<sup>1</sup><br> <sup>1</sup>International Digital Economy Academy &nbsp; <sup>2</sup>University of Science and Technology of China
## News
- 15/10/2024: We present DreamWaltz-G, an enhanced version of DreamWaltz with hand and expression control!
- 09/01/2024: Thanks to Zehuan Huang for the threestudio implementation of DreamWaltz!
- 11/10/2023: Training and inference code is released.
## Introduction
DreamWaltz is a learning framework for text-driven 3D animatable avatar creation that builds on the pretrained 2D diffusion model ControlNet and the human parametric model SMPL. The core idea is to optimize a deformable NeRF representation under skeleton-conditioned diffusion supervision, which ensures 3D consistency and generalization to arbitrary poses.
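In Score Distillation Sampling (SDS) terms, following DreamFusion, the NeRF parameters receive gradients from the frozen diffusion model; the formula below is a sketch of that objective with the skeleton condition added, paraphrased rather than copied from the paper:

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\Big[\, w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, c,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \,\Big]
$$

where $x$ is an image rendered from the NeRF with parameters $\theta$, $x_t$ its noised version at timestep $t$, $y$ the text prompt, $c$ the SMPL skeleton map consumed by ControlNet, $\epsilon$ the injected Gaussian noise, and $w(t)$ a timestep-dependent weight.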
<p align="middle"> <img src="assets/teaser.gif" width="80%"> <br> <em>Figure 1. DreamWaltz can generate animatable avatars (a) and construct complex scenes (b)(c)(d).</em> </p>

## Installation
This code is heavily based on the excellent latent-nerf and stable-dreamfusion projects. Please install the dependencies:
```bash
pip install -r requirements.txt
```
The CUDA extension for Instant-NGP is built at runtime, as in stable-dreamfusion.
## Prepare SMPL Weights
We use the SMPL and VPoser models for avatar creation and animation learning. Please follow the instructions in smplx and human_body_prior to download the model weights, then build a directory with the following structure:
```
smpl_models
├── smpl
│   ├── SMPL_FEMALE.pkl
│   ├── SMPL_MALE.pkl
│   └── SMPL_NEUTRAL.pkl
└── vposer
    └── v2.0
        ├── snapshots
        ├── V02_05.yaml
        └── V02_05.log
```
Then, update the model paths `SMPL_ROOT` and `VPOSER_ROOT` in `configs/paths.py`.
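To sanity-check the layout, the weights can be loaded directly with the `smplx` and `human_body_prior` packages. This is a minimal sketch assuming the directory above sits at `./smpl_models`; the repository's own loading code may differ:

```python
# Minimal layout check for the SMPL/VPoser weights above (illustrative, not the repo's API).
import smplx
from human_body_prior.tools.model_loader import load_model
from human_body_prior.models.vposer_model import VPoser

# smplx resolves smpl_models/smpl/SMPL_NEUTRAL.pkl from the root directory
smpl = smplx.create("./smpl_models", model_type="smpl", gender="neutral")

# VPoser v2.0 is loaded from the folder containing snapshots/ and V02_05.yaml
vposer, _ = load_model("./smpl_models/vposer/v2.0", model_code=VPoser,
                       remove_words_in_model_weights="vp_model.", disable_grad=True)
print(type(smpl).__name__, type(vposer).__name__)
```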
## Prepare Motion Sequences
You might need to prepare SMPL-format human motion sequences to animate the generated avatars. Our code provides a data API for AIST++, a high-quality dance video database with SMPL annotations. Please download the SMPL annotations from this website and build a directory with the following structure:
```
aist
├── gWA_sFM_cAll_d26_mWA5_ch13.pkl
├── gWA_sFM_cAll_d27_mWA0_ch15.pkl
├── gWA_sFM_cAll_d27_mWA2_ch17.pkl
└── ...
```
Then, update the data path `AIST_ROOT` in `configs/paths.py`.
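Each `.pkl` file stores per-frame SMPL parameters. Below is a minimal inspection sketch assuming the standard AIST++ annotation keys (`smpl_poses`, `smpl_trans`, `smpl_scaling`):

```python
# Inspect one AIST++ SMPL annotation file (assumes the standard AIST++ keys).
import pickle

with open("aist/gWA_sFM_cAll_d27_mWA2_ch17.pkl", "rb") as f:
    motion = pickle.load(f)

poses = motion["smpl_poses"]      # (N, 72) axis-angle SMPL pose parameters per frame
trans = motion["smpl_trans"]      # (N, 3) global root translation per frame
scaling = motion["smpl_scaling"]  # global scale factor of the capture
print(poses.shape, trans.shape, scaling)
```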
## Getting Started
DreamWaltz mainly consists of two training stages: (I) Canonical Avatar Creation and (II) Animatable Avatar Learning.
The following commands are also provided in `run.sh`.
### 1. SMPL-Guided NeRF Initialization
To pretrain the NeRF using mask images rendered from the canonical-posed SMPL mesh:
```bash
python train.py \
  --log.exp_name "pretrained" \
  --log.pretrain_only True \
  --prompt.scene canonical-A \
  --prompt.smpl_prompt depth \
  --optim.iters 10000
```
The resulting pretrained checkpoint can be reused across different text prompts.
### 2. Canonical Avatar Creation
To learn a NeRF-based canonical avatar representation using ControlNet-based SDS:
```bash
text="a wooden robot"
avatar_name="wooden_robot"
pretrained_ckpt="./outputs/pretrained/checkpoints/step_010000.pth"
# the pretrained checkpoint can be reused for different text prompts
python train.py \
  --guide.text "${text}" \
  --log.exp_name "canonical/${avatar_name}" \
  --optim.ckpt "${pretrained_ckpt}" \
  --optim.iters 30000 \
  --prompt.scene canonical-A
```
### 3. Animatable Avatar Learning
To learn a NeRF-based animatable avatar representation using ControlNet-based SDS:
```bash
text="a wooden robot"
avatar_name="wooden_robot"
canonical_ckpt="./outputs/canonical/${avatar_name}/checkpoints/step_030000.pth"
python train.py \
  --animation True \
  --guide.text "${text}" \
  --log.exp_name "animatable/${avatar_name}" \
  --optim.ckpt "${canonical_ckpt}" \
  --optim.iters 50000 \
  --prompt.scene random \
  --render.cuda_ray False
```
### 4. Make a Dancing Video
To make a dancing video from the trained animatable avatar representation and the target motion sequences:
```bash
scene="gWA_sFM_cAll_d27_mWA2_ch17,180-280"
# "gWA_sFM_cAll_d27_mWA2_ch17" is the filename of a motion sequence in AIST++
# "180-280" is the range of video frame indices: [180, 280]
avatar_name="wooden_robot"
animatable_ckpt="./outputs/animatable/${avatar_name}/checkpoints/step_050000.pth"
python train.py \
  --animation True \
  --log.eval_only True \
  --log.exp_name "videos/${avatar_name}" \
  --optim.ckpt "${animatable_ckpt}" \
  --prompt.scene "${scene}" \
  --render.cuda_ray False \
  --render.eval_fix_camera True
```
The resulting video can be found in `PROJECT_ROOT/outputs/videos/${avatar_name}/results/128x128/`.
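For reference, the scene string packs a motion filename and an inclusive frame range into one argument; the tiny parser below (a hypothetical helper, not the repository's code) makes the format explicit:

```python
# Parse the "<motion_name>,<start>-<end>" scene string (hypothetical helper).
def parse_scene(scene: str):
    name, frame_range = scene.split(",")
    start, end = (int(v) for v in frame_range.split("-"))
    return name, start, end

print(parse_scene("gWA_sFM_cAll_d27_mWA2_ch17,180-280"))
# -> ('gWA_sFM_cAll_d27_mWA2_ch17', 180, 280)
```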
## Results
### Canonical Avatars
<p align="middle"> <img src="assets/canonical_half.gif" width="80%"> <br> <em>Figure 2. DreamWaltz can create canonical avatars from textual descriptions.</em> </p>

### Animatable Avatars
<p align="middle"> <img src="assets/animation_sp.gif" width="80%"> <br> <em>Figure 3. DreamWaltz can animate canonical avatars given motion sequences.</em> </p>

### Complex Scenes
<p align="middle"> <img src="assets/animation_obj.gif" width="80%"> <br> <em>Figure 4. DreamWaltz can make complex 3D scenes with avatar-object interactions.</em> </p>

<p align="middle"> <img src="assets/animation_scene.gif" width="80%"> <br> <em>Figure 5. DreamWaltz can make complex 3D scenes with avatar-scene interactions.</em> </p>

<p align="middle"> <img src="assets/animation_mp.gif" width="80%"> <br> <em>Figure 6. DreamWaltz can make complex 3D scenes with avatar-avatar interactions.</em> </p>

## Reference
If you find this repository useful for your work, please consider citing it as follows:
```bibtex
@inproceedings{huang2023dreamwaltz,
  title={{DreamWaltz: Make a Scene with Complex 3D Animatable Avatars}},
  author={Yukun Huang and Jianan Wang and Ailing Zeng and He Cao and Xianbiao Qi and Yukai Shi and Zheng-Jun Zha and Lei Zhang},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}

@inproceedings{huang2024dreamtime,
  title={{DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation}},
  author={Yukun Huang and Jianan Wang and Yukai Shi and Boshi Tang and Xianbiao Qi and Lei Zhang},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
```