<div align="center"> <img src="./assets/logo.png" width="30%"> </div> <h1 align="center">💃DreamWaltz: Make a Scene with Complex 3D Animatable Avatars</h1> <p align="center">

Project Page | Paper | arXiv | Poster </p>

This repository contains the official implementation of NeurIPS 2023 paper:

DreamWaltz: Make a Scene with Complex 3D Animatable Avatars <br>Yukun Huang<sup>1,2</sup>, Jianan Wang<sup>1</sup>, Ailing Zeng<sup>1</sup>, He Cao<sup>1</sup>, Xianbiao Qi<sup>1</sup>, Yukai Shi<sup>1</sup>, Zheng-Jun Zha<sup>2</sup>, Lei Zhang<sup>1</sup><br> <sup>1</sup>International Digital Economy Academy &nbsp; <sup>2</sup>University of Science and Technology of China

Introduction

DreamWaltz is a learning framework for text-driven creation of 3D animatable avatars, built on the pretrained 2D diffusion model ControlNet and the human parametric model SMPL. The core idea is to optimize a deformable NeRF representation under skeleton-conditioned diffusion supervision, which ensures 3D consistency and generalization to arbitrary poses.
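To make the core idea concrete, below is a minimal conceptual sketch of skeleton-conditioned Score Distillation Sampling (SDS). The module names (`unet`, `controlnet`, `scheduler`) follow the Hugging Face diffusers API for illustration only; they are not this repository's own modules, and the weighting term w(t) is omitted.

```python
import torch

def sds_grad(latents, skeleton_map, text_emb, unet, controlnet, scheduler):
    """One SDS step: add noise to the (VAE-encoded) NeRF rendering at a random
    timestep, let the skeleton-conditioned diffusion model predict that noise,
    and use the prediction error as a gradient signal on the rendering."""
    t = torch.randint(20, 980, (latents.shape[0],), device=latents.device)
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)
    # ControlNet injects the rendered SMPL skeleton map into the frozen U-Net
    down, mid = controlnet(noisy, t, encoder_hidden_states=text_emb,
                           controlnet_cond=skeleton_map, return_dict=False)
    noise_pred = unet(noisy, t, encoder_hidden_states=text_emb,
                      down_block_additional_residuals=down,
                      mid_block_additional_residual=mid).sample
    return noise_pred - noise  # backpropagate this through the renderer
```

The gradient deliberately bypasses the U-Net Jacobian, which is what makes SDS cheap enough to drive NeRF optimization.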

<p align="middle"> <img src="assets/teaser.gif" width="80%"> <br> <em>Figure 1. DreamWaltz can generate animatable avatars (a) and construct complex scenes (b)(c)(d).</em> </p>

Installation

This code builds heavily on the excellent latent-nerf and stable-dreamfusion projects. Please install the dependencies:

```bash
pip install -r requirements.txt
```

The CUDA extension for instant-ngp is built at runtime, as in stable-dreamfusion.
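For reference, runtime building of this kind typically uses PyTorch's JIT C++/CUDA extension loader; the source file names below are hypothetical placeholders, not this repository's actual module layout.

```python
# Generic sketch of a runtime-built CUDA extension via PyTorch's JIT loader;
# compilation happens on first import and is cached afterwards.
from torch.utils.cpp_extension import load

raymarching = load(
    name='raymarching',
    sources=['raymarching.cu', 'bindings.cpp'],  # hypothetical file names
    extra_cuda_cflags=['-O3'],
    verbose=True,
)
```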

Prepare SMPL Weights

We use the SMPL and VPoser models for avatar creation and animation learning. Please follow the instructions in smplx and human_body_prior to download the model weights, then build a directory with the following structure:

```
smpl_models
├── smpl
│   ├── SMPL_FEMALE.pkl
│   ├── SMPL_MALE.pkl
│   └── SMPL_NEUTRAL.pkl
└── vposer
    └── v2.0
        ├── snapshots
        ├── V02_05.yaml
        └── V02_05.log
```

Then, update the model paths `SMPL_ROOT` and `VPOSER_ROOT` in `configs/paths.py`.
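A quick sanity check that the weights are in place (a minimal sketch assuming the directory above sits in the working directory; `load_model` is the loader from human_body_prior v2):

```python
import smplx
from human_body_prior.tools.model_loader import load_model
from human_body_prior.models.vposer_model import VPoser

# SMPL body model: looks for smpl_models/smpl/SMPL_NEUTRAL.pkl
body_model = smplx.create('smpl_models', model_type='smpl', gender='neutral')

# VPoser v2.0 pose prior: loads the snapshot under smpl_models/vposer/v2.0
vposer, _ = load_model('smpl_models/vposer/v2.0', model_code=VPoser,
                       remove_words_in_model_weights='vp_model.',
                       disable_grad=True)
print(type(body_model).__name__, type(vposer).__name__)
```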

Prepare Motion Sequences

You may need SMPL-format human motion sequences to animate the generated avatars. Our code provides a data API for AIST++, a high-quality dance video database with SMPL annotations. Please download the SMPL annotations from the AIST++ website and build a directory with the following structure:

```
aist
├── gWA_sFM_cAll_d26_mWA5_ch13.pkl
├── gWA_sFM_cAll_d27_mWA0_ch15.pkl
├── gWA_sFM_cAll_d27_mWA2_ch17.pkl
└── ...
```

and update the data path `AIST_ROOT` in `configs/paths.py`.
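Each AIST++ motion file is a pickled dict of per-frame SMPL parameters; a quick way to inspect one (field names as documented by the AIST++ release):

```python
import pickle

with open('aist/gWA_sFM_cAll_d27_mWA2_ch17.pkl', 'rb') as f:
    motion = pickle.load(f)

# Expected fields per the AIST++ release:
#   smpl_poses   (N, 72): per-frame axis-angle pose parameters
#   smpl_trans   (N, 3) : per-frame global root translations
#   smpl_scaling (1,)   : global scale factor
for key, value in motion.items():
    print(key, getattr(value, 'shape', value))
```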

Getting Started

DreamWaltz mainly consists of two training stages: (I) Canonical Avatar Creation and (II) Animatable Avatar Learning.

The following commands are also provided in `run.sh`.

1. SMPL-Guided NeRF Initialization

To pretrain the NeRF using mask images rendered from the canonical-pose SMPL mesh:

```bash
python train.py \
  --log.exp_name "pretrained" \
  --log.pretrain_only True \
  --prompt.scene canonical-A \
  --prompt.smpl_prompt depth \
  --optim.iters 10000
```

The resulting pretrained checkpoint can be reused for different text prompts.
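Conceptually, this stage fits the NeRF's density field to the SMPL silhouette; a simplified sketch of such a mask objective (not the repository's exact loss):

```python
import torch
import torch.nn.functional as F

def silhouette_loss(nerf_alpha: torch.Tensor, smpl_mask: torch.Tensor) -> torch.Tensor:
    """nerf_alpha: (H, W) accumulated opacity along each camera ray.
    smpl_mask:  (H, W) binary silhouette rendered from the canonical SMPL mesh."""
    return F.binary_cross_entropy(nerf_alpha.clamp(1e-5, 1 - 1e-5), smpl_mask)
```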

2. Canonical Avatar Creation

To learn a NeRF-based canonical avatar representation using ControlNet-based SDS:

text="a wooden robot"
avatar_name="wooden_robot"
pretrained_ckpt="./outputs/pretrained/checkpoints/step_010000.pth"
# the pretrained ckpt is available to different text prompts

python train.py \
  --guide.text "${text}" \
  --log.exp_name "canonical/${avatar_name}" \
  --optim.ckpt "${pretrained_ckpt}" \
  --optim.iters 30000 \
  --prompt.scene canonical-A

3. Animatable Avatar Learning

To learn a NeRF-based animatable avatar representation using ControlNet-based SDS:

text="a wooden robot"
avatar_name="wooden_robot"
canonical_ckpt="./outputs/canonical/${avatar_name}/checkpoints/step_030000.pth"

python train.py \
  --animation True \
  --guide.text "${text}" \
  --log.exp_name "animatable/${avatar_name}" \
  --optim.ckpt "${canonical_ckpt}" \
  --optim.iters 50000 \
  --prompt.scene random \
  --render.cuda_ray False
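Under the hood, animation hinges on mapping query points from the posed (observation) space back to the canonical space where the avatar NeRF lives, typically via inverse linear blend skinning with SMPL's skinning weights. A simplified sketch of this idea (not the repository's exact deformation module):

```python
import torch

def inverse_lbs(x_posed, joint_transforms, skinning_weights):
    """x_posed: (N, 3) query points sampled in the posed space.
    joint_transforms: (J, 4, 4) canonical-to-posed rigid transform per joint.
    skinning_weights: (N, J) SMPL skinning weights per point (rows sum to 1)."""
    # Blend per-joint transforms, then invert to map posed -> canonical
    T = torch.einsum('nj,jab->nab', skinning_weights, joint_transforms)  # (N, 4, 4)
    ones = torch.ones(x_posed.shape[0], 1, device=x_posed.device)
    x_h = torch.cat([x_posed, ones], dim=-1)                             # homogeneous
    x_canonical = torch.einsum('nab,nb->na', torch.inverse(T), x_h)[:, :3]
    return x_canonical
```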

4. Make a Dancing Video

To render a dancing video from the trained animatable avatar representation and a target motion sequence:

scene="gWA_sFM_cAll_d27_mWA2_ch17,180-280"
# "gWA_sFM_cAll_d27_mWA2_ch17" is the filename of motion sequences in AIST++
# "180-280" is the range of video frame indices: [180, 280]

avatar_name="wooden_robot"
animatable_ckpt="./outputs/animatable/${avatar_name}/checkpoints/step_050000.pth"

python train.py \
    --animation True \
    --log.eval_only True \
    --log.exp_name "videos/${avatar_name}" \
    --optim.ckpt "${animatable_ckpt}" \
    --prompt.scene "${scene}" \
    --render.cuda_ray False \
    --render.eval_fix_camera True

The resulting video can be found in `PROJECT_ROOT/outputs/videos/${avatar_name}/results/128x128/`.
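For clarity, the motion `scene` string combines an AIST++ filename with an inclusive frame range; a hypothetical parser illustrating the format:

```python
def parse_motion_scene(scene: str):
    """Split "<aist_filename>,<start>-<end>" into a name and inclusive frame range."""
    name, frame_range = scene.split(',')
    start, end = map(int, frame_range.split('-'))
    return name, range(start, end + 1)

name, frames = parse_motion_scene("gWA_sFM_cAll_d27_mWA2_ch17,180-280")
assert frames == range(180, 281)  # frames 180..280 inclusive
```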

Results

Canonical Avatars

<p align="middle"> <img src="assets/canonical_half.gif" width="80%"> <br> <em>Figure 2. DreamWaltz can create canonical avatars from textual descriptions.</em> </p>

Animatable Avatars

<p align="middle"> <img src="assets/animation_sp.gif" width="80%"> <br> <em>Figure 3. DreamWaltz can animate canonical avatars given motion sequences.</em> </p>

Complex Scenes

<p align="middle"> <img src="assets/animation_obj.gif" width="80%"> <br> <em>Figure 4. DreamWaltz can make complex 3D scenes with avatar-object interactions.</em> </p>

<p align="middle"> <img src="assets/animation_scene.gif" width="80%"> <br> <em>Figure 5. DreamWaltz can make complex 3D scenes with avatar-scene interactions.</em> </p>

<p align="middle"> <img src="assets/animation_mp.gif" width="80%"> <br> <em>Figure 6. DreamWaltz can make complex 3D scenes with avatar-avatar interactions.</em> </p>

Reference

If you find this repository useful for your work, please consider citing it as follows:

```bibtex
@inproceedings{huang2023dreamwaltz,
  title={{DreamWaltz: Make a Scene with Complex 3D Animatable Avatars}},
  author={Yukun Huang and Jianan Wang and Ailing Zeng and He Cao and Xianbiao Qi and Yukai Shi and Zheng-Jun Zha and Lei Zhang},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}

@inproceedings{huang2024dreamtime,
  title={{DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation}},
  author={Yukun Huang and Jianan Wang and Yukai Shi and Boshi Tang and Xianbiao Qi and Lei Zhang},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
```