<div align="center"> <img src="./assets/logo.png" width="30%"> </div>
<h1 align="center">DreamWaltz: Make a Scene with Complex 3D Animatable Avatars</h1>
<p align="center">Project Page | Paper | arXiv | Poster</p>
This repository contains the official implementation of the NeurIPS 2023 paper:
DreamWaltz: Make a Scene with Complex 3D Animatable Avatars <br>Yukun Huang<sup>1,2</sup>, Jianan Wang<sup>1</sup>, Ailing Zeng<sup>1</sup>, He Cao<sup>1</sup>, Xianbiao Qi<sup>1</sup>, Yukai Shi<sup>1</sup>, Zheng-Jun Zha<sup>2</sup>, Lei Zhang<sup>1</sup><br> <sup>1</sup>International Digital Economy Academy &nbsp; <sup>2</sup>University of Science and Technology of China
## News
- 15/10/2024: We present DreamWaltz-G, an enhanced version of DreamWaltz with hand and expression control!
- 09/01/2024: Thanks to Zehuan Huang for the threestudio implementation of DreamWaltz!
- 11/10/2023: Training and inference code is released.
## Introduction
DreamWaltz is a learning framework for text-driven 3D animatable avatar creation that builds on the pretrained 2D diffusion model ControlNet and the human parametric model SMPL. The core idea is to optimize a deformable NeRF representation under skeleton-conditioned diffusion supervision, which ensures 3D consistency and generalization to arbitrary poses.
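In Score Distillation Sampling (SDS) terms, following DreamFusion, the NeRF parameters receive gradients from the frozen diffusion model; the formula below is a sketch of that objective with the skeleton condition added, paraphrased rather than copied from the paper:

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\Big[\, w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, c,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \,\Big]
$$

where $x$ is an image rendered from the NeRF with parameters $\theta$, $x_t$ its noised version at timestep $t$, $y$ the text prompt, $c$ the SMPL skeleton map consumed by ControlNet, $\epsilon$ the injected Gaussian noise, and $w(t)$ a timestep-dependent weight.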
<p align="middle"> <img src="assets/teaser.gif" width="80%"> <br> <em>Figure 1. DreamWaltz can generate animatable avatars (a) and construct complex scenes (b)(c)(d).</em> </p>

## Installation
This code is heavily based on the excellent latent-nerf and stable-dreamfusion projects. Please install the dependencies:
```bash
pip install -r requirements.txt
```
The CUDA extension for Instant-NGP is built at runtime, as in stable-dreamfusion.
## Prepare SMPL Weights
We use the SMPL and VPoser models for avatar creation and animation learning. Please follow the instructions in smplx and human_body_prior to download the model weights, then build a directory with the following structure:
```
smpl_models
├── smpl
│   ├── SMPL_FEMALE.pkl
│   ├── SMPL_MALE.pkl
│   └── SMPL_NEUTRAL.pkl
└── vposer
    └── v2.0
        ├── snapshots
        ├── V02_05.yaml
        └── V02_05.log
```
Then, update the model paths `SMPL_ROOT` and `VPOSER_ROOT` in `configs/paths.py`.
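To sanity-check the layout, the weights can be loaded directly with the `smplx` and `human_body_prior` packages. This is a minimal sketch assuming the directory above sits at `./smpl_models`; the repository's own loading code may differ:

```python
# Minimal layout check for the SMPL/VPoser weights above (illustrative, not the repo's API).
import smplx
from human_body_prior.tools.model_loader import load_model
from human_body_prior.models.vposer_model import VPoser

# smplx resolves smpl_models/smpl/SMPL_NEUTRAL.pkl from the root directory
smpl = smplx.create("./smpl_models", model_type="smpl", gender="neutral")

# VPoser v2.0 is loaded from the folder containing snapshots/ and V02_05.yaml
vposer, _ = load_model("./smpl_models/vposer/v2.0", model_code=VPoser,
                       remove_words_in_model_weights="vp_model.", disable_grad=True)
print(type(smpl).__name__, type(vposer).__name__)
```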
## Prepare Motion Sequences
You might need to prepare SMPL-format human motion sequences to animate the generated avatars. Our code provides a data API for AIST++, a high-quality dance video database with SMPL annotations. Please download the SMPL annotations from this website and build a directory with the following structure:
```
aist
├── gWA_sFM_cAll_d26_mWA5_ch13.pkl
├── gWA_sFM_cAll_d27_mWA0_ch15.pkl
├── gWA_sFM_cAll_d27_mWA2_ch17.pkl
└── ...
```
Then, update the data path `AIST_ROOT` in `configs/paths.py`.
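Each `.pkl` file stores per-frame SMPL parameters. Below is a minimal inspection sketch assuming the standard AIST++ annotation keys (`smpl_poses`, `smpl_trans`, `smpl_scaling`):

```python
# Inspect one AIST++ SMPL annotation file (assumes the standard AIST++ keys).
import pickle

with open("aist/gWA_sFM_cAll_d27_mWA2_ch17.pkl", "rb") as f:
    motion = pickle.load(f)

poses = motion["smpl_poses"]      # (N, 72) axis-angle SMPL pose parameters per frame
trans = motion["smpl_trans"]      # (N, 3) global root translation per frame
scaling = motion["smpl_scaling"]  # global scale factor of the capture
print(poses.shape, trans.shape, scaling)
```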
## Getting Started
DreamWaltz mainly consists of two training stages: (I) Canonical Avatar Creation and (II) Animatable Avatar Learning.
The following commands are also provided in `run.sh`.
### 1. SMPL-Guided NeRF Initialization
To pretrain the NeRF using mask images rendered from the canonical-posed SMPL mesh:
```bash
python train.py \
  --log.exp_name "pretrained" \
  --log.pretrain_only True \
  --prompt.scene canonical-A \
  --prompt.smpl_prompt depth \
  --optim.iters 10000
```
The resulting pretrained checkpoint can be reused across different text prompts.
### 2. Canonical Avatar Creation
To learn a NeRF-based canonical avatar representation using ControlNet-based SDS:
```bash
text="a wooden robot"
avatar_name="wooden_robot"
pretrained_ckpt="./outputs/pretrained/checkpoints/step_010000.pth"
# the pretrained checkpoint can be reused for different text prompts
python train.py \
  --guide.text "${text}" \
  --log.exp_name "canonical/${avatar_name}" \
  --optim.ckpt "${pretrained_ckpt}" \
  --optim.iters 30000 \
  --prompt.scene canonical-A
```
### 3. Animatable Avatar Learning
To learn a NeRF-based animatable avatar representation using ControlNet-based SDS:
```bash
text="a wooden robot"
avatar_name="wooden_robot"
canonical_ckpt="./outputs/canonical/${avatar_name}/checkpoints/step_030000.pth"
python train.py \
  --animation True \
  --guide.text "${text}" \
  --log.exp_name "animatable/${avatar_name}" \
  --optim.ckpt "${canonical_ckpt}" \
  --optim.iters 50000 \
  --prompt.scene random \
  --render.cuda_ray False
```
### 4. Make a Dancing Video
To make a dancing video from the trained animatable avatar representation and the target motion sequences:
```bash
scene="gWA_sFM_cAll_d27_mWA2_ch17,180-280"
# "gWA_sFM_cAll_d27_mWA2_ch17" is the filename of a motion sequence in AIST++
# "180-280" is the range of video frame indices: [180, 280]
avatar_name="wooden_robot"
animatable_ckpt="./outputs/animatable/${avatar_name}/checkpoints/step_050000.pth"
python train.py \
  --animation True \
  --log.eval_only True \
  --log.exp_name "videos/${avatar_name}" \
  --optim.ckpt "${animatable_ckpt}" \
  --prompt.scene "${scene}" \
  --render.cuda_ray False \
  --render.eval_fix_camera True
```
The resulting video can be found in `PROJECT_ROOT/outputs/videos/${avatar_name}/results/128x128/`.
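For reference, the scene string packs a motion filename and an inclusive frame range into one argument; the tiny parser below (a hypothetical helper, not the repository's code) makes the format explicit:

```python
# Parse the "<motion_name>,<start>-<end>" scene string (hypothetical helper).
def parse_scene(scene: str):
    name, frame_range = scene.split(",")
    start, end = (int(v) for v in frame_range.split("-"))
    return name, start, end

print(parse_scene("gWA_sFM_cAll_d27_mWA2_ch17,180-280"))
# -> ('gWA_sFM_cAll_d27_mWA2_ch17', 180, 280)
```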
## Results
### Canonical Avatars
<p align="middle"> <img src="assets/canonical_half.gif" width="80%"> <br> <em>Figure 2. DreamWaltz can create canonical avatars from textual descriptions.</em> </p>

### Animatable Avatars
<p align="middle"> <img src="assets/animation_sp.gif" width="80%"> <br> <em>Figure 3. DreamWaltz can animate canonical avatars given motion sequences.</em> </p>

### Complex Scenes
<p align="middle"> <img src="assets/animation_obj.gif" width="80%"> <br> <em>Figure 4. DreamWaltz can make complex 3D scenes with avatar-object interactions.</em> </p>

<p align="middle"> <img src="assets/animation_scene.gif" width="80%"> <br> <em>Figure 5. DreamWaltz can make complex 3D scenes with avatar-scene interactions.</em> </p>

<p align="middle"> <img src="assets/animation_mp.gif" width="80%"> <br> <em>Figure 6. DreamWaltz can make complex 3D scenes with avatar-avatar interactions.</em> </p>

## Reference
If you find this repository useful for your work, please consider citing it as follows:
```bibtex
@inproceedings{huang2023dreamwaltz,
  title={{DreamWaltz: Make a Scene with Complex 3D Animatable Avatars}},
  author={Yukun Huang and Jianan Wang and Ailing Zeng and He Cao and Xianbiao Qi and Yukai Shi and Zheng-Jun Zha and Lei Zhang},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}

@inproceedings{huang2024dreamtime,
  title={{DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation}},
  author={Yukun Huang and Jianan Wang and Yukai Shi and Boshi Tang and Xianbiao Qi and Lei Zhang},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
```