Home

Awesome

GMD: Guided Motion Diffusion for Controllable Human Motion Synthesis

arXiv

The official PyTorch implementation of the paper "GMD: Controllable Human Motion Synthesis via Guided Diffusion Models".

For more details, visit our project page.

teaser

News

📢 20/Dec/23 - We release DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors, a follow-up work that looks at how to effectively use diffusion model and guidance to tackle many motion tasks.
28/July/23 - First release.

Bibtex

If you find this code useful in your research, please cite:

@inproceedings{karunratanakul2023gmd,
  title     = {Guided Motion Diffusion for Controllable Human Motion Synthesis},
  author    = {Karunratanakul, Korrawe and Preechakul, Konpat and Suwajanakorn, Supasorn and Tang, Siyu},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages     = {2151--2162},
  year      = {2023}
}

Getting started

This code was tested on Ubuntu 20.04 LTS and requires:

1. Setup environment

Install ffmpeg (if not already installed):

sudo apt update
sudo apt install ffmpeg

For windows use this instead.

2. Install dependencies

GMD shares a large part of its base dependencies with the MDM. However, you might find it easier to install our dependencies from scratch due to some key version differences.

Setup conda env:

conda env create -f environment_gmd.yml
conda activate gmd
conda remove --force ffmpeg
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git

Download dependencies:

<details> <summary><b>Text to Motion</b></summary>
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
</details> <details> <summary><b>Unconstrained</b></summary>
bash prepare/download_smpl_files.sh
bash prepare/download_recognition_unconstrained_models.sh
</details>

2. Get data

<!-- <details> <summary><b>Text to Motion</b></summary> -->

There are two paths to get the data:

(a) Generation only wtih pretrained text-to-motion model without training or evaluating

(b) Get full data to train and evaluate the model.

a. Generation only (text only)

HumanML3D - Clone HumanML3D, then copy the data dir to our repository:

cd ..
git clone https://github.com/EricGuo5513/HumanML3D.git
unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
cp -r HumanML3D/HumanML3D guided-motion-diffusion/dataset/HumanML3D
cd guided-motion-diffusion
cp -a dataset/HumanML3D_abs/. dataset/HumanML3D/

b. Full data (text + motion capture)

[Important !] Because we change the representation of the root joint from relative to absolute, you need to replace the original files and run our version of motion_representation.ipynb and cal_mean_variance.ipynb provided in ./HumanML3D_abs/ instead to get the absolute-root data.

HumanML3D - Follow the instructions in HumanML3D, then copy the result dataset to our repository:

Then copy the data to our repository

cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D

3. Download the pretrained models

Download both models, then unzip and place them in ./save/.

Both models are trained on the HumanML3D dataset.

trajectory model

motion model

Motion Synthesis

<details> <summary><b>Text to Motion - <u>Without</u> spatial conditioning</b></summary>

This part is a standard text-to-motion generation.

Generate from test set prompts

Note: We change the behavior of the --num_repetitions flag from the original MDM repo to facilitate the two-staged pipeline and logging. We only support --num_repetitions 1 at this moment.

python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --num_samples 10

Generate from your text file

python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --input_text ./assets/example_text_prompts.txt

Generate from a single prompt - no spatial guidance

python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --text_prompt "a person is picking up something on the floor"

example

</details> <details> <summary><b>Text to Motion - <u>With</u> keyframe locations conditioning</b></summary>

Generate from a single prompt - condition on keyframe locations

The predefined pattern can be found in get_kframes() in sample/keyframe_pattern.py. You can add more patterns there using the same format [(frame_num_1, (x_1, z_1)), (frame_num_2, (x_2, z_2)), ...] where x and z are the location of the root joint on the plane in the world coordinate system.

python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --text_prompt "a person is walking while raising both hands" --guidance_mode kps

example

(In development) Using the --interactive flag will start an interactive window that allows you to choose the keyframes yourself. The interactive pattern will override the predefined pattern.

</details> <details> <summary><b>Text to Motion - <u>With</u> keyframe locations conditioning <u>and</u> obstacle avoidance</b></summary>

Similarly, the pattern is defined in get_obstacles() in sample/keyframe_pattern.py. You can add more patterns using the format ((x, z), radius) currently we only support circle obstacle due to the ease of defining SDF, but you can add any shape with valid SDF.

python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --text_prompt "a person is walking while raising both hands" --guidance_mode sdf --seed 11

example

</details> <details> <summary><b>Text to Motion - <u>With</u> trajectory conditioning</b></summary>

Generate from a single prompt - condition on a trajectory

The trajectory-conditioned generation is a special case of keyframe-conditioned generation, where all the frames are keyframes. The sample trajectory we used can be found in ./save/template_joints.npy. You can also use your own trajectory by providing the list of ground_positions.

python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --text_prompt "a person is walking while raising both hands" --guidance_mode trajectory

example

(In development) Using the --interactive flag will start an interactive window that allows you to draw a trajectory that will override the predefined pattern.

</details>

You may also define:

Running those will get you:

You can stop here, or render the SMPL mesh using the following script.

Render SMPL mesh

To create SMPL mesh per frame run:

python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file

This script outputs:

Notes:

Notes for 3d makers:

Training GMD

GMD is trained on the HumanML3D dataset.

Trajectory Model

python -m train.train_trajectory

Motion Model

python -m train.train_gmd

Essentially, the same command is used for both the trajectory model and the motion model. You can select which model to train by changing the train_args. The training options can be found in ./configs/card.py.

Evaluate

All evaluation are done on the HumanML3D dataset.

Text to Motion

python -m eval.eval_humanml --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt

Text to Motion - <u>With</u> trajectory conditioning

For each prompt, we use the ground truth trajectory as conditions.

python -m eval.eval_humanml --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --full_traj_inpaint

Text to Motion - <u>With</u> keyframe locations conditioning

For each prompt, 5 keyframes are sampled from the ground truth motion. The ground locations of the root joint in those frames are used as conditions.

python -m eval.eval_humanml_condition --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt

Acknowledgments

We would like to thank the following contributors for the great foundation that we build upon:

MDM, guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, MoDi.

License

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.