OmniControl: Control Any Joint at Any Time for Human Motion Generation

Project Page | Paper

OmniControl: Control Any Joint at Any Time for Human Motion Generation
Yiming Xie, Varun Jampani, Lei Zhong, Deqing Sun, Huaizu Jiang

teaser

Citation

@inproceedings{
xie2024omnicontrol,
title={OmniControl: Control Any Joint at Any Time for Human Motion Generation},
author={Yiming Xie and Varun Jampani and Lei Zhong and Deqing Sun and Huaizu Jiang},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=gd0lAEtWso}
}

News

📢 10/Dec/23 - First release

Getting started

This code requires:

1. Setup environment

Install ffmpeg (if not already installed):

sudo apt update
sudo apt install ffmpeg

For Windows, use this instead.

Setup conda env:

conda env create -f environment.yml
conda activate omnicontrol
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git

Download dependencies:

bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh

2. Get data

Full data (text + motion capture)

HumanML3D - Follow the instructions in HumanML3D, then copy the resulting dataset to our repository:

cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D

KIT - Download from HumanML3D (no processing needed this time) and place the result in ./dataset/KIT-ML

3. Download the pretrained models

Download the model(s) you wish to use, then unzip and place them in ./save/.

HumanML3D

model_humanml

cd save
gdown --id 1oTkBtArc3xjqkYD6Id7LksrTOn3e1Zud
unzip omnicontrol_ckpt.zip -d .
cd ..
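Before sampling or evaluating, it can help to confirm that the checkpoint ended up where the later commands expect it. The following is a minimal sketch, not part of the repository: the helper name `find_checkpoint` is hypothetical, and the default path is simply the one used by the commands in this README.

```python
from pathlib import Path

# Path taken from the sampling and evaluation commands in this README.
DEFAULT_CKPT = Path("save") / "omnicontrol_ckpt" / "model_humanml3d.pt"


def find_checkpoint(repo_root=".", relative_path=DEFAULT_CKPT):
    """Return the checkpoint path if it exists, else raise FileNotFoundError.

    Hypothetical helper for illustration; it only checks the file system.
    """
    ckpt = Path(repo_root) / relative_path
    if not ckpt.is_file():
        raise FileNotFoundError(
            f"{ckpt} not found; download and unzip the checkpoint into ./save/ first"
        )
    return ckpt


if __name__ == "__main__":
    try:
        print("Found checkpoint:", find_checkpoint())
    except FileNotFoundError as err:
        print(err)
```

Run it from the repository root; if the file is missing, the message points back to the download step above.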

Motion Synthesis

Generate with the manually defined spatial control signals and texts

Check the manually defined spatial control signals in text_control_example. You can define your own inputs following this file.

python -m sample.generate --model_path ./save/omnicontrol_ckpt/model_humanml3d.pt --num_repetitions 1
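As a rough illustration of what a manually defined spatial control signal could contain, the sketch below builds a dense array with one 3D target per frame and joint, plus a mask marking which entries are actually controlled. The array layout, the frame and joint counts (196 and 22), and the helper name `make_control_signal` are assumptions for illustration only; the format the code actually expects is the one in text_control_example.

```python
import numpy as np


def make_control_signal(n_frames, n_joints, joint, frames, positions):
    """Return (hint, mask) for a single controlled joint.

    hint: (n_frames, n_joints, 3) array, zero where nothing is controlled.
    mask: (n_frames, n_joints) boolean array, True where a target is set.
    Hypothetical helper; not the repository's input format.
    """
    positions = np.asarray(positions, dtype=np.float32)
    if positions.shape != (len(frames), 3):
        raise ValueError("positions must have shape (len(frames), 3)")
    hint = np.zeros((n_frames, n_joints, 3), dtype=np.float32)
    mask = np.zeros((n_frames, n_joints), dtype=bool)
    hint[frames, joint] = positions   # one xyz target per listed frame
    mask[frames, joint] = True
    return hint, mask


# Assumed example: steer joint 0 (the root joint) along a straight line
# on five keyframes of a 196-frame, 22-joint sequence.
frames = [0, 49, 98, 147, 195]
targets = [[0.0, 0.9, 0.5 * i] for i in range(len(frames))]
hint, mask = make_control_signal(196, 22, joint=0, frames=frames, positions=targets)
print(mask.sum(), hint[49, 0])
```

Keeping a separate mask avoids ambiguity between "no control" and a target that happens to sit at the origin.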

Generate with the spatial control signals and text sampled from the HumanML3D dataset

We randomly sample spatial control signals from the ground-truth motions of the HumanML3D dataset.

python -m sample.generate --model_path ./save/omnicontrol_ckpt/model_humanml3d.pt --num_repetitions 1 --text_prompt ''

You may also define:

Running those will get you:

It will look something like this:

example

You can stop here, or render the SMPL mesh using the following script.

Render SMPL mesh

This part is borrowed directly from MDM.
To create an SMPL mesh per frame, run:

python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file

This script outputs:

Notes:

Notes for 3d makers:

Train your own OmniControl

HumanML3D
Download the pretrained MDM model, which comes from the MDM project, and place it in ./save/. Alternatively, download it directly via

cd save
gdown --id 1XS_kp1JszAxgZBq9SL8Y5JscVVqJ2c7H
cd ..

You can train your own model via

python -m train.train_mdm --save_dir save/my_omnicontrol --dataset humanml --num_steps 400000 --batch_size 64 --resume_checkpoint ./save/model000475000.pt --lr 1e-5

Evaluate

HumanML3D

./eval_omnicontrol_all.sh ./save/omnicontrol_ckpt/model_humanml3d.pt

Or you can evaluate each setting separately, e.g., root joint (0) with dense spatial control signal (100).
It takes about 1.5 hours.

./eval_omnicontrol.sh ./save/omnicontrol_ckpt/model_humanml3d.pt 0 100
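If you want to queue several per-setting runs, a small wrapper can assemble the command lines for you. This is a sketch under assumptions: `build_eval_commands` is a hypothetical helper, and the only (joint, density) pair known from this README is (0, 100); check the evaluation scripts before adding other values.

```python
import shlex


def build_eval_commands(model_path, settings, script="./eval_omnicontrol.sh"):
    """Return one argv list per (joint, density) setting.

    Hypothetical helper; it only formats arguments in the order
    script, model path, joint index, density.
    """
    return [
        [script, model_path, str(joint), str(density)]
        for joint, density in settings
    ]


# (0, 100) is the root-joint / dense setting shown above; any further
# pairs are placeholders until verified against the scripts.
settings = [(0, 100)]
for argv in build_eval_commands("./save/omnicontrol_ckpt/model_humanml3d.pt", settings):
    print(shlex.join(argv))
    # To actually run a setting: subprocess.run(argv, check=True)
```

Since a single setting takes a long time, printing the commands first makes it easy to review the queue before launching anything.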

Code pointers to the main modules of OmniControl

Spatial Guidance. (./diffusion/gaussian_diffusion.py#L450)
Realism Guidance. (./model/cmdm.py#L158)
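For readers new to guidance-style control, the general idea of nudging a sample toward spatial targets can be sketched as a gradient step on a masked distance between joint positions and the control signal. The code below is a simplified illustration and not the code in gaussian_diffusion.py: it assumes joints are already global xyz positions, uses a squared error, and the names `spatial_guidance_step`, `control_error`, and `step_size` are hypothetical. Read the file referenced above for how the repository actually does it.

```python
import numpy as np


def spatial_guidance_step(joints, hint, mask, step_size=0.25):
    """One gradient step on the masked squared error between joints and hint.

    joints, hint: (n_frames, n_joints, 3) global positions (assumed layout).
    mask:         (n_frames, n_joints) boolean, True where a target is set.
    """
    diff = (joints - hint) * mask[..., None]  # zero outside controlled entries
    grad = 2.0 * diff                         # gradient of sum(diff ** 2) w.r.t. joints
    return joints - step_size * grad


def control_error(joints, hint, mask):
    """Mean Euclidean distance to the targets over controlled entries."""
    dist = np.linalg.norm(joints - hint, axis=-1)
    return float(dist[mask].mean())


# Toy data: 8 frames, 4 joints, joint 0 controlled on the first and last frame.
rng = np.random.default_rng(0)
joints = rng.normal(size=(8, 4, 3))
hint = np.zeros_like(joints)
mask = np.zeros((8, 4), dtype=bool)
mask[[0, 7], 0] = True
hint[mask] = [1.0, 0.0, 0.0]

before = control_error(joints, hint, mask)
for _ in range(10):
    joints = spatial_guidance_step(joints, hint, mask)
after = control_error(joints, hint, mask)
print(before > after)
```

Only the controlled entries move in this sketch; uncontrolled joints and frames are left untouched because the mask zeroes their gradient.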

Acknowledgments

Our code is based on MDM.
The motion visualization is based on MLD and TEMOS. We also thank the following works: guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, MoDi.

License

This code is distributed under the MIT License.
Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, and PyTorch3D, and uses datasets; each of these has its own license, which must also be followed.