
Pose2Img

Upper-body image synthesis from skeletons (keypoints). This is the Pose2Img module of the ICCV 2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates". [arxiv / github]

This is a modified implementation of "Synthesizing Images of Humans in Unseen Poses".

Setup

To install dependencies, run

pip install -r requirements.txt

To run this module, you need two NVIDIA GPUs with at least 11 GB of memory each. The code has been tested on Ubuntu 18.04 LTS with Python 3.6.
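A quick pre-flight check can confirm the GPU requirement before you start a long run. The sketch below is hypothetical and is not part of the repository. It assumes PyTorch is installed, which this README does not state. It counts devices whose total memory is at least 11 GB, where 1 GB means 10^9 bytes. That unit is a deliberate choice: consumer "11 GB" cards can report slightly less than 11 GiB.

```python
# Hypothetical pre-flight check, not part of the Pose2Img repository.
# Thresholds mirror the README: two GPUs with at least 11 GB each.
MIN_GPUS = 2
MIN_MEM_GB = 11


def count_eligible(mem_bytes_per_gpu, min_mem_gb=MIN_MEM_GB):
    """Count devices with at least `min_mem_gb` decimal gigabytes (10**9 bytes)."""
    threshold = min_mem_gb * 10**9
    return sum(1 for mem in mem_bytes_per_gpu if mem >= threshold)


def meets_gpu_requirements(mem_bytes_per_gpu, min_gpus=MIN_GPUS, min_mem_gb=MIN_MEM_GB):
    """Return True if enough devices satisfy the per-GPU memory threshold."""
    return count_eligible(mem_bytes_per_gpu, min_mem_gb) >= min_gpus


def query_gpu_memory():
    """Return total memory (bytes) of each visible CUDA device.

    Assumes PyTorch; returns an empty list if it is missing or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_properties(i).total_memory
            for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    mems = query_gpu_memory()
    print(f"Found {len(mems)} GPU(s): {[round(m / 1e9, 1) for m in mems]} GB")
    print("GPU requirement met" if meets_gpu_requirements(mems) else "GPU requirement NOT met")
```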

Demo Dataset and Checkpoint

  1. Unzip the demo data and place it in $ROOT/data/Oliver
  2. Place the pretrained checkpoint at $ROOT/ckpt/Oliver/ckpt_final.pth
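Before training or running the demo, it can help to confirm that the data and checkpoint sit where the steps above expect them. This is a minimal, hypothetical helper (the function name is illustrative, not from the repository). It checks only the two paths listed in those steps, relative to a `root` that stands in for `$ROOT`.

```python
from pathlib import Path


def missing_demo_files(root):
    """Return the expected demo paths under `root` that do not exist yet.

    Hypothetical helper: checks only the two locations named in this README.
    """
    root = Path(root)
    missing = []
    data_dir = root / "data" / "Oliver"                    # unzipped demo data (directory)
    ckpt = root / "ckpt" / "Oliver" / "ckpt_final.pth"     # pretrained checkpoint (file)
    if not data_dir.is_dir():
        missing.append(str(data_dir))
    if not ckpt.is_file():
        missing.append(str(ckpt))
    return missing


if __name__ == "__main__":
    # Run from $ROOT; prints any missing paths without exiting early.
    for path in missing_demo_files("."):
        print("missing:", path)
```

It only reports problems and never creates directories, so it is safe to run repeatedly.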

Train on the Demo dataset

  1. Training script:
python main.py \
    --name Oliver \
    --config_path configs/yaml/Oliver.yaml \
    --batch_size 1
  2. Run TensorBoard for training visualization:
tensorboard --logdir ./log --port=${Port} --bind_all
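If you launch runs from Python, for example from a sweep script, building the argument lists explicitly avoids shell pitfalls such as a stray trailing backslash. This is a sketch under assumptions. It reuses only the flags from the training and TensorBoard commands in this section. The helper names are hypothetical, and port 6006 is simply TensorBoard's usual default, not a value from this repository.

```python
import shlex
import subprocess
import sys


def train_command(name="Oliver", config_path="configs/yaml/Oliver.yaml", batch_size=1):
    """Build the argv for main.py using only the flags shown in this README."""
    return [sys.executable, "main.py",
            "--name", name,
            "--config_path", config_path,
            "--batch_size", str(batch_size)]


def tensorboard_command(logdir="./log", port=6006):
    """Build the argv for TensorBoard (6006 is TensorBoard's usual default port)."""
    return ["tensorboard", "--logdir", logdir, f"--port={port}", "--bind_all"]


def launch(cmd, dry_run=True):
    """Print the shell-quoted command; start it only when dry_run is False."""
    print(" ".join(shlex.quote(part) for part in cmd))
    if dry_run:
        return None
    return subprocess.Popen(cmd)
```

Calling `launch(train_command())` only prints the command. Pass `dry_run=False` to actually start the process.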

Demo

Generate a realistic video of Oliver from a keypoint file, {keypoints}.npz:

python inference.py \
    --cfg_path cfg/yaml/Oliver.yaml \
    --name demo \
    --npz_path target_pose/Oliver/varying_tmplt.npz \
    --wav_path target_pose/Oliver/varying_tmplt.mp4
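The README does not document which arrays a {keypoints}.npz file holds. Listing the array names and shapes of an existing file is therefore a safe way to learn the expected format before preparing your own. The helper below is hypothetical and uses only standard NumPy calls. The array name `pose` in the usage test is purely illustrative.

```python
import numpy as np


def describe_npz(path):
    """Map each array name in an .npz archive to its (shape, dtype-name) pair."""
    with np.load(path) as archive:  # NpzFile supports the context-manager protocol
        return {key: (archive[key].shape, str(archive[key].dtype))
                for key in archive.files}
```

For example, `describe_npz("target_pose/Oliver/varying_tmplt.npz")` reports what the demo input contains without assuming any particular key names.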

Train on a custom dataset

Citation

If you find this code useful for your research, please cite it using the following BibTeX entry.

@inproceedings{qian2021speech,
  title={Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates},
  author={Qian, Shenhan and Tu, Zhi and Zhi, YiHao and Liu, Wen and Gao, Shenghua},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2021}
}