WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a> <a href="https://wham.is.tue.mpg.de/"><img alt="Project" src="https://img.shields.io/badge/-Project%20Page-lightgrey?logo=Google%20Chrome&color=informational&logoColor=white"></a>
https://github.com/yohanshin/WHAM/assets/46889727/da4602b4-0597-4e64-8da4-ab06931b23ee
Introduction
This repository is the official PyTorch implementation of WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion. For more information, please visit our project page.
Installation
Please see Installation for details.
Quick Demo
<img src="https://i.imgur.com/QCojoJk.png" width="30"> A Google Colab notebook for the WHAM demo is now available.
Registration
To download SMPL body models (Neutral, Female, and Male), you need to register for SMPL and SMPLify. The username and password for both homepages will be used while fetching the demo data.
Next, run the following script to fetch demo data. This script will download all the required dependencies including trained models and demo videos.
bash fetch_demo_data.sh
You can try it on one example video:
python demo.py --video examples/IMG_9732.mov --visualize
By default, we assume the camera focal length following CLIFF. If the camera intrinsics [fx fy cx cy] are known, you can pass them to SLAM as in the example below:
python demo.py --video examples/drone_video.mp4 --calib examples/drone_calib.txt --visualize
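As a rough illustration of the four intrinsic values, the sketch below builds a default [fx fy cx cy] for a given image size. It assumes the CLIFF heuristic sets the focal length to the image diagonal in pixels and places the principal point at the image center. The helper names `default_intrinsics` and `format_calib` are hypothetical, and the one-line, space-separated output is an assumption about the calibration file layout, so check it against examples/drone_calib.txt before relying on it.

```python
import math


def default_intrinsics(width, height):
    """Return [fx, fy, cx, cy] in pixels for an image of the given size.

    Assumption: focal length equals the image diagonal (CLIFF-style
    heuristic) and the principal point lies at the image center.
    """
    focal = math.hypot(width, height)  # sqrt(width^2 + height^2)
    return [focal, focal, width / 2.0, height / 2.0]


def format_calib(intrinsics):
    """Format intrinsics as one space-separated line.

    Assumption: this layout is illustrative only; the layout expected by
    the demo script is not documented in this section.
    """
    return " ".join(f"{value:.2f}" for value in intrinsics)


if __name__ == "__main__":
    print(format_calib(default_intrinsics(1920, 1080)))
```

If the true intrinsics are available from a calibration procedure, they should be preferred over this heuristic.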
If you only need camera-coordinate motion, you can skip SLAM:
python demo.py --video examples/IMG_9732.mov --visualize --estimate_local_only
You can further refine the WHAM results with Temporal SMPLify as a post-processing step. This improves both 2D alignment and 3D accuracy. Simply add the --run_smplify flag when running the demo.
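The demo options above can be combined from a script. The following is a minimal sketch that assembles the command line from the flags mentioned in this section (`--video`, `--calib`, `--visualize`, `--estimate_local_only`, `--run_smplify`). The function `build_demo_command` is a hypothetical helper, not part of the repository, and it only builds the argument list without running anything.

```python
import shlex


def build_demo_command(video, calib=None, local_only=False,
                       run_smplify=False, visualize=True):
    """Build the argument list for demo.py from the documented flags."""
    cmd = ["python", "demo.py", "--video", video]
    if calib is not None:
        cmd += ["--calib", calib]  # known intrinsics file for SLAM
    if visualize:
        cmd.append("--visualize")
    if local_only:
        cmd.append("--estimate_local_only")  # skip SLAM
    if run_smplify:
        cmd.append("--run_smplify")  # Temporal SMPLify refinement
    return cmd


if __name__ == "__main__":
    cmd = build_demo_command("examples/IMG_9732.mov", run_smplify=True)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the demo.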
Docker
Please refer to Docker for details.
Python API
Please refer to API for details.
Dataset
Please see Dataset for details.
Evaluation
# Evaluate on 3DPW dataset
python -m lib.eval.evaluate_3dpw --cfg configs/yamls/demo.yaml TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar
# Evaluate on RICH dataset
python -m lib.eval.evaluate_rich --cfg configs/yamls/demo.yaml TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar
# Evaluate on EMDB dataset (also computes W-MPJPE and WA-MPJPE)
python -m lib.eval.evaluate_emdb --cfg configs/yamls/demo.yaml --eval-split 1 TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar # EMDB 1
python -m lib.eval.evaluate_emdb --cfg configs/yamls/demo.yaml --eval-split 2 TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar # EMDB 2
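For readers unfamiliar with the metric family named above, here is a minimal pure-Python sketch of plain MPJPE (mean per-joint position error): the average Euclidean distance between predicted and ground-truth 3D joints. It is not the repository's evaluation code. The W- and WA- variants additionally align the predicted and ground-truth trajectories in world coordinates before measuring the error, and that alignment step is not shown here. The function name `mpjpe` and the nested-list input layout are assumptions made for illustration.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: sequences of frames, each frame a sequence of (x, y, z)
    joint positions in the same units and the same coordinate frame.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of frames")
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(pred, gt):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("frames must have the same number of joints")
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # Euclidean distance for one joint
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count
```

The result is in whatever unit the inputs use, so joints given in meters yield an error in meters.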
Training
WHAM training consists of two stages: (1) 2D-to-SMPL lifting using the AMASS dataset and (2) fine-tuning with feature integration using the video datasets. Please see Dataset for preprocessing the training datasets.
Stage 1.
python train.py --cfg configs/yamls/stage1.yaml
Stage 2.
Stage 2 training requires the pretrained results from stage 1. You can use your own pretrained results, or download the weights from Google Drive and save them as checkpoints/wham_stage1.tar.pth.
python train.py --cfg configs/yamls/stage2.yaml TRAIN.CHECKPOINT <PATH-TO-STAGE1-RESULTS>
Train with BEDLAM
TBD
Acknowledgement
We sincerely thank Hongwei Yi and Silvia Zuffi for the discussion and proofreading. Part of this work was done when Soyong Shin was an intern at the Max Planck Institute for Intelligent Systems.
The base implementation is largely borrowed from VIBE and TCMR. We use ViTPose for 2D keypoint detection and DPVO and DROID-SLAM for extracting camera motion. Please visit their official websites for more details.
TODO
- Data preprocessing
- Training implementation
- Colab demo release
- Demo for custom videos
Citation
@InProceedings{shin2023wham,
title={WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion},
author={Shin, Soyong and Kim, Juyong and Halilaj, Eni and Black, Michael J.},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
License
Please see License for details.
Contact
Please contact soyongs@andrew.cmu.edu for any questions related to this work.