<!-- * @Date: 2021-01-13 20:32:12 * @Author: Qing Shuai * @LastEditors: Qing Shuai * @LastEditTime: 2022-11-03 13:09:58 * @FilePath: /EasyMocapRelease/Readme.md --> <div align="center"> <img src="logo.png" width="40%"> </div>

EasyMocap is an open-source toolbox for markerless human motion capture and novel view synthesis from RGB videos. In this project, we provide a lot of motion capture demos in different settings.

News


Core features

Multiple views of a single person

This is the basic code for fitting the SMPL [1], SMPL+H [2], SMPL-X [3], or MANO [2] model to capture body, hand, and face poses from multiple views.

<div align="center"> <img src="doc/feng/mv1pmf-smplx.gif" width="80%"> <br> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/EasyMocap/mv1p-dance-smpl.gif" width="80%"> <br> <sup>Videos are from ZJU-MoCap, with 23 calibrated and synchronized cameras.</sup> </div> <div align="center"> <img src="doc/feng/mano.gif" width="80%"> <br> <sup>Captured with 8 cameras.</sup> </div>
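Multi-view capture relies on the fact that a keypoint seen from several calibrated cameras pins down a single 3D position. The following is a minimal sketch of that idea, not EasyMocap's own implementation: it assumes each observation has already been converted to a ray (camera center plus viewing direction) in world coordinates, and the names `solve3` and `triangulate` are hypothetical helpers introduced for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; the point is not determined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def triangulate(centers: Sequence[Vec3], directions: Sequence[Vec3]) -> List[float]:
    """Return the point minimizing the summed squared distance to all rays.

    Each ray is center + t * direction (assumed to be in world coordinates).
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        norm = sum(x * x for x in d) ** 0.5
        u = [x / norm for x in d]
        for i in range(3):
            for j in range(3):
                # Projector onto the plane orthogonal to the ray: I - u u^T
                p = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += p
                b[i] += p * c[j]
    return solve3(a, b)


if __name__ == "__main__":
    # Three hypothetical cameras all looking at the same joint.
    joint = (1.0, 2.0, 3.0)
    cams = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
    rays = [tuple(j - c for j, c in zip(joint, cam)) for cam in cams]
    print(triangulate(cams, rays))
```

With noisy 2D detections the rays no longer intersect exactly, and the least-squares point serves as a reasonable initialization before a body model is fitted to all views.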

Internet video

This part is the basic code for fitting SMPL [1] with 2D keypoint estimation [4][5] and CNN initialization [6].

<div align="center"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/EasyMocap/23EfsN7vEOA%2B003170%2B003670.gif" width="80%"> <br> <sup>The raw video is from <a href="https://www.youtube.com/watch?v=23EfsN7vEOA">Youtube</a>.</sup> </div>

Internet video with a mirror

<div align="center"> <img src="https://raw.githubusercontent.com/zju3dv/Mirrored-Human/main/doc/assets/smpl-avatar.gif" width="80%"> <br> <sup>The raw video is from <a href="https://www.youtube.com/watch?v=KOCJJ27hhIE">Youtube</a>.</sup> </div>

Multiple Internet videos with a specific action (Coming soon)

<div align="center"> <img src="doc/imocap/imocap.gif" width="80%"><br/> <sup>Internet videos of Roger Federer's serve</sup> </div>

Multiple views of multiple people

<div align="center"> <img src="doc/assets/mvmp1f.gif" width="80%"><br/> <sup>Captured with 8 consumer cameras</sup> </div>

Novel view synthesis from sparse views

<div align="center"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/EasyMocap/female-ballet.gif" width="80%"><br/> <sup>Novel view synthesis for challenging motion (coming soon)</sup> </div> <div align="center"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/EasyMocap/nvs_mp_soccer1_6_rgb.gif" width="80%"><br/> <sup>Novel view synthesis for human interaction</sup> </div>

ZJU-MoCap

With our proposed method, we release two large datasets of human motion: LightStage and Mirrored-Human. See the website for more details.

If you would like to download the ZJU-MoCap dataset, please sign the agreement and email it to Qing Shuai (s_q@zju.edu.cn), with a cc to Xiaowei Zhou (xwzhou@zju.edu.cn), to request the download link.

<div align="center"> <div align="center" width="40%"> <img src="doc/assets/ZJU-MoCap-lightstage.jpg" width="40%"><br/> <sup>LightStage: captured with LightStage system</sup> </div> <div align="center" width="40%"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/EasyMocap/mirrored-human.jpg" width="40%"><br/> <sup>Mirrored-Human: collected from the Internet</sup> </div> </div>

Many works have achieved wonderful results based on our dataset:

Other features

3D Realtime visualization

<div align="center"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/assets/vis3d/skel-body25.gif" width="26%"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/assets/vis3d/skel-total.gif" width="26%"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/assets/vis3d/skel-multi.gif" width="26%"> </div> <div align="center"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/assets/vis3d/mesh-smpl.gif" width="26%"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/assets/vis3d/mesh-smplx.gif" width="26%"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/assets/vis3d/mesh-manol.gif" width="26%"> </div>

Camera calibration

<div align="center"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/EasyMocap/calib_intri.jpg" width="40%"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/EasyMocap/calib_extri.jpg" width="40%"> <br> <sup>Calibration for intrinsic and extrinsic parameters</sup> </div>
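To make the roles of the two parameter sets concrete, here is a minimal sketch of a pinhole projection. It is an illustration under stated assumptions rather than the toolbox's calibration code: it assumes the common convention `X_cam = R * X_world + T`, ignores lens distortion, and `project` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def project(
    point: Sequence[float],
    K: Sequence[Sequence[float]],
    R: Sequence[Sequence[float]],
    T: Sequence[float],
) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    K is the 3x3 intrinsic matrix; R (3x3) and T (3,) are the extrinsics.
    Lens distortion is ignored in this sketch.
    """
    # Extrinsics: move the point from world coordinates into the camera frame.
    cam = [sum(R[i][j] * point[j] for j in range(3)) + T[i] for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division onto the normalized image plane.
    x, y = cam[0] / cam[2], cam[1] / cam[2]
    # Intrinsics: focal lengths, skew, and principal point map to pixels.
    u = K[0][0] * x + K[0][1] * y + K[0][2]
    v = K[1][1] * y + K[1][2]
    return u, v


if __name__ == "__main__":
    # Hypothetical camera: 1000 px focal length, principal point at (640, 360),
    # placed so that the world origin lies 2 units in front of it.
    K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    T = [0.0, 0.0, 2.0]
    print(project([0.2, -0.1, 0.0], K, R, T))
```

Intrinsic calibration estimates `K` (and distortion) for each camera on its own, while extrinsic calibration estimates `R` and `T` so that all cameras share one world frame.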

Annotator

<div align="center"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/EasyMocap/annot_keypoints.jpg" width="40%"> <img src="https://raw.githubusercontent.com/chingswy/Dataset-Demo/main/EasyMocap/annot_mask.jpg" width="40%"> <br> <sup>Annotator for bounding box, keypoints and mask</sup> </div>

Updates

Installation

See documentation for more instructions.

Acknowledgements

Here are the great works this project is built upon:

Contact

Please open an issue if you have any questions. We appreciate all contributions to improve our project.

Contributor

EasyMocap is built by researchers from the 3D vision group of Zhejiang University: Qing Shuai, Qi Fang, Junting Dong, Sida Peng, Di Huang, Hujun Bao, and Xiaowei Zhou.

We would like to thank Wenduo Feng, Di Huang, Yuji Chen, Hao Xu, Qing Shuai, Qi Fang, Ting Xie, Junting Dong, Sida Peng, and Xiaopeng Ji, who are the performers in the sample data. We would also like to thank everyone who has helped EasyMocap in any way.

Citation

This project is part of our work on iMocap, Mirrored-Human, mvpose, Neural Body, MultiNeuralBody, and enerf.

Please consider citing these works if you find this repo useful for your projects.

@Misc{easymocap,  
    title = {EasyMoCap - Make human motion capture easier.},
    howpublished = {Github},  
    year = {2021},
    url = {https://github.com/zju3dv/EasyMocap}
}

@inproceedings{shuai2022multinb,
  title={Novel View Synthesis of Human Interactions from Sparse Multi-view Videos},
  author={Shuai, Qing and Geng, Chen and Fang, Qi and Peng, Sida and Shen, Wenhao and Zhou, Xiaowei and Bao, Hujun},
  booktitle={SIGGRAPH Conference Proceedings},
  year={2022}
}

@inproceedings{lin2022efficient,
  title={Efficient Neural Radiance Fields for Interactive Free-viewpoint Video},
  author={Lin, Haotong and Peng, Sida and Xu, Zhen and Yan, Yunzhi and Shuai, Qing and Bao, Hujun and Zhou, Xiaowei},
  booktitle={SIGGRAPH Asia Conference Proceedings},
  year={2022}
}

@inproceedings{dong2021fast,
  title={Fast and Robust Multi-Person 3D Pose Estimation and Tracking from Multiple Views},
  author={Dong, Junting and Fang, Qi and Jiang, Wen and Yang, Yurou and Bao, Hujun and Zhou, Xiaowei},
  booktitle={T-PAMI},
  year={2021}
}
    
@inproceedings{dong2020motion,
  title={Motion capture from internet videos},
  author={Dong, Junting and Shuai, Qing and Zhang, Yuanqing and Liu, Xian and Zhou, Xiaowei and Bao, Hujun},
  booktitle={European Conference on Computer Vision},
  pages={210--227},
  year={2020},
  organization={Springer}
}

@inproceedings{peng2021neural,
  title={Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans},
  author={Peng, Sida and Zhang, Yuanqing and Xu, Yinghao and Wang, Qianqian and Shuai, Qing and Bao, Hujun and Zhou, Xiaowei},
  booktitle={CVPR},
  year={2021}
}

@inproceedings{fang2021mirrored,
  title={Reconstructing 3D Human Pose by Watching Humans in the Mirror},
  author={Fang, Qi and Shuai, Qing and Dong, Junting and Bao, Hujun and Zhou, Xiaowei},
  booktitle={CVPR},
  year={2021}
}

<!-- [4] Bogo, Federica, et al. "Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image." European conference on computer vision. Springer, Cham, 2016. -->

Footnotes

  1. Loper, Matthew, et al. "SMPL: A skinned multi-person linear model." ACM transactions on graphics (TOG) 34.6 (2015): 1-16.

  2. Romero, Javier, Dimitrios Tzionas, and Michael J. Black. "Embodied hands: Modeling and capturing hands and bodies together." ACM Transactions on Graphics (ToG) 36.6 (2017): 1-17.

  3. Pavlakos, Georgios, et al. "Expressive body capture: 3d hands, face, and body from a single image." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

  4. Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: Openpose: real-time multi-person 2d pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008 (2018)

  5. Sun, Ke, et al. "Deep high-resolution representation learning for human pose estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

  6. Kolotouros, Nikos, et al. "Learning to reconstruct 3D human pose and shape via model-fitting in the loop." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

  7. Bochkovskiy, Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao. "Yolov4: Optimal speed and accuracy of object detection." arXiv preprint arXiv:2004.10934 (2020).