Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey

Authors: Yang Liu, Changzhen Qiu, Zhiyong Zhang*

School of Electronics and Communication Engineering, Sun Yat-sen University, Shenzhen, Guangdong, China

Overview

This is the regularly updated project page for *Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey*, a review focused on deep learning approaches to 3D human pose estimation and human mesh recovery. The survey comprehensively covers recent state-of-the-art publications (2019-present) from mainstream computer vision conferences and journals.

Please open an issue if you have any suggestions!

Citation

Please cite our paper if this work is useful for your research.

@article{liu2024deep,
      title={Deep learning for 3D human pose estimation and mesh recovery: A survey},
      author={Liu, Yang and Qiu, Changzhen and Zhang, Zhiyong},
      journal={Neurocomputing},
      pages={128049},
      year={2024},
      issn={0925-2312},
      doi={10.1016/j.neucom.2024.128049},
      publisher={Elsevier}
}

3D Human Pose Estimation

Human Mesh Recovery

An overview of the mainstream datasets.

| Dataset | Type | Data | Total frames | Feature | Download link |
| --- | --- | --- | --- | --- | --- |
| Human3.6M | 3D/Mesh | Video | 3.6M | multi-view | Website |
| 3DPW | 3D/Mesh | Video | 51K | multi-person | Website |
| MPI-INF-3DHP | 2D/3D | Video | 2K | in-the-wild | Website |
| HumanEva | 3D | Video | 40K | multi-view | Website |
| CMU-Panoptic | 3D | Video | 1.5M | multi-view/multi-person | Website |
| MuCo-3DHP | 3D | Image | 8K | multi-person/occluded scenes | Website |
| SURREAL | 2D/3D/Mesh | Video | 6.0M | synthetic models | Website |
| 3DOH50K | 2D/3D/Mesh | Image | 51K | object-occluded | Website |
| 3DCP | Mesh | Mesh | 190 | contact | Website |
| AMASS | Mesh | Motion | 11K | soft-tissue dynamics | Website |
| DensePose | Mesh | Image | 50K | multi-person | Website |
| UP-3D | 3D/Mesh | Image | 8K | sport scenes | Website |
| THuman2.0 | Mesh | Image | 7K | textured surfaces | Website |

Comparisons of 3D pose estimation methods on Human3.6M (MPJPE and PA-MPJPE in mm).

| Method | Year | Publication | Highlight | MPJPE↓ | PA-MPJPE↓ | Code |
| --- | --- | --- | --- | --- | --- | --- |
| Graformer | 2022 | CVPR'22 | graph-based transformer | 35.2 | - | Code |
| GLA-GCN | 2023 | ICCV'23 | adaptive GCN | 34.4 | 37.8 | Code |
| PoseDA | 2023 | arXiv'23 | domain adaptation | 49.4 | 34.2 | Code |
| GFPose | 2023 | CVPR'23 | gradient fields | 35.6 | 30.5 | Code |
| TP-LSTMs | 2022 | TPAMI'22 | pose similarity metric | 40.5 | 31.8 | - |
| FTCM | 2023 | TCSVT'23 | frequency-temporal collaboration | 28.1 | - | Code |
| VideoPose3D | 2019 | CVPR'19 | semi-supervised | 46.8 | 36.5 | Code |
| PoseFormer | 2021 | ICCV'21 | spatio-temporal transformer | 44.3 | 34.6 | Code |
| STCFormer | 2023 | CVPR'23 | spatio-temporal transformer | 40.5 | 31.8 | Code |
| 3Dpose_ssl | 2020 | TPAMI'20 | self-supervised | 63.6 | 63.7 | Code |
| MTF-Transformer | 2022 | TPAMI'22 | multi-view temporal fusion | 26.2 | - | Code |
| AdaptPose | 2022 | CVPR'22 | cross-dataset adaptation | 42.5 | 34.0 | Code |
| 3D-HPE-PAA | 2022 | TIP'22 | part-aware attention | 43.1 | 33.7 | Code |
| DeciWatch | 2022 | ECCV'22 | efficient framework | 52.8 | - | Code |
| Diffpose | 2023 | CVPR'23 | pose refinement | 36.9 | 28.7 | Code |
| Elepose | 2022 | CVPR'22 | unsupervised | - | 36.7 | Code |
| Uplift and Upsample | 2023 | CVPR'23 | efficient transformers | 48.1 | 37.6 | Code |
| RS-Net | 2023 | TIP'23 | regular splitting graph network | 48.6 | 38.9 | Code |
| HSTFormer | 2023 | arXiv'23 | spatial-temporal transformers | 42.7 | 33.7 | Code |
| PoseFormerV2 | 2023 | CVPR'23 | frequency domain | 45.2 | 35.6 | Code |
| DiffPose | 2023 | ICCV'23 | diffusion models | 42.9 | 30.8 | Code |
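The two error metrics in this table can be made concrete with a short sketch. MPJPE is the mean Euclidean distance between predicted and ground-truth joints (reported in mm on Human3.6M); PA-MPJPE first aligns the prediction to the ground truth with a similarity Procrustes fit (rotation, scale, and translation) before measuring the error. The NumPy sketch below is illustrative only and is not the evaluation code of any particular paper.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance
    between predicted and ground-truth joints, e.g. in mm."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-Aligned MPJPE: rigidly align the prediction to the
    ground truth (rotation, scale, translation) before measuring."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    p, g = pred - mu_p, gt - mu_g          # center both point sets
    # Optimal rotation from the SVD of the cross-covariance matrix
    U, s, Vt = np.linalg.svd(p.T @ g)
    if np.linalg.det(Vt.T @ U.T) < 0:      # avoid reflections
        Vt[-1] *= -1
        s[-1] *= -1
    R = Vt.T @ U.T
    scale = s.sum() / (p ** 2).sum()       # optimal isotropic scale
    aligned = scale * p @ R.T + mu_g       # map prediction onto gt frame
    return mpjpe(aligned, gt)
```

A prediction that differs from the ground truth only by a rigid transform and a scale has a large MPJPE but a PA-MPJPE of (numerically) zero, which is why the PA-MPJPE column is consistently lower.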

Comparisons of 3D pose estimation methods on MPI-INF-3DHP.

| Method | Year | Publication | Highlight | MPJPE↓ | PCK↑ | AUC↑ | Code |
| --- | --- | --- | --- | --- | --- | --- | --- |
| HSTFormer | 2023 | arXiv'23 | spatial-temporal transformers | 28.3 | 98.0 | 78.6 | Code |
| PoseFormerV2 | 2023 | CVPR'23 | frequency domain | 27.8 | 97.9 | 78.8 | Code |
| Uplift and Upsample | 2023 | CVPR'23 | efficient transformers | 46.9 | 95.4 | 67.6 | Code |
| RS-Net | 2023 | TIP'23 | regular splitting graph network | - | 85.6 | 53.2 | Code |
| Diffpose | 2023 | CVPR'23 | pose refinement | 29.1 | 98.0 | 75.9 | Code |
| FTCM | 2023 | TCSVT'23 | frequency-temporal collaboration | 31.2 | 97.9 | 79.8 | Code |
| STCFormer | 2023 | CVPR'23 | spatio-temporal transformer | 23.1 | 98.7 | 83.9 | Code |
| PoseDA | 2023 | arXiv'23 | domain adaptation | 61.3 | 92.0 | 62.5 | Code |
| TP-LSTMs | 2022 | TPAMI'22 | pose similarity metric | 48.8 | 82.6 | 81.3 | - |
| AdaptPose | 2022 | CVPR'22 | cross-dataset adaptation | 77.2 | 88.4 | 54.2 | Code |
| 3D-HPE-PAA | 2022 | TIP'22 | part-aware attention | 69.4 | 90.3 | 57.8 | Code |
| Elepose | 2022 | CVPR'22 | unsupervised | 54.0 | 86.0 | 50.1 | Code |
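On MPI-INF-3DHP, PCK and AUC complement MPJPE: PCK is the percentage of joints whose error falls below a distance threshold (150 mm by convention on this benchmark), and AUC averages PCK over a sweep of thresholds. A minimal NumPy sketch of both, assuming joint coordinates in millimetres; the threshold range is the conventional 0-150 mm sweep, not taken from any specific codebase.

```python
import numpy as np

def pck(pred, gt, threshold=150.0):
    """Percentage of Correct Keypoints: share of joints whose error
    is below `threshold` (150 mm is standard on MPI-INF-3DHP)."""
    errors = np.linalg.norm(pred - gt, axis=-1)
    return 100.0 * (errors < threshold).mean()

def auc(pred, gt, thresholds=np.arange(0.0, 151.0, 5.0)):
    """Area under the PCK curve: mean PCK over a threshold sweep."""
    return float(np.mean([pck(pred, gt, t) for t in thresholds]))
```

Because PCK saturates once most joints are under 150 mm, AUC is the more discriminative of the two at the top of the table.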

Comparisons of human mesh recovery methods on Human3.6M and 3DPW.

| Method | Publication | Highlight | Human3.6M MPJPE↓ | Human3.6M PA-MPJPE↓ | 3DPW MPJPE↓ | 3DPW PA-MPJPE↓ | 3DPW PVE↓ | Code |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VirtualMarker | CVPR'23 | novel intermediate representation | 47.3 | 32.0 | 67.5 | 41.3 | 77.9 | Code |
| NIKI | CVPR'23 | inverse kinematics | - | - | 71.3 | 40.6 | 86.6 | Code |
| TORE | ICCV'23 | efficient transformer | 59.6 | 36.4 | 72.3 | 44.4 | 88.2 | Code |
| JOTR | ICCV'23 | contrastive learning | - | - | 76.4 | 48.7 | 92.6 | Code |
| HMDiff | ICCV'23 | reverse diffusion process | 49.3 | 32.4 | 72.7 | 44.5 | 82.4 | Code |
| ReFit | ICCV'23 | recurrent fitting network | 48.4 | 32.2 | 65.8 | 41.0 | - | Code |
| PyMAF-X | TPAMI'23 | regression-based one-stage whole body | - | - | 74.2 | 45.3 | 87.0 | Code |
| PointHMR | CVPR'23 | vertex-relevant feature extraction | 48.3 | 32.9 | 73.9 | 44.9 | 85.5 | - |
| PLIKS | CVPR'23 | inverse kinematics | 47.0 | 34.5 | 60.5 | 38.5 | 73.3 | Code |
| ProPose | CVPR'23 | analytical posterior probability | 45.7 | 29.1 | 68.3 | 40.6 | 79.4 | Code |
| POTTER | CVPR'23 | pooling attention transformer | 56.5 | 35.1 | 75.0 | 44.8 | 87.4 | Code |
| PoseExaminer | ICCV'23 | automated out-of-distribution testing | - | - | 74.5 | 46.5 | 88.6 | Code |
| MotionBERT | ICCV'23 | pretrained human representations | 43.1 | 27.8 | 68.8 | 40.6 | 79.4 | Code |
| 3DNBF | ICCV'23 | analysis-by-synthesis approach | - | - | 88.8 | 53.3 | - | Code |
| FastMETRO | ECCV'22 | efficient architecture | 52.2 | 33.7 | 73.5 | 44.6 | 84.1 | Code |
| CLIFF | ECCV'22 | multi-modality inputs | 47.1 | 32.7 | 69.0 | 43.0 | 81.2 | Code |
| PARE | ICCV'21 | part-driven attention | - | - | 74.5 | 46.5 | 88.6 | Code |
| Graphormer | ICCV'21 | GCNN-reinforced transformer | 51.2 | 34.5 | 74.7 | 45.6 | 87.7 | Code |
| PSVT | CVPR'23 | spatio-temporal encoder | - | - | 73.1 | 43.5 | 84.0 | - |
| GLoT | CVPR'23 | short- and long-term temporal correlations | 67.0 | 46.3 | 80.7 | 50.6 | 96.3 | Code |
| MPS-Net | CVPR'23 | temporally adjacent representations | 69.4 | 47.4 | 91.6 | 54.0 | 109.6 | Code |
| MAED | ICCV'21 | multi-level attention | 56.4 | 38.7 | 79.1 | 45.7 | 92.6 | Code |
| Lee et al. | ICCV'21 | uncertainty-aware | 58.4 | 38.4 | 92.8 | 52.2 | 106.1 | - |
| TCMR | CVPR'21 | temporal consistency | 62.3 | 41.1 | 95.0 | 55.8 | 111.3 | - |
| VIBE | CVPR'20 | self-attention temporal network | 65.6 | 41.4 | 82.9 | 51.9 | 99.1 | Code |
| ImpHMR | CVPR'23 | implicitly imagines the person in 3D space | - | - | 74.3 | 45.4 | 87.1 | - |
| SGRE | ICCV'23 | sequential global rotation estimation | - | - | 78.4 | 49.6 | 93.3 | Code |
| PMCE | ICCV'23 | pose and mesh co-evolution network | 53.5 | 37.7 | 69.5 | 46.7 | 84.8 | Code |
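The PVE column extends the joint-level error to the full mesh surface: it averages the per-vertex Euclidean distance between the predicted and ground-truth meshes (6890 vertices for an SMPL body). A minimal sketch, where the vertex count in the comment is an assumption about the SMPL topology these methods use:

```python
import numpy as np

def pve(pred_vertices, gt_vertices):
    """Per-Vertex Error: mean Euclidean distance between predicted
    and ground-truth mesh vertices, reported in mm."""
    assert pred_vertices.shape == gt_vertices.shape  # e.g. (6890, 3) for SMPL
    return np.linalg.norm(pred_vertices - gt_vertices, axis=-1).mean()
```

Because it is computed over every surface vertex rather than a handful of skeleton joints, PVE also penalizes shape errors that leave the joints in place, which is why it is consistently larger than MPJPE for the same method.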