Home

Awesome

Fast Human Pose Estimation CVPR2019

Introduction

This is an official pytorch implementation of Fast Human Pose Estimation.

In this work, we focus on the two problems

  1. How to reduce the model size and computation using a model-agnostic method.
  2. How to improve the performance of the reduced model.

In our paper

  1. We reduce the model size and computation through reducing the width and depth of a network.
  2. Propose the fast pose distillation (FPD) to improve the performance of the reduced model.

The results on the MPII dataset demonstrate the effectiveness of our approach. We re-implemented the FPD using the HRNet codebase and provided extra evaluation on the COCO dataset. Our method (FPD) can work without ground-truth labels, it can utilize unlabeled images. Illustrating the architecture of the proposed HRNet

For the MPII dataset

  1. We first trained a teacher model (hourglass model, stacks=8, num_features=256, 90.520@MPII PCKh@0.5) and a student model (hourglass model, stacks=4, num_features=128, 89.040@MPII PCKh@0.5).
  2. We then used the teacher model's prediction and the ground-truth label to co-supervisie the student model (hourglass model, stacks=4, num_features=128, 87.934@MPII PCKh@0.5).
  3. Our experiment shows 1.106% gain from FPD.

For the COCO dataset

  1. We first trained a teacher model (HRNet-W48, input size=256x192, 75.0@COCO-Valid-Set AP) and a student model (HRNet-W32, input size=256x192, 74.4@COCO-Valid-Set AP).
  2. We then used the teacher model's prediction and the ground-truth label to co-supervisie the student model (HRNet-W32, input size=256x192, 75.1@COCO-Valid-Set AP).
  3. Our experiment shows 0.7% gain from FPD.

If you want to further improve the performance of the student model.You can remove the supervision of ground-truth label in the FPD when there are unlabeled images.

Main Results

Results on MPII val

ArchHeadShoulderElbowWristHipKneeAnkleMeanMean@0.1
hourglass_teacher97.16996.38290.83086.46690.01286.80282.66490.52038.275
hourglass_student96.82895.19487.72882.91987.90082.55178.27087.93434.634
hourglass_student_FPD*96.38594.90587.84781.87587.22581.90678.95587.59834.359
hourglass_student_FPD96.93095.55089.04084.44488.93984.02180.70389.04036.144

Note:

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

ArchInput size#ParamsGFLOPsAPAp .5AP .75AP (M)AP (L)ARAR .5AR .75AR (M)AR (L)
pose_hrnet_w48_teacher256x19263.6M14.60.7500.9060.8240.7130.8190.8030.9410.8670.7600.866
pose_hrnet_w32_student256x19228.5M7.10.7440.9050.8190.7080.8100.7980.9420.8650.7570.858
pose_hrnet_w32_student_FPD256x19228.5M7.10.7510.9060.8230.7140.8200.8040.9430.8690.7620.865

Note:

Development environment

The code is developed using python 3.5 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 TITAN XP GPU cards. Other platforms or GPU cards are not fully tested.

Quick start

1. Preparation

1.1 Prepare the dataset

For the MPII dataset, the original annotation files are in matlab format. We have converted them into json format, you also need to download them from OneDrive or GoogleDrive. Extract them under {POSE_ROOT}/data, your directory tree should look like this:

${POSE_ROOT}/data/mpii
├── images
└── mpii_human_pose_v1_u12_1.mat
|—— annot
|   |—— gt_valid.mat
└── |—— test.json
    |   |—— train.json
    |   |—— trainval.json
    |   |—— valid.json
    └── images
        |—— 000001163.jpg
        |—— 000003072.jpg

For the COCO dataset, your directory tree should look like this:

${POSE_ROOT}/data/coco
├── annotations
├── images
│   ├── test2017
│   ├── train2017
│   └── val2017
└── person_detection_results

1.2 Prepare the pretrained models

Your directory tree should look like this:

$HOME/models
├── pytorch
│   ├── imagenet
│   │   ├── hrnet_w32-36af842e.pth
│   │   ├── hrnet_w48-8ef0771d.pth
│   │   └── resnet50-19c8e357.pth
│   ├── pose_coco
│   │   ├── pose_hrnet_w32_256x192.pth
│   │   └── pose_hrnet_w48_256x192.pth
│   └── pose_mpii
│       ├── bs4_hourglass_128_4_1_16_0.00025_0_140_87.934_model_best.pth
│       ├── bs4_hourglass_256_8_1_16_0.00025_0_140_90.520_model_best.pth
│       ├── pose_hrnet_w32_256x256.pth
│       └── pose_hrnet_w48_256x256.pth
└── student_FPD
    ├── hourglass_student_FPD*.pth
    ├── hourglass_student_FPD.pth
    └── pose_hrnet_w32_student_FPD.pth

1.3 Prepare the environment

Setting the parameters in the file prepare_env.sh as follows:

# DATASET_ROOT=$HOME/datasets
# COCO_ROOT=${DATASET_ROOT}/MSCOCO
# MPII_ROOT=${DATASET_ROOT}/MPII
# MODELS_ROOT=${DATASET_ROOT}/models

Then execute:

bash prepare_env.sh

If you like, you can prepare the environment step by step

2. How to train the model

2.1 Download the pretrained models and place them like the section 1.2

For MPII dataset: [GoogleDrive] [BaiduDrive]

  1. hourglass student model

  2. hourglass teacher model

For COCO dataset: [GoogleDrive] [BaiduDrive]

  1. HRNet-W32 student model

  2. HRNet-W48 teacher model

2.2 Start training

# COCO dataset training
cd scripts/fpd_coco
bash run_train_hrnet.sh

# MPII dataset training
cd scripts/fpd_mpii
bash run_train_hrnet.sh # using hrnet model
bash run_train_hg.sh # using hourglass model

# General training methods, we also provide script shell
cd scripts/mpii
bash run_train_hrnet.sh # using hrnet model
bash run_train_hg.sh # using hourglass model
bash run_train_resnet.sh # using resnet model
cd scripts/coco
bash run_train_hrnet.sh # using hrnet model
bash run_train_hg.sh # using hourglass model
bash run_train_resnet.sh # using resnet model

3. How to test the model

3.1 Download the trained student models and place them like section 1.2

[GoogleDrive] [BaiduDrive]

For MPII dataset:

hourglass student FPD model

For COCO dataset:

HRNet-W32 student FPD model

3.2 FPD training results and logs

[GoogleDrive] [BaiduDrive]

Note:

Citation

If you use our code or models in your research, please cite with:

@InProceedings{Zhang_2019_CVPR,
author = {Zhang, Feng and Zhu, Xiatian and Ye, Mao},
title = {Fast Human Pose Estimation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}

Discussion forum

ILovePose

Unoffical implementations

Fast_Human_Pose_Estimation_Pytorch

Acknowledgement

Thanks for the open-source HRNet