SimplePose

Code and pre-trained models for our paper, “Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation”, accepted by AAAI-2020.

This repo also serves as Part B of our paper "Multi-Person Pose Estimation using Body Parts" (under review). Part A is available at this link.

Introduction

A bottom-up approach for the problem of multi-person pose estimation.

[Figures: heatmap, network (two images), optimization, skeleton]

Change log

Evaluation results

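| Changes                 | Input size | C++ | MS | Flip | AP   | AP(M) | AP(L) | AR   | AR(M) | AR(L) | fps     |
|-------------------------|------------|-----|----|------|------|-------|-------|------|-------|-------|---------|
| original                | 512        |     |    | v    | 65.8 | 59.0  | 75.8  | 69.9 | 61.2  | 82.3  | 2.2 fps |
| refactored              | 512        |     |    | v    | 65.8 | 59.0  | 75.9  | 69.9 | 61.2  | 82.5  | 3.3 fps |
| refactored + score calc | 512        |     |    | v    | 66.1 | 59.8  | 76.2  | 69.9 | 61.2  | 82.6  |         |
| refactored + score calc | 512        | v   |    | v    | 65.8 | 59.6  | 75.4  | 69.8 | 61.0  | 82.1  | 7.3 fps |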

Contents

  1. Training
  2. Evaluation
  3. Demo

Project Features

Prepare

  1. Install packages:

    Python 3.6, PyTorch > 1.0, NVIDIA Apex, and any other required packages.

  2. Download the COCO dataset.

  3. Download the pre-trained models. For the default configuration, download the model snapshot from epoch 52, provided below.

    Download Link: BaiduCloud

    Alternatively, download the pre-trained model for the default configuration only, without the optimizer checkpoint: GoogleDrive

  4. Compile the C++ files:

    • cd utils/pafprocess
    • sh make.sh
  5. Change the paths in the code according to your environment.
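The version requirements in step 1 can be checked before going further. The following is a minimal sketch, not part of this repo: the helper names `parse_version` and `check_environment` are hypothetical, and it reads "Pytorch>1.0" literally as "strictly newer than 1.0.0", which is an assumption.

```python
import importlib.util
import re
import sys


def parse_version(text, width=3):
    """Turn a version string such as '1.7.1+cu110' into a tuple like (1, 7, 1)."""
    numbers = []
    # Drop any local build suffix ('+cu110'), then keep the leading digits of each part.
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    numbers = numbers[:width]
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    return tuple(numbers + [0] * (width - len(numbers)))


def check_environment():
    """Return a list of notes; an empty list means every check passed."""
    notes = []
    if sys.version_info[:2] != (3, 6):
        notes.append("README lists Python 3.6, found {}.{}".format(*sys.version_info[:2]))
    if importlib.util.find_spec("torch") is None:
        notes.append("PyTorch is not installed")
    else:
        import torch
        # Assumption: 'Pytorch>1.0' means strictly newer than 1.0.0.
        if parse_version(torch.__version__) <= (1, 0, 0):
            notes.append("PyTorch newer than 1.0 is expected, found " + torch.__version__)
    if importlib.util.find_spec("apex") is None:
        notes.append("NVIDIA Apex is not installed")
    return notes


if __name__ == "__main__":
    for note in check_environment():
        print(note)
```

A newer Python than 3.6 may well work; the sketch only reports the difference from what the README lists.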

Run a Demo

python demo_image.py

examples

Inference Speed

The speed of our system is tested on the MS-COCO test-dev dataset.

Evaluation Steps

The evaluation code is currently pure Python, without multiprocessing.

python evaluate.py

Refactored Python

Results on MSCOCO 2017 minival skeletons with refactored Python (focal L2 loss with gamma=2):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.661
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.859
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.716
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.598
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.762
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.699
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.873
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.742
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.612
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.825

Refactored Python + Cpp

Results on MSCOCO 2017 minival skeletons (focal L2 loss with gamma=2):


 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.658
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.856
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.713
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.596
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.754
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.698
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.872
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.740
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.610
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.824
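To compare two runs such as the pure Python and Python + Cpp results above, the summary lines can be parsed into a dictionary. This is a small sketch rather than part of the evaluation code; the function name `parse_summary` and the key layout are assumptions made for illustration.

```python
import re

# Matches one summary line in the format shown in the result listings above.
SUMMARY_LINE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\) @\[ IoU=([\d.:]+)\s*\| "
    r"area=\s*(\w+) \| maxDets=\s*(\d+) \] = (-?[\d.]+)"
)


def parse_summary(text):
    """Map (metric, iou, area) to the reported score for every summary line in text."""
    scores = {}
    for match in SUMMARY_LINE.finditer(text):
        metric, iou, area, _max_dets, value = match.groups()
        scores[(metric, iou, area)] = float(value)
    return scores


# First summary line of each minival listing above.
python_run = parse_summary(
    " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.661"
)
cpp_run = parse_summary(
    " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.658"
)

key = ("AP", "0.50:0.95", "all")
delta = round(cpp_run[key] - python_run[key], 3)
print("AP change from Python to Python + Cpp:", delta)
```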

Official

Results on MSCOCO 2017 test-dev skeletons (focal L2 loss with gamma=2):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.685
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.867
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.749
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.664
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.719
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.728
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.892
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.782
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.688
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.784

Training Steps

Before training, prepare the training data using "SimplePose/data/coco_masks_hdf5.py".

Multiple GPUs are recommended to speed up training, but other training options are also supported.

python -m torch.distributed.launch --nproc_per_node=4 train_distributed.py
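The launcher starts one process per GPU (four in the command above). Depending on the PyTorch version, it tells each process which GPU to use either through a `--local_rank` (or `--local-rank`) argument or through the `LOCAL_RANK` environment variable, and it sets `WORLD_SIZE` in the environment. The sketch below shows one way a script can read these values; whether train_distributed.py does it exactly this way is an assumption, and `read_launch_context` is a hypothetical helper name.

```python
import argparse
import os


def read_launch_context(argv=None, environ=None):
    """Return (local_rank, world_size) as provided by the distributed launcher."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Accept both spellings; fall back to the LOCAL_RANK environment variable.
    parser.add_argument(
        "--local_rank", "--local-rank",
        type=int,
        default=int(environ.get("LOCAL_RANK", 0)),
    )
    # Ignore any other arguments that the training script itself would define.
    args, _unknown = parser.parse_known_args(argv)
    world_size = int(environ.get("WORLD_SIZE", 1))
    return args.local_rank, world_size


if __name__ == "__main__":
    local_rank, world_size = read_launch_context()
    print("process", local_rank, "of", world_size)
```

When run without the launcher, the sketch falls back to a single process with local rank 0.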

Note: loss_model_parrel.py is used by train.py and train_parallel.py, while loss_model.py is used by train_distributed.py and train_distributed_SWA.py. The two differ in how they divide the batch size; refer to the code for the details of each option.

For distributed training, the real batch size = batch_size_in_config × GPU_Num (strictly, world_size). For the other options, the real batch size = batch_size_in_config. The difference comes from the different mechanisms of data-parallel training and distributed training.
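The batch size rule can be written out as a short calculation. This is an illustrative sketch only: the function name `batch_sizes` is hypothetical, and the per-GPU figure for the non-distributed case assumes the batch divides evenly across the GPUs.

```python
def batch_sizes(batch_size_in_config, num_gpus, distributed):
    """Return (real_batch_size, per_gpu_batch_size) for one optimization step."""
    if distributed:
        # Each process loads its own batch of batch_size_in_config samples.
        return batch_size_in_config * num_gpus, batch_size_in_config
    # Data-parallel training splits a single batch across the GPUs
    # (assumption: the batch divides evenly).
    return batch_size_in_config, batch_size_in_config // num_gpus


if __name__ == "__main__":
    # Example values, not taken from the repo's config.
    print(batch_sizes(8, 4, distributed=True))
    print(batch_sizes(8, 4, distributed=False))
```

With the same config value, distributed training therefore sees a larger real batch, which may matter when choosing the learning rate.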

Main Referenced Repositories

Citation

Please cite this paper in your publications if it helps your research.

@inproceedings{li2019simple,
	title={Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation},
	author={Jia Li and Wen Su and Zengfu Wang},
	booktitle = {arXiv preprint arXiv:1911.10529},
	year={2019}
}