

Simple Baselines for Human Pose Estimation and Tracking

This repo is a TensorFlow implementation of Simple Baselines for Human Pose Estimation and Tracking (ECCV 2018) by MSRA, for 2D multi-person pose estimation from a single RGB image.

What this repo provides:


This code was tested on Ubuntu 16.04 with CUDA 9.0, cuDNN 7.1, and two NVIDIA 1080Ti GPUs.

Python 3.6.5 with Anaconda 3 was used for development.



The ${POSE_ROOT} directory is organized as follows.

|-- data
|-- lib
|-- main
|-- tool
`-- output


The data directory must follow the structure below.

|-- data
|   |-- MPII
|   |   |-- dets
|   |   |   `-- human_detection.json
|   |   |-- annotations
|   |   |   |-- train.json
|   |   |   `-- test.json
|   |   `-- images
|   |       |-- 000001163.jpg
|   |       |-- 000003072.jpg
|   |-- PoseTrack
|   |   |-- dets
|   |   |   `-- human_detection.json
|   |   |-- annotations
|   |   |   |-- train2018.json
|   |   |   |-- val2018.json
|   |   |   `-- test2018.json
|   |   |-- original_annotations
|   |   |   |-- train/
|   |   |   |-- val/
|   |   |   `-- test/
|   |   `-- images
|   |       |-- train/
|   |       |-- val/
|   |       `-- test/
|   |-- COCO
|   |   |-- dets
|   |   |   `-- human_detection.json
|   |   |-- annotations
|   |   |   |-- person_keypoints_train2017.json
|   |   |   |-- person_keypoints_val2017.json
|   |   |   `-- image_info_test-dev2017.json
|   |   `-- images
|   |       |-- train2017/
|   |       |-- val2017/
|   |       `-- test2017/
|   `-- imagenet_weights
|       |-- resnet_v1_50.ckpt
|       |-- resnet_v1_101.ckpt
|       `-- resnet_v1_152.ckpt


The output folder must follow the structure below.

|-- output
|   |-- log
|   |-- model_dump
|   |-- result
|   `-- vis
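
The output folders listed above can be created up front with a short script. This is a minimal sketch, not part of this repo: the helper name `make_output_dirs` is hypothetical, and it assumes `pose_root` points at the repository root.

```python
import os

# Subfolders of output/ as listed in the tree above.
OUTPUT_SUBDIRS = ("log", "model_dump", "result", "vis")


def make_output_dirs(pose_root):
    """Create output/{log,model_dump,result,vis} under pose_root.

    Hypothetical helper for illustration; returns the created paths.
    """
    created = []
    for name in OUTPUT_SUBDIRS:
        path = os.path.join(pose_root, "output", name)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created
```

Calling `make_output_dirs(".")` from the repository root would create the four folders, leaving any that already exist untouched.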

Running TF-SimpleHumanPose



In the main folder, run

python train.py --gpu 0-1

to train the network on GPUs 0 and 1.

To continue a previous experiment, run

python train.py --gpu 0-1 --continue

--gpu 0,1 can be used instead of --gpu 0-1.
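
The equivalence of the two `--gpu` forms can be illustrated with a small normalization function. This is a sketch under assumptions, not necessarily the parsing code used in train.py: the function name `normalize_gpu_ids` is hypothetical, and the range is assumed to be inclusive on both ends, consistent with `0-1` meaning GPUs 0 and 1.

```python
def normalize_gpu_ids(gpu_arg):
    """Turn a range such as '0-1' into '0,1'.

    Comma-separated lists such as '0,1' and single ids such as '2'
    are returned unchanged. Hypothetical helper for illustration.
    """
    if "-" in gpu_arg:
        first, last = (int(x) for x in gpu_arg.split("-"))
        # ASSUMPTION: the range is inclusive, so '0-3' means GPUs 0, 1, 2, 3.
        return ",".join(str(i) for i in range(first, last + 1))
    return gpu_arg
```

With this sketch, `normalize_gpu_ids("0-1")` and `normalize_gpu_ids("0,1")` both yield `"0,1"`.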


Place the trained model in output/model_dump/$DATASET/ and the human detection result (human_detection.json) in data/$DATASET/dets/.

In the main folder, run

python test.py --gpu 0-1 --test_epoch 140

to test the network on GPUs 0 and 1 with the model trained for 140 epochs. --gpu 0,1 can be used instead of --gpu 0-1.
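
Before starting a test run, it can help to confirm that both inputs are where the test script expects them. The following is a minimal sketch; `check_test_inputs` is a hypothetical helper, not something this repo provides. Because the checkpoint file names are not specified here, it only checks that the model folder is non-empty.

```python
import os


def check_test_inputs(pose_root, dataset):
    """Return a list of problems; an empty list means both inputs are in place.

    Hypothetical pre-flight check. `dataset` is the $DATASET folder name,
    for example 'COCO'.
    """
    problems = []
    model_dir = os.path.join(pose_root, "output", "model_dump", dataset)
    det_file = os.path.join(pose_root, "data", dataset, "dets", "human_detection.json")

    # Checkpoint names are not known here, so only require a non-empty folder.
    if not os.path.isdir(model_dir) or not os.listdir(model_dir):
        problems.append("no trained model found in " + model_dir)
    if not os.path.isfile(det_file):
        problems.append("missing detection file " + det_file)
    return problems
```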


Here I report the performance of the model from this repo and the original paper. Also, I provide pre-trained models and human detection results.

Because this repo writes output files compatible with MS COCO and PoseTrack, you can directly use cocoapi or poseval to evaluate results on those datasets. To evaluate on the MPII dataset, you have to convert the produced mat file to the MPII mat format, following this.

Results on MSCOCO 2017 dataset

For all methods, the same human detection results are used (the download link is provided below). For comparison, I used the pre-trained model from the original repo to report its performance. The table below shows APs on the COCO val2017 set.

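| Methods | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 256x192_resnet50<br>(this repo) | 70.4 | 88.6 | 77.8 | 67.0 | 76.9 | 76. | | | | | pose |
| 256x192_resnet50<br>(original repo) | 70.3 | 88.8 | 77.8 | 67.0 | 76.7 | 76.1 | 93.0 | 82.9 | 71.8 | 82.3 | - |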

Results on PoseTrack 2018 dataset

Following the paper, the model pre-trained on the COCO dataset is used as the starting point for training on the PoseTrack dataset. After training the model on the COCO dataset, I set lr, lr_dec_epoch, and end_epoch in config.py to 5e-5, [150, 155], and 160, respectively. Then, run python train.py --gpu $GPUS --continue. The table below shows APs on the validation set.

| 256x192_resnet50<br>(bbox from detector) | 74.4 | 76.9 | 72. | | | | | | pose |
| 256x192_resnet50<br>(bbox from GT) | 87.9 | 86.7 | 80.2 | 72.5 | 77.0 | 77.8 | 74.6 | 80.1 | model<br>pose |
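
The PoseTrack fine-tuning settings described above can be written out as a small sketch. The three values come from the text. The decay rule is an assumption made for illustration: the learning rate is divided by a factor of 10 at each epoch listed in lr_dec_epoch. The name `lr_dec_factor` and the helper `lr_at_epoch` are hypothetical here and may not match config.py.

```python
# Values taken from the text above; in the repo they are set in config.py.
lr = 5e-5
lr_dec_epoch = [150, 155]
end_epoch = 160

# ASSUMPTION: the rate is divided by this factor at each epoch in lr_dec_epoch.
lr_dec_factor = 10


def lr_at_epoch(epoch):
    """Learning rate in effect at `epoch` under the assumed step schedule."""
    # Count how many decay points have been reached so far.
    n_decays = sum(1 for e in lr_dec_epoch if epoch >= e)
    return lr / (lr_dec_factor ** n_decays)
```

Under these assumptions the rate stays at 5e-5 until epoch 150, drops by a factor of 10 there, and drops by another factor of 10 at epoch 155, with training ending at epoch 160.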


  1. If memory usage keeps growing as training goes on, add graph.finalize. [issue]

  2. If you get FileNotFoundError: [Errno 2] No such file or directory: 'tmp_result_0.pkl' in the testing stage, check that the human detection result is prepared properly. The pkl files are generated and deleted automatically during testing, so you don't have to prepare them. This error is most often caused by an improper human detection file.
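
A quick sanity check of the detection file can help rule out the problem described in item 2. The sketch below rests on an assumption that should be verified against your own file: that human_detection.json is a JSON list of COCO-style detection results, each with `image_id`, `category_id`, `bbox` (four numbers), and `score`. The function `find_bad_detections` is a hypothetical helper, not part of this repo.

```python
import json

# ASSUMPTION: each detection is a COCO-style result entry with these keys.
REQUIRED_KEYS = ("image_id", "category_id", "bbox", "score")


def find_bad_detections(path):
    """Return the indices of entries that do not look like COCO-style detections."""
    with open(path) as f:
        dets = json.load(f)
    if not isinstance(dets, list):
        raise ValueError("expected a JSON list of detections")

    bad = []
    for i, det in enumerate(dets):
        ok = (
            isinstance(det, dict)
            and all(k in det for k in REQUIRED_KEYS)
            and isinstance(det["bbox"], (list, tuple))
            and len(det["bbox"]) == 4  # assumed [x, y, width, height]
        )
        if not ok:
            bad.append(i)
    return bad
```

An empty return value means every entry has the assumed shape; it does not guarantee that the image ids match the dataset being tested.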


This repo is largely adapted from the TensorFlow repo of CPN and the PyTorch repo of Simple.


[1] Xiao, Bin, Haiping Wu, and Yichen Wei. "Simple Baselines for Human Pose Estimation and Tracking". ECCV 2018.