Enhancing Feature Fusion for Human Pose Estimation

A new method to fuse high-level features and low-level features in human pose estimation

Introduction

This code is based on SimpleBaseline: https://github.com/microsoft/human-pose-estimation.pytorch. We use Semantic Embedding Blocks (SEB) and Global Convolutional Network (GCN) blocks to bridge the gap between low-level and high-level features. Experiments on the MPII and LSP human pose estimation datasets demonstrate that efficient feature fusion can significantly improve performance.
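As a rough illustration of the two fusion blocks (a sketch, not the exact code in this repository), a GCN block approximates a large k×k convolution with two parallel separable branches (k×1 then 1×k, and 1×k then k×1), while an SEB upsamples high-level features and multiplies them into the low-level features. The channel sizes and kernel size `k` below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNBlock(nn.Module):
    """Global Convolutional Network block (sketch): a large k x k kernel
    approximated by two parallel (k,1)+(1,k) and (1,k)+(k,1) branches."""
    def __init__(self, in_ch, out_ch, k=7):
        super().__init__()
        p = k // 2
        self.left = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)),
        )
        self.right = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)),
        )

    def forward(self, x):
        # Summing the two branches emulates a dense k x k receptive field.
        return self.left(x) + self.right(x)

class SEB(nn.Module):
    """Semantic Embedding Block (sketch): embed high-level semantics into
    low-level features by conv + upsample + element-wise multiplication."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        self.conv = nn.Conv2d(high_ch, low_ch, 3, padding=1)

    def forward(self, low, high):
        high = self.conv(high)
        # Upsample the high-level map to the low-level spatial resolution.
        high = F.interpolate(high, size=low.shape[2:],
                             mode="bilinear", align_corners=False)
        return low * high
```

For example, with a low-level feature map of shape `(1, 256, 64, 64)` and a high-level map of shape `(1, 512, 32, 32)`, `SEB(512, 256)` produces a fused map of shape `(1, 256, 64, 64)`, which a `GCNBlock` can then refine with a large effective receptive field.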

Results on MPII val

| Method | Input | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean |
|---|---|---|---|---|---|---|---|---|---|
| SimpleBaseline_ResNet50 | 256x256 | 96.35 | 95.33 | 88.99 | 83.18 | 88.42 | 83.96 | 79.59 | 88.53 |
| ours | 256x256 | 96.73 | 95.35 | 89.50 | 83.73 | 88.23 | 84.43 | 79.92 | 88.82 |
| SimpleBaseline_ResNet50 | 384x384 | 96.66 | 95.75 | 89.79 | 84.61 | 88.52 | 84.67 | 79.29 | 89.07 |
| ours | 384x384 | 96.67 | 95.75 | 90.05 | 85.58 | 88.85 | 84.73 | 79.74 | 89.35 |

Environment

python >= 3.6
pytorch >= 1.0.0

Quick start

  1. Download the dataset and pretrained models; you can follow the official PyTorch implementation of SimpleBaseline.
  2. Train the model:

```shell
python pose_estimation/train.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml
```

  3. Validate the model:

```shell
python pose_estimation/valid.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar
```

Future work

We plan to explore multi-scale feature fusion structures.