
Multi-person Human Pose Estimation with HigherHRNet in PyTorch

This is an unofficial implementation of the paper HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation.
The code is a simplified version of the official implementation, designed with ease of use in mind.

The code is fully compatible with the official pre-trained weights. It supports both Windows and Linux.

This repository currently provides:

- the `SimpleHigherHRNet` class, a simple interface to load HigherHRNet and run inference on single images
- a live demo script that runs on a connected camera or a saved video
- a script to extract keypoints from a saved video (e.g. in csv format)
- a script to convert the model to TensorRT (FP16 supported)

This repository is modeled on the simple-HRNet repository.
Unfortunately, compared to HRNet, the results and performance of HigherHRNet are somewhat disappointing: the network and the required post-processing are slower, and the predictions do not look more precise. Moreover, multiple skeletons are often predicted for the same person, requiring additional steps to filter out the redundant poses.
On the other hand, being a bottom-up approach, HigherHRNet does not rely on a separate person detector such as YOLOv3, and it can be used for person detection too.

Examples

<table> <tr> <td align="center"><img src="./gifs/gif-01-output.gif" width="100%" height="auto" /></td> <td align="center"><img src="./gifs/gif-02-output.gif" width="100%" height="auto" /></td> </tr> </table>

Class usage

import cv2
from SimpleHigherHRNet import SimpleHigherHRNet

model = SimpleHigherHRNet(32, 17, "./weights/pose_higher_hrnet_w32_512.pth")  # HigherHRNet-W32, 17 COCO joints
image = cv2.imread("image.png", cv2.IMREAD_COLOR)

joints = model.predict(image)  # per-person joint predictions
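
The returned `joints` can be used directly. Below is a minimal sketch of drawing the detected keypoints; it assumes the output is a numpy array of shape (num_people, nof_joints, 3) with each joint given as (y, x, confidence), mirroring the simple-HRNet interface:

for person in joints:
    for y, x, confidence in person:
        if confidence > 0.5:  # hypothetical confidence threshold
            cv2.circle(image, (int(x), int(y)), 3, (0, 255, 0), -1)

cv2.imwrite("image_with_joints.png", image)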

The most useful parameters of the `__init__` function are listed below (a fuller constructor example follows the table):

<table> <tr> <td>c</td><td>number of channels (HRNet: 32, 48)</td> </tr> <tr> <td>nof_joints</td><td>number of joints (COCO: 17, CrowdPose: 14)</td> </tr> <tr> <td>checkpoint_path</td><td>path of the (official) pre-trained weights to be loaded</td> </tr> <tr> <td>resolution</td><td>image resolution (shorter side); it depends on the loaded weights</td> </tr> <tr> <td>return_heatmaps</td><td>the `predict` method also returns the heatmaps</td> </tr> <tr> <td>return_bounding_boxes</td><td>the `predict` method also returns the bounding boxes</td> </tr> <tr> <td>filter_redundant_poses</td><td>nearly identical poses are filtered out as redundant</td> </tr> <tr> <td>max_nof_people</td><td>maximum number of people in the scene</td> </tr> <tr> <td>max_batch_size</td><td>maximum batch size used during inference</td> </tr> <tr> <td>device</td><td>device (`cpu` or `cuda`)</td> </tr> </table>
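
For example, a fuller instantiation could look like the following (the parameter values are illustrative, not the class defaults):

model = SimpleHigherHRNet(
    c=32,                        # HigherHRNet-W32
    nof_joints=17,               # COCO keypoints
    checkpoint_path="./weights/pose_higher_hrnet_w32_512.pth",
    resolution=512,              # must match the loaded weights
    return_heatmaps=False,
    return_bounding_boxes=True,  # predict() then also returns the boxes
    filter_redundant_poses=True,
    max_nof_people=30,           # illustrative limit
    max_batch_size=32,           # illustrative limit
    device="cuda",
)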

Running the live demo

From a connected camera:

python scripts/live-demo.py --camera_id 0

From a saved video:

python scripts/live-demo.py --filename video.mp4

For help:

python scripts/live-demo.py --help
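
The demo can also be replicated with the class directly. A minimal sketch, assuming the interface shown above and a camera at index 0:

import cv2
from SimpleHigherHRNet import SimpleHigherHRNet

model = SimpleHigherHRNet(32, 17, "./weights/pose_higher_hrnet_w32_512.pth")
cap = cv2.VideoCapture(0)  # same camera id as above

while True:
    ret, frame = cap.read()
    if not ret:
        break
    joints = model.predict(frame)  # one set of poses per frame
    # ...draw `joints` on `frame` here, e.g. as in the class usage example...
    cv2.imshow("live demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()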

Extracting keypoints

From a saved video:

python scripts/extract-keypoints.py --format csv --filename video.mp4

For help:

python scripts/extract-keypoints.py --help
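
Programmatically, a similar extraction can be sketched with the class and Python's standard csv module (the column layout below is illustrative, not necessarily the script's actual output format):

import csv
import cv2
from SimpleHigherHRNet import SimpleHigherHRNet

model = SimpleHigherHRNet(32, 17, "./weights/pose_higher_hrnet_w32_512.pth")
cap = cv2.VideoCapture("video.mp4")

with open("keypoints.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["frame", "person", "joint", "y", "x", "confidence"])  # illustrative columns
    frame_idx = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        for person_idx, person in enumerate(model.predict(frame)):
            for joint_idx, (y, x, conf) in enumerate(person):
                writer.writerow([frame_idx, person_idx, joint_idx, y, x, conf])
        frame_idx += 1

cap.release()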

Converting the model to TensorRT

Warning: requires the installation of TensorRT (see the NVIDIA website) and onnx. On some platforms, they can be installed with:

pip install tensorrt onnx

Converting in FP16:

python scripts/export-tensorrt-model.py --device 0 --half

For help:

python scripts/export-tensorrt-model.py --help
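
For reference, this kind of conversion typically goes through an intermediate ONNX export before TensorRT builds an engine. A minimal sketch of the ONNX step, assuming the underlying torch.nn.Module is reachable (the `model.model` attribute here is hypothetical) and that the loaded weights expect 512x512 inputs:

import torch

net = model.model.eval()  # hypothetical: the wrapped HigherHRNet nn.Module
dummy = torch.randn(1, 3, 512, 512)  # one 512x512 RGB image

torch.onnx.export(
    net, dummy, "higherhrnet.onnx",
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
)

The resulting .onnx file can then be consumed by TensorRT, for instance with NVIDIA's trtexec tool (trtexec --onnx=higherhrnet.onnx --fp16).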

Installation instructions

ToDos