
DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion Models (ICCV2023)

Authors:

<a href="https://scholar.google.se/citations?user=Q2AY_q4AAAAJ&hl=sv&oi=ao">Karl Holmquist</a>

<a href="https://scholar.google.se/citations?user=z4aXEBYAAAAJ&hl=sv&oi=ao">Bastian Wandt</a>

<a href="https://openaccess.thecvf.com/content/ICCV2023/html/Holmquist_DiffPose_Multi-hypothesis_Human_Pose_Estimation_using_Diffusion_Models_ICCV_2023_paper.html">Paper</a>

Overview:

This repository contains the code and some pre-trained models for our diffusion-based multi-hypothesis 3D human pose estimation method.

Abstract:

Traditionally, monocular 3D human pose estimation employs a machine learning model to predict the most likely 3D pose for a given input image. However, a single image can be highly ambiguous and induces multiple plausible solutions for the 2D-3D lifting step, which results in overly confident 3D pose predictors. To this end, we propose DiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image. Compared to similar approaches, our diffusion model is straightforward and avoids intensive hyperparameter tuning, complex network structures, mode collapse, and unstable training.
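
To make the multi-hypothesis idea concrete, here is a minimal sketch (ours, not the implementation in this repository) of drawing several pose hypotheses from a conditional diffusion model by running DDPM-style ancestral sampling from independent noise seeds under a shared image condition; `denoiser` and `cond` are hypothetical placeholders:

```python
import torch

def sample_hypotheses(denoiser, cond, num_hypotheses=200, num_joints=17,
                      timesteps=50, device="cpu"):
    # Simple DDPM-style ancestral sampling; each hypothesis starts from its
    # own Gaussian noise but shares the image-derived condition `cond`.
    betas = torch.linspace(1e-4, 2e-2, timesteps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(num_hypotheses, num_joints, 3, device=device)
    for t in reversed(range(timesteps)):
        t_batch = torch.full((num_hypotheses,), t, device=device)
        eps = denoiser(x, t_batch, cond)  # predicted noise
        # Posterior mean of x_{t-1} given the predicted noise.
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:  # add noise at every step except the last
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x  # (num_hypotheses, num_joints, 3): one 3D pose per hypothesis
```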

Moreover, we tackle the problem of over-simplification of the intermediate representation of the common two-step approaches which first estimate a distribution of 2D joint locations via joint-wise heatmaps and consecutively use their maximum argument for the 3D pose estimation step. Since such a simplification of the heatmaps removes valid information about possibly correct, though labeled unlikely, joint locations, we propose to represent the heatmaps as a set of 2D joint candidate samples. To extract information about the original distribution from these samples, we introduce our embedding transformer which conditions the diffusion model.
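
A minimal sketch of the candidate-sample idea (again ours, not code from this repository): treat each joint's heatmap as a categorical distribution over pixel locations and sample from it, instead of keeping only the argmax:

```python
import torch

def heatmap_to_candidates(heatmaps, num_samples=32):
    # heatmaps: (num_joints, H, W) non-negative scores from the 2D detector.
    # Drawing candidates in proportion to the heatmap mass preserves
    # unlikely-but-possible joint locations that the argmax would discard.
    num_joints, H, W = heatmaps.shape
    probs = heatmaps.clamp(min=0).reshape(num_joints, -1)
    probs = probs / probs.sum(dim=1, keepdim=True)
    idx = torch.multinomial(probs, num_samples, replacement=True)  # (J, S)
    xs, ys = (idx % W).float(), (idx // W).float()
    return torch.stack([xs, ys], dim=-1)  # (num_joints, num_samples, 2)
```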

Experimentally, we show that DiffPose improves upon the state of the art for multi-hypothesis pose estimation by 3-5% for simple poses and outperforms it by a large margin for highly ambiguous poses.

<img src="./Images/Teaser.png" width="500px"></img>

Paper:

The paper was accepted for oral presentation at ICCV 2023 in Paris and can be found here: <a href="https://openaccess.thecvf.com/content/ICCV2023/html/Holmquist_DiffPose_Multi-hypothesis_Human_Pose_Estimation_using_Diffusion_Models_ICCV_2023_paper.html">DiffPose</a>

Affiliation:

Computer Vision Laboratories (CVL) at Linköping University, Sweden

Installation

We recommend creating a clean conda environment. You can do this as follows:

conda env create -f environment.yml

After the installation is complete, you can activate the conda environment by running:

conda activate DiffPose

Usage

Note that some plotting functionality is limited without a wandb account; in that case, please use the '--do_not_use_wandb' flag.
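
For example, appended to the training command from the next section (assuming train.py accepts the flag):

python train.py --config diffpose.yaml --do_not_use_wandb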

Training

Our main experiments can be trained using:

python train.py --config diffpose.yaml --seed 42

The config files for the other experiments can be found in experiments/iccv2023, and the random seeds used are listed in experiments/random_seeds.txt. An example invocation follows below.
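
For example, with <config-name> and <seed> as placeholders for the file names and seeds listed there (exact path handling may differ):

python train.py --config experiments/iccv2023/<config-name>.yaml --seed <seed>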

Evaluation

To evaluate a trained model separately from training:

python eval.py --config diffpose.yaml

Demo

We provide demo functionality in the demo folder for running inference with a trained model on a given image. Note that images are scaled to 255x255; for best performance, make sure that most of the image is covered by the person in question rather than background. The 2D detector will also struggle if multiple persons are in the frame, leading to sub-optimal performance of our method.
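
If you need to prepare inputs manually, here is a minimal sketch (our illustration, not part of the demo code; assumes Pillow is installed) of cropping a square region around the person before the rescale:

```python
from PIL import Image

def crop_person(image_path, cx, cy, box_size, out_size=255):
    # cx, cy: approximate person center in pixels; box_size: square crop side.
    # Cropping tightly around the person reduces background before the
    # 255x255 rescale that the demo applies.
    img = Image.open(image_path)
    left, top = int(cx - box_size / 2), int(cy - box_size / 2)
    crop = img.crop((left, top, left + box_size, top + box_size))
    return crop.resize((out_size, out_size), Image.BILINEAR)
```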

Pre-trained 2D detector

This repository contains both the fine-tuned network weights used by <a href="https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows">Wehrbein et al.</a> and the original, non-fine-tuned <a href="https://github.com/HRNet/HRNet-Human-Pose-Estimation">HRNet</a> weights they are based on.

Passing the '--use_orig_hrnet' flag when preprocessing the datasets selects the non-fine-tuned weights.
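
For example, to preprocess Human3.6M with the original, non-fine-tuned weights (additional arguments, e.g. dataset paths, may be required):

python data/preprocessing/H36M.py --use_orig_hrnet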

Pre-trained model weights

The pre-trained 2D detector weights and the five models trained on H36M can be found on <a href="https://drive.google.com/drive/folders/1C182piD21B6fG5Aksl0ryT97zxQQdS0u?usp=share_link">Google Drive</a>

Trained Model Weights for DiffPose

These are the model weights for the 5 different seeds used for evaluating our method

| 2D Detector | Random Seed for DiffPose | PA-MPJPE on H36M (mm) | PA-MPJPE on H36MA (mm) | Model Weights |
| --- | --- | --- | --- | --- |
| Fine-tuned H36M | 42 | 30.526 | 46.116 | <a href="https://drive.google.com/drive/folders/1FCPUUOoOgiW3maZDHid5JIlIpdyLxsIf?usp=sharing">Seed 42</a> |
| Fine-tuned H36M | 2967 | 30.618 | 46.661 | <a href="https://drive.google.com/drive/folders/15_O1T0RLN78Gq_DZlBiNOtJD1cbn8Lop?usp=sharing">Seed 2967</a> |
| Fine-tuned H36M | 6173 | 30.745 | 46.808 | <a href="https://drive.google.com/drive/folders/1r4UJCKDo-eA-RSo2g5IMgfl_fsUE_c7L?usp=sharing">Seed 6173</a> |
| Fine-tuned H36M | 5478 | 30.964 | 46.813 | <a href="https://drive.google.com/drive/folders/1I2EA79iocnm6j4xVEgEmCLz0Y_0N6UEC?usp=sharing">Seed 5478</a> |
| Fine-tuned H36M | 989 | 31.028 | 47.134 | <a href="https://drive.google.com/drive/folders/1cH32KuRXfp51VesbrF6dx7Lr-gicwhQl?usp=sharing">Seed 989</a> |

Model Weights for 2D joint detector

These are the model weights for the original model as well as the ones that have been fine-tuned on the 2D data from H36M.

| Training Data | Model Weights |
| --- | --- |
| MPII (w/o fine-tuning) | <a href="https://drive.google.com/file/d/1XUy8v-OtiGP9SGITalrNNARRgyzif6SW/view?usp=sharing">Original</a> |
| MPII w/ fine-tuning on H36M (as in previous methods) | <a href="https://drive.google.com/file/d/1b8ukMD0abRzIkMcGE_sFiBE8j7FJefHH/view?usp=sharing">Fine-tuned</a> |

For generating the dataset, please download the weights for the 2D joint detector and place them in data/preprocessing/hrnet.
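
For example (the exact file names depend on the Google Drive download):

mkdir -p data/preprocessing/hrnet
mv <downloaded_weights>.pth data/preprocessing/hrnet/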

Datasets

Human3.6m

We provide tools in data/preprocessing/H36M.py for preprocessing the Human3.6M dataset, creating both the full split and the harder set of ambiguous samples proposed by <a href="https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows">Wehrbein et al.</a>

Please note that due to the licensing of the original dataset we cannot provide the data, nor can we help with getting access to it beyond directing you to the official website: <a href="http://vision.imar.ro/human3.6m/description.php">Human 3.6M</a>

MPI-INF-3DHP

Similarly, we provide preprocessing tools for <a href="https://vcai.mpi-inf.mpg.de/3dhp-dataset/">3DHP</a> in data/preprocessing/3DHP.py.

Acknowledgements:

Thanks to this <a href="https://github.com/lucidrains/denoising-diffusion-pytorch">great repo</a>, which served as a starting point for the implementation of the diffusion model used in this work.