STCFormer: 3D Human Pose Estimation with Spatio-Temporal Criss-cross Attention [CVPR 2023]

This is the README for the PyTorch code release of "3D Human Pose Estimation with Spatio-Temporal Criss-cross Attention".

Thank you for your interest. The code and checkpoints are still being updated.

3D Human Pose Estimation with Spatio-Temporal Criss-cross Attention,
Zhenhua Tang, Zhaofan Qiu, Yanbin Hao, Richang Hong, and Ting Yao,
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023

Poster:

<p align="center"><img src="poster_9M.png" width="100%" alt="" /></p>

Demo:

[demo animation]

The released code includes:

checkpoint/:                        the folder for STCFormer model weights.
dataset/:                           the folder for the data loaders.
common/:                            the folder for basic utility functions.
model/:                             the folder for the STCFormer network.
run_stc.py:                         the Python script for training and testing STCFormer.

Dependencies

Make sure you have the following dependencies installed:

Dataset

Our model is evaluated on the Human3.6M and MPI-INF-3DHP datasets.

Human3.6M

We set up the Human3.6M dataset in the same way as VideoPose3D.

MPI-INF-3DHP

We set up the MPI-INF-3DHP dataset in the same way as P-STMO.

Training from scratch

Human3.6M

For the training stage, please run:

python run_stc.py -f 27 -b 128  --train 1 --layers 6 -s 3

For the testing stage, please run:

python run_stc.py -f 27 -b 128  --train 0 --layers 6 -s 1 --reload 1 --previous_dir ./checkpoint/your_best_model.pth

Evaluating our models

You can download our pre-trained models from Google Drive or Baidu Disk (extraction code: STC1) and put them in the ./checkpoint directory.

Human3.6M

To evaluate our STCFormer model on the 2D keypoints obtained by CPN, please run:

python run_stc.py -f 27 -b 128  --train 0 --layers 6 -s 1 -k 'cpn_ft_h36m_dbb' --reload 1 --previous_dir ./checkpoint/model_27_STCFormer/no_refine_6_4406.pth
python run_stc.py -f 81 -b 128  --train 0 --layers 6 -s 1 -k 'cpn_ft_h36m_dbb' --reload 1 --previous_dir ./checkpoint/model_81_STCFormer/no_refine_6_4172.pth
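
If you prefer to drive both evaluations from one script, the two commands above can be generated programmatically. The following is a minimal sketch that only reuses the flags and checkpoint paths shown above; the names `CHECKPOINTS` and `build_eval_command` are illustrative and are not part of this repository.

```python
import shlex

# Frame count -> checkpoint path, copied from the two commands above.
CHECKPOINTS = {
    27: "./checkpoint/model_27_STCFormer/no_refine_6_4406.pth",
    81: "./checkpoint/model_81_STCFormer/no_refine_6_4172.pth",
}


def build_eval_command(frames, checkpoint):
    """Return the run_stc.py evaluation command as an argument list."""
    return [
        "python", "run_stc.py",
        "-f", str(frames),
        "-b", "128",
        "--train", "0",
        "--layers", "6",
        "-s", "1",
        "-k", "cpn_ft_h36m_dbb",
        "--reload", "1",
        "--previous_dir", checkpoint,
    ]


if __name__ == "__main__":
    # Print one shell-ready command per released checkpoint.
    for frames, checkpoint in CHECKPOINTS.items():
        print(shlex.join(build_eval_command(frames, checkpoint)))
```

To execute a command instead of printing it, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root.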

The models differ in the number of input frames and achieve the following results:

Frames    P1 (mm)    P2 (mm)
27        44.08      34.76
81        41.72      32.94
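
In Human3.6M evaluations, P1 and P2 usually denote Protocol 1 (MPJPE, the mean per-joint position error) and Protocol 2 (P-MPJPE, the same error after rigidly aligning the prediction to the ground truth). Assuming that convention applies to the table above, the P1 metric can be sketched as follows. The function name `mpjpe` and the toy coordinates are illustrative and are not taken from this repository.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error for one pose.

    Both arguments are sequences of (x, y, z) joint coordinates in the
    same unit (millimetres for the table above).
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    # Average the Euclidean distance between corresponding joints.
    return sum(math.dist(p, t) for p, t in zip(predicted, target)) / len(predicted)


# Toy example with two joints whose errors are 3 mm and 4 mm.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(mpjpe(pred, gt))  # 3.5
```

P2 would apply the same averaging after a Procrustes alignment (rotation, translation, and scale) of the predicted pose to the ground truth, which is omitted here.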

The 243-frame model is proprietary and stored only on the company server, so it cannot be released due to copyright restrictions. If you need results for that configuration, we recommend training an equivalent model yourself.

MPI-INF-3DHP

The pre-trained models and code for STCFormer on MPI-INF-3DHP are currently being updated. In the meantime, you can run the following command, which relies on an earlier and less organized version of the code, to reproduce the results for 81 frames.

 python run_3dhp_stc.py --train 0 --frames 81  -b 128  -s 1  --reload 1 --previous_dir ./checkpoint/model_81_STMO/no_refine_8_2310.pth

In the Wild Video

Following MHFormer, first download the YOLOv3 and HRNet pretrained models here and put them in the './demo/lib/checkpoint' directory. Then put your in-the-wild videos in the './demo/video' directory.

You can modify the 'get_pose3D' function in the 'vis.py' script according to your needs, including the checkpoint and model parameters, and then execute the following command:

 python demo/vis.py --video sample_video.mp4
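
To process several videos in one go, the command above can be generated once per file. This is a minimal sketch that assumes `--video` takes the file name of a video placed in './demo/video', as in the example above; the helper name `build_demo_commands` and the restriction to `.mp4` files are assumptions made for illustration.

```python
from pathlib import Path

VIDEO_DIR = Path("./demo/video")  # directory named in the instructions above
VIDEO_SUFFIXES = {".mp4"}         # assumption: only the format shown above


def build_demo_commands(video_dir=VIDEO_DIR):
    """Return one demo/vis.py command (as an argument list) per video file."""
    if not video_dir.is_dir():
        return []
    videos = sorted(
        p for p in video_dir.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    )
    # Pass only the file name, mirroring `--video sample_video.mp4`.
    return [["python", "demo/vis.py", "--video", p.name] for p in videos]


if __name__ == "__main__":
    for cmd in build_demo_commands():
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to run the demo on that video.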

Citation

If you find this repo useful, please consider citing our paper:

@inproceedings{tang20233d,
  title={3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention},
  author={Tang, Zhenhua and Qiu, Zhaofan and Hao, Yanbin and Hong, Richang and Yao, Ting},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4790--4799},
  year={2023}
}

Acknowledgement

Our code builds on the following repositories.

VideoPose3D
StridedTransformer-Pose3D
P-STMO
MHFormer
MixSTE
FTCM

We thank the authors for releasing their code.