# DiffPose: Toward More Reliable 3D Pose Estimation (CVPR 2023)
<sup>1</sup>Jia Gong*, <sup>1</sup>Lin Geng Foo*, <sup>2</sup>Zhipeng Fan, <sup>3</sup>Qiuhong Ke, <sup>4</sup>Hossein Rahmani, <sup>1</sup>Jun Liu
* equal contribution
<sup>1</sup>Singapore University of Technology and Design, <sup>2</sup>New York University, <sup>3</sup>Monash University, <sup>4</sup>Lancaster University
[Paper] | [Project Page] | [SUTD-VLG Lab]
## DiffPose Model Architecture
<p align="center"> <img src="./figure/Diffpose_arch.png" width="100%"> </p>

## DiffPose Diffusion Process
<p align="center"> <img src="./figure/Diffpose_process.png" width="100%"> </p>

Our code is built on top of DDIM.
## Environment
The code is developed and tested under the following environment:
- Python 3.8.2
- PyTorch 1.7.1
- CUDA 11.0
You can create the environment via:
```bash
conda env create -f environment.yml
```
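Once created, activating the environment and checking that PyTorch can see the GPU is a quick sanity check (the environment name `diffpose` below is an assumption; use the name defined in `environment.yml`):

```bash
# Assumes the environment defined in environment.yml is named "diffpose"
conda activate diffpose
# Print the PyTorch version and whether CUDA is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```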
## Dataset
Our datasets are based on 3d-pose-baseline and Video3D data. We provide the GMM-format data generated from the above datasets here. You should put the downloaded files into the `./data` directory.
Note that we only change the format of the Video3D data to make it compatible with our GMM-based DiffPose training strategy; the 2D pose values in our dataset are identical to the original ones.
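A minimal placement sketch (the assumption that the downloaded files are `.npz` archives sitting in `~/Downloads` is ours; adjust paths and names to the actual download):

```bash
# Create the data directory expected by the training/evaluation scripts
mkdir -p data
# Hypothetical source location and file extension -- move the downloaded GMM-format files here
mv ~/Downloads/*.npz ./data/
```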
## Frame-based experiments
### Evaluating pre-trained models for frame-based experiments
We provide the pre-trained diffusion model (with CPN-detected 2D pose as input) here. To evaluate it, put it into the `./checkpoints` directory and run:
```bash
CUDA_VISIBLE_DEVICES=0 python main_diffpose_frame.py \
--config human36m_diffpose_uvxyz_cpn.yml --batch_size 1024 \
--model_pose_path checkpoints/gcn_xyz_cpn.pth \
--model_diff_path checkpoints/diffpose_uvxyz_cpn.pth \
--doc t_human36m_diffpose_uvxyz_cpn --exp exp --ni \
>exp/t_human36m_diffpose_uvxyz_cpn.out 2>&1 &
```
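Note that the shell redirection writes the log under `exp/` and the checkpoint paths point into `checkpoints/`, so both directories should exist before launching; a minimal preparation sketch:

```bash
# Create the directories referenced by the command above
mkdir -p checkpoints exp
```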
We also provide the pre-trained diffusion model (with ground-truth 2D pose as input) here. To evaluate it, put it into the `./checkpoints` directory and run:
```bash
CUDA_VISIBLE_DEVICES=0 python main_diffpose_frame.py \
--config human36m_diffpose_uvxyz_gt.yml --batch_size 1024 \
--model_pose_path checkpoints/gcn_xyz_gt.pth \
--model_diff_path checkpoints/diffpose_uvxyz_gt.pth \
--doc t_human36m_diffpose_uvxyz_gt --exp exp --ni \
>exp/t_human36m_diffpose_uvxyz_gt.out 2>&1 &
```
### Training new models
- To train a model from scratch (CPN 2D pose as input), run:
```bash
CUDA_VISIBLE_DEVICES=0 python main_diffpose_frame.py --train \
--config human36m_diffpose_uvxyz_cpn.yml --batch_size 1024 \
--model_pose_path checkpoints/gcn_xyz_cpn.pth \
--doc human36m_diffpose_uvxyz_cpn --exp exp --ni \
>exp/human36m_diffpose_uvxyz_cpn.out 2>&1 &
```
- To train a model from scratch (ground-truth 2D pose as input), run:
```bash
CUDA_VISIBLE_DEVICES=0 python main_diffpose_frame.py --train \
--config human36m_diffpose_uvxyz_gt.yml --batch_size 1024 \
--model_pose_path checkpoints/gcn_xyz_gt.pth \
--doc human36m_diffpose_uvxyz_gt --exp exp --ni \
>exp/human36m_diffpose_uvxyz_gt.out 2>&1 &
```
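Since the commands above detach with `&` and redirect output, training progress can be followed from the log file, e.g.:

```bash
# Follow the CPN-input training log written by the redirection above
tail -f exp/human36m_diffpose_uvxyz_cpn.out
```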
## Video-based experiments
Refer to https://github.com/GONGJIA0208/Diffpose_video
## BibTeX
If you find our work useful in your research, please consider citing:
```bibtex
@InProceedings{gong2023diffpose,
    author    = {Gong, Jia and Foo, Lin Geng and Fan, Zhipeng and Ke, Qiuhong and Rahmani, Hossein and Liu, Jun},
    title     = {DiffPose: Toward More Reliable 3D Pose Estimation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
}
```
## Acknowledgement
Part of our code is borrowed from DDIM, VideoPose3D, GraFormer, MixSTE, and PoseFormer. We thank the authors for releasing their code.