Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation
Overview
- This repo is the official PyTorch implementation of Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation, accepted at CVPR'22 (Oral)
- The repo contains evaluation and training scripts for the network introduced in the paper, using different datasets.
Installation and Setup
- Set up the conda environment (a quick sanity check of the install is sketched after this list):
```bash
conda create --name kypt_trans python==3.8.11
conda activate kypt_trans
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
```
- Download the MANO model files from the website (requires login) and set the `smplx_path` in `config.py`
- Clone the current repo:
```bash
git clone <curr_repo>
cd kypt_trans
cd main
```
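After the environment is set up, a quick sanity check (a minimal snippet, not part of the repo) can confirm that the expected PyTorch build is active:
```python
# Optional sanity check: confirm the installed PyTorch version and CUDA availability
import torch

print(torch.__version__)          # expected: 1.10.1
print(torch.cuda.is_available())  # should print True on a CUDA-capable machine
```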
The setup has been tested on an NVIDIA 3090 GPU.
Depending on the dataset you intend to train or evaluate on, follow the instructions below for the setup.
InterHand2.6M Setup
- Download the dataset from the website
- In `config.py`, set `interhand_anno_dir` to point to the annotations directory
- In `config.py`, set `interhand_images_path` to point to the images directory
- If you intend to use the RootNet output for the root joint translation, download the RootNet results for InterHand2.6M from here. Set `root_net_output_path` in `config.py` to point to the RootNet outputs folder. If you instead intend to test with the ground-truth relative translation, set `root_net_output_path` to `None` (an illustrative `config.py` excerpt follows this list)
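For reference, the relevant entries in `config.py` might then look like this (the paths are placeholders for illustration):
```python
# Illustrative excerpt of config.py (placeholder paths)
interhand_anno_dir = '/data/InterHand2.6M/annotations'   # InterHand2.6M annotations directory
interhand_images_path = '/data/InterHand2.6M/images'     # InterHand2.6M images directory

# Either point to the downloaded RootNet outputs for the root joint translation...
root_net_output_path = '/data/rootnet_interhand2.6m'
# ...or set it to None to test with the ground-truth relative translation
# root_net_output_path = None
```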
HO-3D Setup
- Download the dataset from the website and set `ho3d_anno_dir` in `config.py` to point to the dataset folder
- Download the YCB models from here. The original mesh models are large and won't fit in memory for some of the computation. Because of this, the meshes are decimated to 2000 faces as described here. Rename the decimated models as `textured_simple_2000.obj` (a decimation sketch is given after this list)
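The repo links its own decimation procedure above; as an alternative illustration, a minimal Open3D-based sketch could look like the following (the models directory and the original file name `textured_simple.obj` are assumptions, adjust them to the actual YCB download):
```python
# Hypothetical decimation of the YCB meshes to 2000 faces with Open3D
import os
import open3d as o3d

ycb_models_dir = '/data/YCB_models'  # placeholder path to the downloaded YCB models

for obj_name in sorted(os.listdir(ycb_models_dir)):
    obj_path = os.path.join(ycb_models_dir, obj_name, 'textured_simple.obj')  # assumed file name
    if not os.path.isfile(obj_path):
        continue
    mesh = o3d.io.read_triangle_mesh(obj_path)
    # Quadric decimation down to 2000 faces so the models fit in memory
    mesh_dec = mesh.simplify_quadric_decimation(target_number_of_triangles=2000)
    out_path = os.path.join(ycb_models_dir, obj_name, 'textured_simple_2000.obj')
    o3d.io.write_triangle_mesh(out_path, mesh_dec)
```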
H<sub>2</sub>O-3D Setup
- Download the dataset from the website and set `h2o3d_anno_dir` in `config.py` to point to the dataset folder (see the `config.py` excerpt after this list)
- Follow step 2 in the HO-3D setup above to download, decimate and set the object models
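For reference, the dataset path entries in `config.py` might then look like this (placeholder paths for illustration):
```python
# Illustrative excerpt of config.py (placeholder paths)
ho3d_anno_dir = '/data/HO3D_v2'   # HO-3D dataset folder
h2o3d_anno_dir = '/data/H2O3D'    # H2O-3D dataset folder
```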
Demo
We provide the demo script for visualizing the outputs.
InterHand2.6M
- Download the checkpoint file for the model trained to output MANO joint angles from here
- Set `dataset = 'InterHand2.6M'` and `pose_representation = 'angles'` in `config.py`
- Run the following script:
```bash
python demo.py --ckpt_path <path_to_ckpt> --use_big_decoder --dec_layers 6
```
- The outputs are shown in matplotlib and Open3D windows. See the instructions in the command line to navigate.
HO-3D
- Download the checkpoint file for the model trained to output the 3D pose representation from here
- Set `dataset = 'ho3d'` and `pose_representation = '3D'` in `config.py`
- Run the following script:
```bash
python demo.py --ckpt_path <path_to_ckpt> --use_big_decoder --dec_layers 6
```
- Since the output here is only 3D joint locations, their 2D projections are shown in the matplotlib window. See the instructions in the command line to navigate.
Evaluation
Depending on the dataset you intend to evaluate on, follow the instructions below.
InterHand2.6M (Table 1 in Paper)
- Make the following changes in `config.py`:
```python
dataset = 'InterHand2.6M'
pose_representation = '2p5D'  # Table 1 in the paper uses the 2.5D pose representation
```
- Download the checkpoint file from here
- Run the following command:
```bash
python test.py --ckpt_path <path_to_interhand2.6m_ckpt> --gpu_ids <gpu_ids>
```
If running on multiple GPUs, set `<gpu_ids>` to `0,1,2,3`
- The error metrics are dumped into a .txt file in the folder containing the checkpoint
- The final numbers are as below:

| Single Hand MPJPE (mm) | Interacting Hands MPJPE (mm) | All MPJPE (mm) | MRRPE (mm) |
|---|---|---|---|
| 10.88 | 14.16 | 12.62 | 29.50 |
HO-3D (v2) (Table 2 in Paper)
- Make the following changes in `config.py`:
```python
dataset = 'ho3d'
pose_representation = '3D'  # Table 2 in the paper uses the 3D pose representation
```
- Download the checkpoint file from here
- Run the following command:
```bash
python test.py --ckpt_path <path_to_ho3d_ckpt> --use_big_decoder --dec_layers 6
```
- The object error metric (MSSD) is dumped into a .txt file in the folder containing the checkpoint
- Also dumped is a .json file which can be submitted to the HO-3D (v2) challenge after zipping the file
- Here is the dumped results file after the run: [Results txt file]
- Hand pose estimation accuracy in the HO-3D challenge leaderboard: here, user: bullet
H<sub>2</sub>O-3D
- Make the following changes in `config.py`:
```python
dataset = 'h2o3d'
pose_representation = '3D'  # the H2O-3D results in the paper use the 3D pose representation
```
- Download the checkpoint file from here
- Run the following command:
```bash
python test.py --ckpt_path <path_to_h2o3d_ckpt>
```
- The object error metric (MSSD) is dumped into a .txt file in the folder containing the checkpoint
- Also dumped is a .json file which can be submitted to the H<sub>2</sub>O-3D challenge after zipping the file.
- Here is the dumped results file after the run: [Results txt file]
- Hand pose estimation accuracy in the H<sub>2</sub>O-3D challenge leaderboard: here, user: bullet
Training
- Depending on the dataset and output pose representation you intend to train on, set the `dataset` and `pose_representation` variables in `config.py`.
- Run the following script to start the training:
```bash
CUDA_VISIBLE_DEVICES=0,1 python train.py --run_dir_name <run_name>
```
To continue training from the last saved checkpoint, use the `--continue` argument in the above command.
- The checkpoints are dumped after every epoch in the 'output' folder of the base directory
- Tensorboard logging is also available in the 'output' folder
Training with HO-3D and H<sub>2</sub>O-3D datasets together
The H<sub>2</sub>O-3D results in the paper are obtained by training the network on the combined dataset of HO-3D and H<sub>2</sub>O-3D. This training can be achieved by setting `dataset` to `ho3d_h2o3d` in `config.py`.
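For example, the corresponding settings might look like this (the pose representation follows the paper's H<sub>2</sub>O-3D setup described above):
```python
# Illustrative excerpt of config.py for combined HO-3D + H2O-3D training
dataset = 'ho3d_h2o3d'
pose_representation = '3D'  # the H2O-3D results in the paper use the 3D pose representation
```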
Reference
```
@InProceedings{Hampali_2022_CVPR_Kypt_Trans,
    author    = {Shreyas Hampali and Sayan Deb Sarkar and Mahdi Rad and Vincent Lepetit},
    title     = {Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation},
    booktitle = {IEEE Computer Vision and Pattern Recognition Conference},
    year      = {2022}
}
```
Acknowledgements
- A lot of the code has been reused from the InterHand2.6M repo and the DETR repo. We thank the authors for making their code public.