HOnnotate: A method for 3D Annotation of Hand and Object Poses

Shreyas Hampali, Mahdi Rad, Markus Oberweger, Vincent Lepetit, CVPR 2020

This repository contains code for annotating the 3D poses of a hand and an object captured with a single RGBD camera setup.

Citation

If this code base was helpful in your research work, please consider citing us:

        @INPROCEEDINGS{hampali2020honnotate,
            title = {HOnnotate: A method for 3D Annotation of Hand and Object Poses},
            author = {Shreyas Hampali and Mahdi Rad and Markus Oberweger and Vincent Lepetit},
            booktitle = {CVPR},
            year = {2020}
        }

Installation

        pip install numpy matplotlib scikit-image transforms3d tqdm opencv-python cython open3d

Setup

HOnnotate_ROOT is the directory where you have cloned this repo.

Clone the TensorFlow models repository (needed for the slim and deeplab code under models/) and check out the pinned commit:

        git clone https://github.com/tensorflow/models.git
        cd models
        git checkout 834902277b8d9d38ef9982180aadcdaa0e5d24d3
        cd ../

Set up the MANO hand model by pointing setup_mano.py to your downloaded MANO model files (MANO_PATH):

        cd ./optimization
        python setup_mano.py ${MANO_PATH}
        cd ../

After setup, the folder structure of HOnnotate_ROOT should look like this:
            - checkpoints
                - CPM_Hand
                - Deeplab_seg
            - eval
            - HOdatasets
            - models
                - CPM
                - deeplab
                - slim
            - objCorners
                - 003_cracker_box
                - 004_sugar_box
                ....
            - onlineAug
            - optimization
            - utils
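
A quick way to sanity-check this layout is a small script like the one below. This is a hypothetical helper, not part of the repo; adjust HONNOTATE_ROOT to your checkout.

        import os

        HONNOTATE_ROOT = "."  # path to your HOnnotate checkout

        # directories expected after setup (see the tree above)
        expected = [
            "checkpoints/CPM_Hand", "checkpoints/Deeplab_seg", "eval", "HOdatasets",
            "models/CPM", "models/deeplab", "models/slim", "objCorners",
            "onlineAug", "optimization", "utils",
        ]

        missing = [d for d in expected if not os.path.isdir(os.path.join(HONNOTATE_ROOT, d))]
        if missing:
            print("Missing directories:", ", ".join(missing))
        else:
            print("Layout looks complete.")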

Data Capture

Capture the RGB-D sequence with a single camera and organize it in the following folder structure ('test' is the sequence name used throughout the examples below):

        -test
            -rgb
                -0
                    -00001.png
                    -00002.png
                    .
                    .
            -depth
                -0
                    -00001.png
                    -00002.png
                    .
                    .
            -calibration
                -cam_0_depth_scale.txt
                -cam_0_instrinsics.txt
            -configs
                -configHandPose.json
                -configObjPose.json

cam_0_depth_scale.txt contains the depth scale of the depth camera and cam_0_instrinsics.txt contains the camera intrinsics. The *.json files in configs are used as inputs to the scripts (explained later). The folder name '0' in the rgb and depth folders corresponds to the camera ID, which is always 0 in a single-camera setup.
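
As a reference for how these files fit together, here is a minimal sketch (not part of the repo) that back-projects one depth pixel to a 3D point. It assumes cam_0_instrinsics.txt stores a 3x3 camera matrix, cam_0_depth_scale.txt a single scalar, and the sequence lives under HOdatasets/test; check your own capture for the exact formats and paths.

        import numpy as np
        import cv2

        seq_dir = "HOdatasets/test"  # assumed location of the captured sequence

        # calibration: 3x3 intrinsics matrix and a scalar depth scale (assumed formats)
        K = np.loadtxt(seq_dir + "/calibration/cam_0_instrinsics.txt").reshape(3, 3)
        depth_scale = float(np.loadtxt(seq_dir + "/calibration/cam_0_depth_scale.txt"))

        # raw depth image -> metric depth
        depth = cv2.imread(seq_dir + "/depth/0/00001.png", cv2.IMREAD_UNCHANGED).astype(np.float32)
        depth_m = depth * depth_scale

        # back-project one pixel (u, v) into the camera frame
        u, v = 320, 240
        z = depth_m[v, u]
        x = (u - K[0, 2]) * z / K[0, 0]
        y = (v - K[1, 2]) * z / K[1, 1]
        print("3D point (meters):", (x, y, z))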

Run (Single camera setup)

Please refer to Section 4.2 in the paper. The stages below for performing automatic hand-object pose annotation follow the same single-camera pipeline as described in the paper.

0. Keypoints and Segmentations

We use the DeepLab network for segmentation and a convolutional pose machine (CPM) for hand 2D keypoint detection. Both networks are trained on the HO3D dataset and the weights can be downloaded from here.

0.1. Hand+Object segmentations

        python inference_seg.py --seq 'test'

The segmentations are saved in the segmentation directory of the test sequence.

<p align="center"> <img width="320" height="240" src="teaser_images/00002_image.png"> <img width="320" height="240" src="teaser_images/00002_prediction.png"> </p>
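
For a quick look at the saved masks, a small visualization sketch like the following can be used. It is illustrative only and assumes the masks in the segmentation directory are PNGs named like the input frames; adjust the paths to your sequence.

        import cv2

        seq_dir = "HOdatasets/test"  # assumed location of the captured sequence
        rgb = cv2.imread(seq_dir + "/rgb/0/00002.png")
        mask = cv2.imread(seq_dir + "/segmentation/0/00002.png", cv2.IMREAD_GRAYSCALE)

        # spread the few class ids over a colormap and blend with the RGB frame
        mask_color = cv2.applyColorMap(mask * 60, cv2.COLORMAP_JET)
        overlay = cv2.addWeighted(rgb, 0.6, mask_color, 0.4, 0)
        cv2.imwrite("00002_overlay.png", overlay)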

0.2. Hand 2D keypoints

This step requires the segmentation script (step 0.1) to have been run beforehand.

        python inference_hand.py --seq 'test'

The 2D keypoints are saved in the CPMHand directory of the test sequence.

<p align="center"> <img width="320" height="240" src="teaser_images/00002_handKps.jpg"> </p>

1. Hand and Object Pose Initializations

1.1. Object pose initialization

The object pose in all frames of the sequence is initialized by tracking. To reduce the effort of manual initialization, the object pose in the first frame can be a simple upright position. Before tracking the object pose, the config file configObjPose.json in the configs folder of the test sequence should be updated.

The following script starts object tracking from the first frame of the sequence.

        cd ./optimization
        python objectTrackingSingleFrame.py --seq 'test' --doPyRender

Remove the --doPyRender flag to run the script faster; it is only used for visualization. The script creates a dirt_obj_pose folder in the test sequence folder with the optimization results for each frame and the visualization below.

<p align="center"> <img width="400" src="teaser_images/obj_pose_init.png"> </p> The above figure shows the input frame after object segmentation, the object rendered in the initialization pose, the depth map error, and the silhouette error.
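
For intuition, a "simple upright position" for the first frame can be thought of as an identity rotation with a rough translation, for example taken from the centroid of the segmented object in the depth map. The sketch below only illustrates that idea and is not the repo's initialization code.

        import numpy as np

        def upright_initial_pose(depth_m, obj_mask, K):
            """depth_m: metric depth map, obj_mask: boolean object mask, K: 3x3 intrinsics."""
            vs, us = np.nonzero(obj_mask)
            z = depth_m[vs, us]
            us, vs, z = us[z > 0], vs[z > 0], z[z > 0]

            # back-project the masked pixels and take their mean as a rough object center
            x = (us - K[0, 2]) * z / K[0, 0]
            y = (vs - K[1, 2]) * z / K[1, 1]

            pose = np.eye(4)                 # identity rotation = object kept upright
            pose[:3, 3] = [x.mean(), y.mean(), z.mean()]
            return pose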

1.2. Hand pose initialization

This script obtains the initial 3D grasp pose of the hand relative to the object coordinate frame using the hand 2D keypoints detected earlier (step 0.2). Refer to Eq. (12) in the paper for more details on this optimization.

        python handPoseMultiframeInit.py --seq 'test'

The optimization uses the chumpy package and is hence slow. The results are stored in the handInit folder of the test sequence.

<p align="center"> <img width="320" height="240" src="teaser_images/hand_init_kps.png"> <img width="320" height="240" src="teaser_images/hand_init_mesh.png"> </p> The 2D keypoints are lifted to 3D keypoints and the resulting mesh is shown in the above figure.
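
Conceptually, once 3D hand keypoints are available in the camera frame, expressing them relative to the object coordinate frame is a change of coordinates using the object pose from step 1.1. The actual script solves the optimization of Eq. (12); the snippet below (with illustrative names only) shows just the frame change.

        import numpy as np

        def to_object_frame(pts_cam, obj_pose):
            """pts_cam: (N, 3) points in the camera frame, obj_pose: 4x4 object-to-camera transform."""
            R, t = obj_pose[:3, :3], obj_pose[:3, 3]
            # p_obj = R^T (p_cam - t), written for row-vector points
            return (pts_cam - t) @ R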

2. Grasp Pose Estimation

A more accurate grasp pose of the hand is obtained using the initialization from step 1.2. Refer to Eq. (13) in the paper for more details. Modify the config file configHandPose.json in the configs folder of the test sequence as in step 1.1. Update the betaFileName field in the json file to use different hand shape parameters or to point to the correct beta files; beta parameters of the 10 different subjects used in the dataset generation can be downloaded from here.
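
A minimal sketch of switching the beta file with the json module follows; the config path and the beta file path are placeholders.

        import json

        cfg_path = "HOdatasets/test/configs/configHandPose.json"  # assumed sequence location
        with open(cfg_path, "r") as f:
            cfg = json.load(f)

        cfg["betaFileName"] = "path/to/downloaded/beta_file"      # placeholder path
        with open(cfg_path, "w") as f:
            json.dump(cfg, f, indent=4)

With the config updated, run: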

        python handPoseMultiframe.py --seq 'test' --numIter 200 --showFig --doPyRender

Remove the --showFig and --doPyRender flags to run faster without visualization. The results of the optimization and the visualization (if enabled) are dumped in the dirt_grasp_pose folder of the test sequence.

<p align="center"> <img width="720" src="teaser_images/hand_pose_opt.gif"> <img width="450" src="teaser_images/grasp_pose.gif"> </p> The first figure above shows the poses of the object and hand during optimization: the first row is the input image, the second row is the hand and object rendered with the poses at the current iteration, and the third and fourth rows show the depth and silhouette errors. The second figure above is the grasp pose of the hand after optimization.

3. Object Pose Estimation

A more accurate object pose is obtained by tracking the object poses as explained in Section 4.2 of the paper. The difference between this stage and the object pose initialization in step 1.1 is that the hand mesh, rendered with the estimated grasp pose, is also used in the optimization. Update the configHandObjPose.json file in the configs folder of the test sequence as in step 1.1.

        python handObjectTrackingSingleFrame.py --seq 'test' --showFig --doPyRender

The results are dumped in the dirt_hand_obj_pose folder of the test sequence.

4. Multi-frame Pose Refinement

This stage performs optimization over multiple frames and over all the hand-object pose variables. Refer to Eq. (1) in the paper. The optimization is done in batches.

        python handObjectRefinementMultiframe.py --seq 'test' --showFig --doPyRender --batchSize 20

The results are dumped in the dirt_hand_obj_refine folder of the test sequence.

<p align="center"> <img width="320" src="teaser_images/ho_refine.gif"> <img width="320" src="teaser_images/ho_refine_blend.gif"> </p>
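
To illustrate the batching idea (the actual scheduling inside the script may differ), frames are grouped into windows of batchSize consecutive frames and each window is refined jointly:

        def make_batches(frame_ids, batch_size=20):
            """Split the frame list into consecutive chunks of batch_size frames."""
            return [frame_ids[i:i + batch_size] for i in range(0, len(frame_ids), batch_size)]

        frames = ["%05d" % i for i in range(1, 101)]   # e.g. 00001 ... 00100
        for batch in make_batches(frames, batch_size=20):
            print(batch[0], "->", batch[-1])           # each chunk is optimized jointly (Eq. (1))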

Known issues

The segmentation network often under-segments the hand near the finger tips. This results in a small shift in the final annotated keypoints of the finger tips. To account for this, the segmentation maps are corrected after step 3 using the estimated keypoints and the depth map. The segmentation correction script will be updated soon.
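
One possible shape such a correction could take (a rough illustration only, not the authors' upcoming script) is to grow the hand mask around each projected fingertip keypoint, keeping pixels whose depth is close to the fingertip's depth:

        import numpy as np

        def grow_mask_at_fingertips(hand_mask, depth_m, fingertip_uv, radius=10, depth_tol=0.02):
            """hand_mask: boolean hand mask, depth_m: metric depth map,
            fingertip_uv: (5, 2) projected fingertip keypoints as (u, v) pixels."""
            corrected = hand_mask.copy()
            h, w = depth_m.shape
            for u, v in fingertip_uv.astype(int):
                z = depth_m[v, u]
                if z <= 0:
                    continue  # no valid depth at this keypoint
                v0, v1 = max(0, v - radius), min(h, v + radius)
                u0, u1 = max(0, u - radius), min(w, u + radius)
                patch = depth_m[v0:v1, u0:u1]
                corrected[v0:v1, u0:u1] |= np.abs(patch - z) < depth_tol
            return corrected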

Acknowledgements