HIU-DMTL

This is the official code for the paper "Hand Image Understanding via Deep Multi-Task Learning". The pre-trained model will be released soon. Thank you for your attention.

Analyzing and understanding hand information from multimedia material such as images or videos is important for many real-world applications and remains an active topic in the research community. Various works focus on recovering hand information from a single image. However, they usually solve a single task, for example hand mask segmentation, 2D/3D hand pose estimation, or hand mesh reconstruction, and do not perform well in challenging scenarios. To further improve performance on these tasks, we propose a novel Hand Image Understanding (HIU) framework that extracts comprehensive information about the hand from a single RGB image by jointly considering the relationships between these tasks. To achieve this goal, a cascaded multi-task learning (MTL) backbone is designed to estimate the 2D heat maps, to learn the segmentation mask, and to generate the intermediate 3D information encoding, followed by a coarse-to-fine learning paradigm and a self-supervised learning strategy.
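As a small illustration of how 2D heat maps like those mentioned above are commonly turned into joint locations, the sketch below takes the peak of each heat map. This is a generic argmax decoding written in plain Python, not the decoding used in this repository; the function name `decode_keypoints` and the nested-list heat map layout are assumptions made for the example.

```python
from typing import List, Tuple

# One heat map is a 2D grid of scores, indexed as heatmap[y][x] (assumed layout).
Heatmap = List[List[float]]


def decode_keypoints(heatmaps: List[Heatmap]) -> List[Tuple[int, int, float]]:
    """Return (x, y, score) for the highest-scoring pixel of each heat map."""
    keypoints = []
    for heatmap in heatmaps:
        best = (0, 0, float("-inf"))
        for y, row in enumerate(heatmap):
            for x, value in enumerate(row):
                # Keep the first pixel that strictly beats the current best.
                if value > best[2]:
                    best = (x, y, value)
        keypoints.append(best)
    return keypoints


if __name__ == "__main__":
    # A toy 2x2 heat map for a single joint; the peak is at x=0, y=1.
    toy = [[[0.0, 0.1], [0.9, 0.2]]]
    print(decode_keypoints(toy))
```

In practice the heat maps would be arrays produced by the network, one per joint, and a sub-pixel refinement is often applied on top of the argmax.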

Demo Video

We present three videos to illustrate the HIU-DMTL framework: an example speech, an example dance, and an in-the-wild video.

The New Dataset

You may download it from Baidu Netdisk (extraction code: utz8) or Google Drive. Note that hiu_dmtl_data.zip is encrypted; please email me to get the password.
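Once you have the password, the archive can be unpacked from Python. The snippet below is a minimal sketch using the standard-library `zipfile` module; the helper name `extract_dataset`, the output directory, and the placeholder password are assumptions for illustration. Note that `zipfile` can only decrypt the legacy ZIP encryption scheme; if the archive uses AES encryption, a tool such as 7-Zip is needed instead.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, password, out_dir="hiu_dmtl_data"):
    """Extract a (possibly password-protected) zip archive into out_dir.

    Returns the sorted list of member names found in the archive.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        # pwd must be bytes; it is ignored for members that are not encrypted.
        zf.extractall(path=out, pwd=password.encode("utf-8"))
        return sorted(zf.namelist())


if __name__ == "__main__":
    # "YOUR_PASSWORD" is a placeholder; use the password received by email.
    if Path("hiu_dmtl_data.zip").exists():
        print(extract_dataset("hiu_dmtl_data.zip", "YOUR_PASSWORD"))
```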

Citation

If you use this code/dataset for your research, please cite:

@inproceedings{zhang2021hand,
  title={Hand Image Understanding via Deep Multi-Task Learning},
  author={Zhang, Xiong and Huang, Hongsheng and Tan, Jianchao and Xu, Hongmin and Yang, Cheng and Peng, Guozhu and Wang, Lei and Liu, Ji},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages={11281--11292},
  year={2021}
}

@inproceedings{zhang2019end,
  title={End-to-end hand mesh recovery from a monocular rgb image},
  author={Zhang, Xiong and Li, Qiang and Mo, Hong and Zhang, Wenbo and Zheng, Wen},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages={2354--2364},
  year={2019}
}