EgoHOS

Project Page | Paper | Bibtex

<img src="https://github.com/owenzlz/EgoHOS/blob/main/demo/teaser.gif" style="width:800px;">

Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications
European Conference on Computer Vision (ECCV), 2022
Lingzhi Zhang*, Shenghao Zhou*, Simon Stent, Jianbo Shi (* indicates equal contribution)

Our main goal is to provide a tool for better hand-object segmentation on in-the-wild egocentric videos.

Prerequisites

Table of Contents:<br>

  1. Setup - download pretrained models and resources
  2. Datasets - download our egocentric hand-object segmentation datasets
  3. Checkpoints - download the checkpoints for all our models
  4. Inference on Images - quick usage on images
  5. Inference on Videos - quick usage on videos
  6. Other Resources - other resources used in our papers

Setup

git clone https://github.com/owenzlz/EgoHOS
cd EgoHOS
pip install -r requirements.txt
pip install -U openmim
mim install mmcv-full==1.6.0
cd mmsegmentation
pip install -v -e .

For more information, please refer to MMSegmentation: https://mmsegmentation.readthedocs.io/en/latest/
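
To quickly sanity-check the installation, the imports below should succeed (a minimal check, nothing EgoHOS-specific):

```python
# Verify that the core packages installed above import correctly.
import torch
import mmcv
import mmseg

print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("mmcv:", mmcv.__version__)    # expected: 1.6.0
print("mmseg:", mmseg.__version__)
```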

<a name="datasets"/>

Datasets

bash download_datasets.sh

After downloading, the dataset is structured as follows:

- [egohos dataset root]
    |- train
        |- image
        |- label
        |- contact
    |- val 
        |- image
        |- label
        |- contact
    |- test_indomain
        |- image
        |- label
        |- contact
    |- test_outdomain
        |- image
        |- label
        |- contact

In each label image, the category IDs are defined as follows (a small loading sketch follows the list). In the contact labels, pixels with value 1 mark the dense contact region.

0 -> background
1 -> left hand
2 -> right hand
3 -> 1st order interacting object by left hand
4 -> 1st order interacting object by right hand
5 -> 1st order interacting object by both hands
6 -> 2nd order interacting object by left hand
7 -> 2nd order interacting object by right hand
8 -> 2nd order interacting object by both hands
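
Below is a minimal sketch of reading these annotations, assuming the labels are stored as single-channel images whose pixel values are the category IDs above (file names are placeholders):

```python
import numpy as np
from PIL import Image

# Placeholder paths into the dataset layout described above.
label = np.array(Image.open("train/label/example.png"))      # values in 0..8
contact = np.array(Image.open("train/contact/example.png"))  # 1 = dense contact region

left_hand  = (label == 1)            # left-hand mask
right_hand = (label == 2)            # right-hand mask
obj1 = np.isin(label, [3, 4, 5])     # all 1st-order interacting objects
obj2 = np.isin(label, [6, 7, 8])     # all 2nd-order interacting objects

print("left-hand pixels:", left_hand.sum(), "contact pixels:", (contact == 1).sum())
```
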
<a name="checkpoints"/>

Checkpoints

bash download_checkpoints.sh
<a name="inference_on_images"/>

Inference on Images

bash download_testimages.sh

Depending on your application scenario, you may want to use one of the commands below to generate the segmentation predictions. Please modify the image directory paths in the bash files if needed. The segmentation models use a Swin-L backbone with a UPerNet head.

By default, the bash commands run on the images in "./testimages/images" and save the results in the "./testimages" folder. If you wish to test on your own images, either put them into the "./testimages/images" folder or change the directories in the bash files.

cd mmsegmentation # if you are not in this directory
bash pred_all_obj1.sh

<img src="https://github.com/owenzlz/EgoHOS/blob/main/demo/twohands_obj1_optimized.gif" style="width:850px;">

cd mmsegmentation # if you are not in this directory
bash pred_all_obj2.sh

<img src="https://github.com/owenzlz/EgoHOS/blob/main/demo/twohands_obj2_optimized.gif" style="width:850px;">

If you only want to predict hand or contact segmentation, or want to use each module separately, see the commands below.

cd mmsegmentation # if you are not in this directory
bash pred_twohands.sh

<img src="https://github.com/owenzlz/EgoHOS/blob/main/demo/twohands_optimized.gif" style="width:850px;">

cd mmsegmentation # if you are not in this directory
bash pred_cb.sh

<img src="https://github.com/owenzlz/EgoHOS/blob/main/demo/cb.gif" style="width:850px;">

cd mmsegmentation # if you are not in this directory
bash pred_obj1.sh

cd mmsegmentation
bash pred_obj2.sh
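
If you prefer calling a model from Python instead of the bash scripts, below is a minimal sketch using MMSegmentation's inference API. The config/checkpoint paths are placeholders (point them at the files fetched by download_checkpoints.sh), and note that the bash scripts above chain several models for the object predictions, so this only illustrates a single stage such as two-hand segmentation:

```python
import numpy as np
from PIL import Image
from mmseg.apis import init_segmentor, inference_segmentor

# Placeholder paths; substitute the config/checkpoint fetched by download_checkpoints.sh.
config = "path/to/twohands_config.py"
checkpoint = "path/to/twohands_checkpoint.pth"

model = init_segmentor(config, checkpoint, device="cuda:0")

# inference_segmentor returns a list with one HxW label map per input image.
result = inference_segmentor(model, "../testimages/images/example.jpg")
seg = result[0].astype(np.uint8)

Image.fromarray(seg).save("example_pred.png")  # pixel values follow the category IDs above
```
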
<a name="inference_on_videos"/>

Inference on Videos

bash download_testvideos.sh

cd mmsegmentation # if you are not in this directory
bash pred_obj1_video.sh

cd mmsegmentation # if you are not in this directory
bash pred_obj2_video.sh
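
Conceptually, the video scripts run the image models frame by frame. A rough sketch of that idea with OpenCV is shown below (model loading mirrors the image sketch above; all paths are placeholders):

```python
import cv2
from mmseg.apis import init_segmentor, inference_segmentor

# Placeholder paths; substitute your own config, checkpoint, and video.
model = init_segmentor("path/to/config.py", "path/to/checkpoint.pth", device="cuda:0")

cap = cv2.VideoCapture("path/to/video.mp4")
frame_idx = 0
while True:
    ok, frame = cap.read()                      # frame is a BGR numpy array
    if not ok:
        break
    seg = inference_segmentor(model, frame)[0]  # HxW label map for this frame
    cv2.imwrite(f"pred_{frame_idx:05d}.png", seg.astype("uint8"))
    frame_idx += 1
cap.release()
```
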
<a name="other_github"/>

Other Resources

We used several other resources for the applications in our paper, e.g. mesh reconstruction. Please refer to the links below:

  1. Image Inpainting - LaMa: https://github.com/saic-mdal/lama
  2. Video Inpainting - Flow-edge Guided Video Completion: https://github.com/vt-vl-lab/FGVC
  3. Mesh Reconstruction of Hand-Object Interaction: https://github.com/hassony2/homan
  4. Video Recognition - SlowFast Network: https://github.com/epic-kitchens/epic-kitchens-slowfast

If you wish to generate higher-quality masks, you may consider using a mask refinement model such as CascadePSP: https://github.com/hkchengrex/CascadePSP. A rough usage sketch is shown below.
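
As a rough sketch (assuming CascadePSP's segmentation-refinement pip package, installed with `pip install segmentation-refinement`; file names are placeholders):

```python
import cv2
import segmentation_refinement as refine

# Placeholder inputs: an image and a rough binary mask (0/255) of the same size,
# e.g. one of the per-class masks produced by the prediction scripts above.
image = cv2.imread("example.jpg")
mask = cv2.imread("example_mask.png", cv2.IMREAD_GRAYSCALE)

refiner = refine.Refiner(device="cuda:0")   # loads the pretrained refinement model
refined = refiner.refine(image, mask, fast=False, L=900)

cv2.imwrite("example_mask_refined.png", refined)
```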

Citation

If you use this code for your research, please cite our paper:

@inproceedings{zhang2022fine,
  title={Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications},
  author={Zhang, Lingzhi and Zhou, Shenghao and Stent, Simon and Shi, Jianbo},
  booktitle={European Conference on Computer Vision},
  pages={127--145},
  year={2022},
  organization={Springer}
}