HOIGen

Official code of the ACM MM 2024 paper "Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection" [paper].

Dataset

Follow the dataset preparation process of UPT.

The downloaded files should be placed as follows; otherwise, please replace the default paths with your custom locations.

|- HOIGen
|   |- hicodet
|   |   |- hico_20160224_det
|   |       |- annotations
|   |       |- images
:   :      
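
The snippet below is a minimal layout check, assuming the default paths shown above (adjust if you use custom locations):

```python
from pathlib import Path

# Expected HICO-DET layout relative to the HOIGen repository root.
root = Path("hicodet/hico_20160224_det")
for sub in ("annotations", "images"):
    path = root / sub
    print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")
```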

Dependencies

  1. Follow the environment setup in UPT.

  2. Our code is built upon CLIP. Install the local CLIP package:

cd CLIP && python setup.py develop && cd ..

  3. Download the CLIP weights to checkpoints/pretrained_clip.
|- HOIGen
|   |- checkpoints
|   |   |- pretrained_clip
|   |       |- ViT-B-16.pt
:   :      

  4. Download the weights of DETR (see the table below) and put them in checkpoints/. A quick sanity check for the CLIP and DETR checkpoints is sketched after this list.
| Dataset  | DETR weights |
|----------|--------------|
| HICO-DET | weights      |
|- HOIGen
|   |- checkpoints
|   |   |- detr-r50-hicodet.pth
:   :   :
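
As a quick sanity check that the local CLIP install and the downloaded checkpoints are in place, a short script along these lines should run without errors (it only loads the files and does not validate their contents; file names follow the trees above):

```python
import clip
import torch

# Load ViT-B/16 through the locally installed CLIP package; download_root points
# at the folder that already contains ViT-B-16.pt, so nothing is re-downloaded.
model, _ = clip.load("ViT-B/16", device="cpu",
                     download_root="checkpoints/pretrained_clip")
print("CLIP ViT-B/16 parameters:", sum(p.numel() for p in model.parameters()))

# The DETR checkpoint should at least deserialize.
detr_ckpt = torch.load("checkpoints/detr-r50-hicodet.pth", map_location="cpu")
print("DETR checkpoint loaded, top-level type:", type(detr_ckpt))
```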

Pre-extracted Features

Download the pre-extracted features from HERE. The downloaded files have to be placed as follows.

|- HOIGen
|   |- hicodet_pkl_files
|   |   |- union_embeddings_cachemodel_crop_padding_zeros_vitb16.p
:   :      
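
The .p file is a Python pickle; a quick inspection like the one below confirms it loads (the exact structure of its contents is not assumed here):

```python
import pickle

path = "hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p"
with open(path, "rb") as f:
    features = pickle.load(f)

print("top-level type:", type(features))
if isinstance(features, dict):
    # Peek at a few keys without assuming their meaning.
    print("example keys:", list(features.keys())[:5])
```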

Training and Testing

Feature Generation

If you want to train the feature generator yourself, process the images and run the following commands; otherwise, download the weights we provide and put them in checkpoints/.

python main_coop_vae.py --data hoi_data/human_data/object_data
python finetune_ship.py --data hoi_data/human_data/object_data
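
If you want to generate all three kinds of features in one pass, a small driver over the --data choices can help. This is only a convenience sketch, assuming hoi_data/human_data/object_data above denotes three alternative values for --data:

```python
import subprocess

# Run both scripts for each feature type in turn.
for data in ("hoi_data", "human_data", "object_data"):
    subprocess.run(["python", "main_coop_vae.py", "--data", data], check=True)
    subprocess.run(["python", "finetune_ship.py", "--data", data], check=True)
```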

HICO-DET

Fully-supervised (the first command trains, the second evaluates a trained checkpoint):

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt 
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --eval --resume CKPT_PATH

UC:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type uc0/uc1/uc2/uc3/uc4
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type uc0/uc1/uc2/uc3/uc4 --eval --resume CKPT_PATH

RF-UC:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type rare_first
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type rare_first --eval --resume CKPT_PATH

NF-UC:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type non_rare_first
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type non_rare_first --eval --resume CKPT_PATH

UV:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type unseen_verb
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type unseen_verb --eval --resume CKPT_PATH

UO:

python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type unseen_object
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type unseen_object --eval --resume CKPT_PATH
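
To evaluate checkpoints for several zero-shot settings in one go, a loop over the --zs_type values used above is a convenient wrapper. This is a sketch only; the CKPT_PATH placeholders must be replaced with your own checkpoint paths, and UC splits (uc0–uc4) can be added in the same way:

```python
import subprocess

# Map each zero-shot setting to its checkpoint; the paths are placeholders.
settings = {
    "rare_first": "CKPT_PATH",
    "non_rare_first": "CKPT_PATH",
    "unseen_verb": "CKPT_PATH",
    "unseen_object": "CKPT_PATH",
}

# Arguments shared by every evaluation run, taken verbatim from the commands above.
common = [
    "python", "main_tip_finetune.py", "--world-size", "1",
    "--pretrained", "checkpoints/detr-r50-hicodet.pth",
    "--output-dir", "checkpoints/hico",
    "--use_insadapter", "--num_classes", "117", "--use_multi_hot",
    "--file1", "hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p",
    "--clip_dir_vit", "checkpoints/pretrained_clip/ViT-B-16.pt",
]

for zs_type, ckpt in settings.items():
    subprocess.run(common + ["--zs", "--zs_type", zs_type,
                             "--eval", "--resume", ckpt], check=True)
```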

Model Zoo

| Setting | Full  | Seen  | Unseen | Weights |
|---------|-------|-------|--------|---------|
| UC      | 33.44 | 34.23 | 30.26  | weights |
| RF-UC   | 33.86 | 34.57 | 31.01  | weights |
| NF-UC   | 33.08 | 32.86 | 33.98  | weights |
| UO      | 33.48 | 32.90 | 36.35  | weights |
| UV      | 32.34 | 34.31 | 20.27  | weights |

Citation

If you find our paper and/or code helpful, please consider citing:

@inproceedings{
guo2024unseen,
title={Unseen No More: Unlocking the Potential of {CLIP} for Generative Zero-shot {HOI} Detection},
author={Yixin Guo and Yu Liu and Jianghao Li and Weimin Wang and Qi Jia},
booktitle={ACM Multimedia 2024},
year={2024},
url={https://openreview.net/forum?id=mAQ2fK2myX}
}

Acknowledgement

We gratefully thank the authors of UPT, ADA-CM, SHIP, and CaFo for open-sourcing their code.

Tips

In order to open-source the code as soon as possible, we released it with some redundancy and possibly a few bugs, which will be cleaned up and fixed in subsequent releases.
