HOIGen
Official code for the ACM MM 2024 paper "Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection".
Dataset
Follow the dataset preparation steps of UPT.
The downloaded files should be placed as follows. Otherwise, replace the default paths with your custom locations.
|- HOIGen
|   |- hicodet
|   |   |- hico_20160224_det
|   |       |- annotations
|   |       |- images
:   :
Dependencies
- Follow the environment setup in UPT.
- Our code is built upon CLIP. Install the local package of CLIP:
cd CLIP && python setup.py develop && cd ..
- Download the CLIP weights to checkpoints/pretrained_clip.
|- HOIGen
|   |- checkpoints
|   |   |- pretrained_clip
|   |       |- ViT-B-16.pt
:   :
- Download the weights of DETR and put them in checkpoints/.
Dataset | DETR weights |
---|---|
HICO-DET | weights |
|- HOIGen
| |- checkpoints
| | |- detr-r50-hicodet.pth
: : :
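Before training, it can help to confirm that the dataset and checkpoint files described so far are where the default paths expect them. The following is a minimal sketch, not part of the official code: the helper name `find_missing` is hypothetical, the script is assumed to run from the HOIGen root, and the nesting of `annotations` and `images` under `hico_20160224_det` is an assumption about the layout.

```python
from pathlib import Path

# Paths taken from the layout described in this README, relative to the
# HOIGen root. The nesting of annotations/ and images/ under
# hico_20160224_det/ is an assumption; adjust it if your copy differs.
EXPECTED_PATHS = [
    "hicodet/hico_20160224_det/annotations",
    "hicodet/hico_20160224_det/images",
    "checkpoints/pretrained_clip/ViT-B-16.pt",
    "checkpoints/detr-r50-hicodet.pth",
]


def find_missing(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing files or directories:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files and directories are present.")
```

If you store the data elsewhere, pass your own list of paths as the `expected` argument instead of editing the default.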
Pre-extracted Features
Download the pre-extracted features from HERE. The downloaded files have to be placed as follows.
|- HOIGen
| |- hicodet_pkl_files
| | |- union_embeddings_cachemodel_crop_padding_zeros_vitb16.p
: :
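The internal structure of the feature file is not described here, so it can be useful to inspect it once after downloading. The sketch below assumes the `.p` file is a standard Python pickle (the extension suggests this, but it is an assumption); `describe_pickle` is a hypothetical helper, not part of the repository. If the file stores tensors, the corresponding library (for example PyTorch) must be installed for unpickling to succeed. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle


def describe_pickle(path, max_keys=5):
    """Load a pickle file and return a short summary of its top-level object."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        # Show only the first few keys; the full cache may be large.
        summary["sample_keys"] = list(obj)[:max_keys]
    return summary


if __name__ == "__main__":
    path = "hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p"
    if os.path.exists(path):
        print(describe_pickle(path))
    else:
        print(f"{path} not found")
```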
Training and Testing
Feature Generation
To train the feature generator yourself, process the images and run the following commands; otherwise, download the weights we provide and put them in checkpoints/.
python main_coop_vae.py --data hoi_data/human_data/object_data
python finetune_ship.py --data hoi_data/human_data/object_data
HICO-DET
Fully-supervised (the first command trains; the second evaluates the checkpoint at CKPT_PATH):
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --eval --resume CKPT_PATH
UC:
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type uc0/uc1/uc2/uc3/uc4 --eval --resume CKPT_PATH
RF-UC:
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type rare_first --eval --resume CKPT_PATH
NF-UC:
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type non_rare_first --eval --resume CKPT_PATH
UV:
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type unseen_verb --eval --resume CKPT_PATH
UO:
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --zs --zs_type unseen_object --eval --resume CKPT_PATH
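The zero-shot commands above differ only in the value passed to `--zs_type`, so the shared arguments can be assembled programmatically instead of being copied by hand. The sketch below is a hypothetical convenience helper (`build_command` is not part of the repository). It uses only flags that appear in the commands above and prints each command rather than running it. The UC variants are left out of the loop because reading `uc0/uc1/uc2/uc3/uc4` as a list of alternative values is an assumption.

```python
import shlex

# Arguments shared by every main_tip_finetune.py command in this README.
BASE_ARGS = [
    "python", "main_tip_finetune.py",
    "--world-size", "1",
    "--pretrained", "checkpoints/detr-r50-hicodet.pth",
    "--output-dir", "checkpoints/hico",
    "--use_insadapter",
    "--num_classes", "117",
    "--use_multi_hot",
    "--file1", "hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p",
    "--clip_dir_vit", "checkpoints/pretrained_clip/ViT-B-16.pt",
]


def build_command(zs_type=None, resume=None):
    """Build the argument list for one run.

    zs_type: value for --zs_type (e.g. "rare_first"); None omits the
             zero-shot flags, as in the fully-supervised commands.
    resume:  checkpoint path; when given, --eval --resume are appended.
    """
    args = list(BASE_ARGS)
    if zs_type is not None:
        args += ["--zs", "--zs_type", zs_type]
    if resume is not None:
        args += ["--eval", "--resume", resume]
    return args


if __name__ == "__main__":
    for zs_type in ["rare_first", "non_rare_first", "unseen_verb", "unseen_object"]:
        print(shlex.join(build_command(zs_type, resume="CKPT_PATH")))
```

To execute a command instead of printing it, the returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.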
Model Zoo
Setting | Full (mAP) | Seen (mAP) | Unseen (mAP) | Weights |
---|---|---|---|---|
UC | 33.44 | 34.23 | 30.26 | weights |
RF-UC | 33.86 | 34.57 | 31.01 | weights |
NF-UC | 33.08 | 32.86 | 33.98 | weights |
UO | 33.48 | 32.90 | 36.35 | weights |
UV | 32.34 | 34.31 | 20.27 | weights |
Citation
If you find our paper and/or code helpful, please consider citing:
@inproceedings{
guo2024unseen,
title={Unseen No More: Unlocking the Potential of {CLIP} for Generative Zero-shot {HOI} Detection},
author={Yixin Guo and Yu Liu and Jianghao Li and Weimin Wang and Qi Jia},
booktitle={ACM Multimedia 2024},
year={2024},
url={https://openreview.net/forum?id=mAQ2fK2myX}
}
Acknowledgement
We gratefully thank the authors of UPT, ADA-CM, SHIP and CaFo for open-sourcing their code.
Tips
Because the code was open-sourced as quickly as possible, it contains a lot of redundancy and likely some bugs, which I will fix in subsequent releases.