HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
Code for our CVPR 2023 paper "HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models".
Contributed by Shan Ning*, Longtian Qiu*, Yongfei Liu, Xuming He.
Installation
Install the dependencies.
pip install -r requirements.txt
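Optionally, install into an isolated environment first; this is a common-practice sketch, not a requirement stated by the repository:
# Create and activate a virtual environment, then install the dependencies.
python -m venv hoiclip-env
source hoiclip-env/bin/activate
pip install -r requirements.txt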
Data preparation
HICO-DET
The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.
Instead of using the original annotation files, we use the annotation files provided by the PPDM authors, which can be downloaded from here. The downloaded annotation files have to be placed as follows.
For the fractional data setting, we provide the annotations here. After decompressing, place the files under data/hico_20160224_det/annotations.
data
└─ hico_20160224_det
|─ annotations
| |─ trainval_hico.json
| |─ test_hico.json
| |─ corre_hico.json
| |─ trainval_hico_5%.json
| |─ trainval_hico_15%.json
| |─ trainval_hico_25%.json
| └─ trainval_hico_50%.json
:
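A minimal shell sketch of the layout above; the locations of the downloaded tarball and of the PPDM/fractional annotation files are placeholders for wherever you saved them:
# Unpack the HICO-DET images into the data directory.
mkdir -p data
tar -xzf hico_20160224_det.tar.gz -C data
# Copy the downloaded annotation files (PPDM and fractional) into place.
mkdir -p data/hico_20160224_det/annotations
cp /path/to/downloaded/annotations/*.json data/hico_20160224_det/annotations/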
V-COCO
First clone the V-COCO repository from here, and then follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and make directories as follows.
HOICLIP
|─ data
│ └─ v-coco
| |─ data
| | |─ instances_vcoco_all_2014.json
| | :
| |─ prior.pickle
| |─ images
| | |─ train2014
| | | |─ COCO_train2014_000000000009.jpg
| | | :
| | └─ val2014
| | |─ COCO_val2014_000000000042.jpg
| | :
| |─ annotations
: :
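A minimal sketch of the cloning step above; the URL assumes the official s-gupta/v-coco repository linked in the text:
# Clone V-COCO into the directory layout expected above.
git clone https://github.com/s-gupta/v-coco.git data/v-coco
# Then follow the V-COCO instructions to generate data/v-coco/data/instances_vcoco_all_2014.json.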
For our implementation, the annotation files have to be converted to the HOIA format. The conversion can be conducted as follows.
PYTHONPATH=data/v-coco \
python convert_vcoco_annotations.py \
--load_path data/v-coco/data \
--prior_path data/v-coco/prior.pickle \
--save_path data/v-coco/annotations
Note that only Python 2 can be used for this conversion, because vsrl_utils.py in the v-coco repository throws an error with Python 3.
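If python resolves to Python 3 on your system, a hedged workaround is to call a Python 2 interpreter explicitly (the python2 binary name depends on your environment):
# Same conversion as above, forcing a Python 2 interpreter.
PYTHONPATH=data/v-coco \
python2 convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations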
V-COCO annotations in the HOIA format (corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json) will be generated in the annotations directory.
Pre-trained model
Download the pretrained model of the DETR detector for ResNet50 and put it into the params directory. Then convert the parameters with the commands below.
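A minimal download sketch; the URL is the standard ResNet-50 checkpoint from the official DETR release and is our assumption, so prefer the link above if they differ:
# Fetch the DETR ResNet-50 checkpoint into the params directory (assumed URL).
mkdir -p params
wget -P params https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth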
python ./tools/convert_parameters.py \
--load_path params/detr-r50-e632da11.pth \
--save_path params/detr-r50-pre-2branch-hico.pth \
--num_queries 64
python ./tools/convert_parameters.py \
--load_path params/detr-r50-e632da11.pth \
--save_path params/detr-r50-pre-2branch-vcoco.pth \
--dataset vcoco \
--num_queries 64
Training
After the preparation, you can start training with the following commands.
HICO-DET
# default setting
sh ./scripts/train_hico.sh
V-COCO
sh ./scripts/train_vcoco.sh
Zero-shot
# rare first unseen combination setting
sh ./scripts/train_hico_rf_uc.sh
# non rare first unseen combination setting
sh ./scripts/train_hico_nrf_uc.sh
# unseen object setting
sh ./scripts/train_hico_uo.sh
# unseen verb setting
sh ./scripts/train_hico_uv.sh
Fractional data
# 50% fractional data
sh ./scripts/train_hico_frac.sh
Generate verb representation for Visual Semantic Arithmetic
sh ./scripts/generate_verb.sh
We provide the generated verb representations in ./tmp/verb.pth for HICO-DET and ./tmp/vcoco_verb.pth for V-COCO.
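A quick, hedged sanity check that the representation file loads; the internal structure of the saved object is an assumption we do not rely on here:
# Load the verb representation on CPU and report its type (and shape, if it is a tensor).
python -c "import torch; v = torch.load('./tmp/verb.pth', map_location='cpu'); print(type(v), getattr(v, 'shape', None))"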
Evaluation
HICO-DET
You can conduct the evaluation with trained parameters for HICO-DET as follows.
python -m torch.distributed.launch \
--nproc_per_node=2 \
--use_env \
main.py \
--pretrained [path to your checkpoint] \
--dataset_file hico \
--hoi_path data/hico_20160224_det \
--num_obj_classes 80 \
--num_verb_classes 117 \
--backbone resnet50 \
--num_queries 64 \
--dec_layers 3 \
--eval \
--zero_shot_type default \
--with_clip_label \
--with_obj_clip_label \
--use_nms_filter
For the official evaluation (reported in the paper), you need to convert the prediction file to the official prediction format following this file, and then follow the PPDM evaluation steps.
Zero-shot
python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env \
main.py \
--pretrained [path to your checkpoint] \
--dataset_file hico \
--hoi_path data/hico_20160224_det \
--num_obj_classes 80 \
--num_verb_classes 117 \
--backbone resnet50 \
--num_queries 64 \
--dec_layers 3 \
--eval \
--with_clip_label \
--with_obj_clip_label \
--use_nms_filter \
--zero_shot_type rare_first \
--del_unseen
Training Free Enhancement
The Training Free Enhancement is applied when args.training_free_enhancement_path is not empty. The results are placed in args.output_dir/args.training_free_enhancement_path. You may refer to the code in engine.py:202.
By default, we set the topk to [10, 20, 30, 40, 50].
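A hedged sketch of enabling it at evaluation time, assuming the command-line flags mirror the attribute names above and using placeholder directory names (verify the exact flag names against main.py's argument parser):
# Append these flags to the HICO-DET evaluation command shown earlier.
    ... \
    --output_dir logs/hico_eval \
    --training_free_enhancement_path training_free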
Visualization
The visualization script is scripts/visualization_hico.sh. You may need to adjust the file paths marked with TODO comments in visualization_hoiclip/gen_vlkt.py; currently the code visualizes failure cases in some zero-shot settings. For details, refer to the comments.
Regular HOI Detection Results
HICO-DET
| | Full (D) | Rare (D) | Non-rare (D) | Full (KO) | Rare (KO) | Non-rare (KO) | Download | Config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HOICLIP | 34.69 | 31.12 | 35.74 | 37.61 | 34.47 | 38.54 | model | config |
D: Default, KO: Known Object. The best result is achieved with training-free enhancement (topk=10).
HICO-DET Fractional Setting
| | Fractional | Full | Rare | Non-rare | Config |
| --- | --- | --- | --- | --- | --- |
| HOICLIP | 5% | 22.64 | 21.94 | 24.28 | config |
| HOICLIP | 15% | 27.07 | 24.59 | 29.38 | config |
| HOICLIP | 25% | 28.44 | 25.47 | 30.52 | config |
| HOICLIP | 50% | 30.88 | 26.05 | 32.97 | config |
You may need to change --frac [portion]% in the scripts.
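For example, a hedged sketch of switching the fraction, assuming the script currently passes --frac 50% (check scripts/train_hico_frac.sh before editing):
# Change the fractional-data training script from 50% to 25% of the training data, then run it.
sed -i 's/--frac 50%/--frac 25%/' scripts/train_hico_frac.sh
sh ./scripts/train_hico_frac.sh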
V-COCO
| | Scenario 1 | Scenario 2 | Download | Config |
| --- | --- | --- | --- | --- |
| HOICLIP | 63.50 | 64.81 | model | config |
Zero-shot HOI Detection Results
| | Type | Unseen | Seen | Full | Download | Config |
| --- | --- | --- | --- | --- | --- | --- |
| HOICLIP | RF-UC | 25.53 | 34.85 | 32.99 | model | config |
| HOICLIP | NF-UC | 26.39 | 28.10 | 27.75 | model | config |
| HOICLIP | UO | 16.20 | 30.99 | 28.53 | model | config |
| HOICLIP | UV | 24.30 | 32.19 | 31.09 | model | config |
We also provide the checkpoints for the uc0, uc1, uc2, and uc3 settings on Google Drive.
Citation
Please consider citing our paper if it helps your research.
@inproceedings{ning2023hoiclip,
title={HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models},
author={Ning, Shan and Qiu, Longtian and Liu, Yongfei and He, Xuming},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={23507--23517},
year={2023}
}
Acknowledgements
Our code is built upon GEN-VLKT, PPDM, DETR, QPIC and CDN. We thank them for their contributions.
Release Schedule
- Update raw codes(2023/4/14)
- Update readme(2023/7/26)
- Data(2023/7/26)
- Scripts(2023/7/26)
- Performance table(2023/7/26)
- Others(2023/7/26)
- Release trained checkpoints(2023/7/26)
- Default settings(2023/7/26)
- Zero-shot settings(2023/7/26)
- Fractional settings(2023/7/26)
- Clean up codes(2023/7/26)