[ICCV 2023] Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory
Dataset
Follow the dataset preparation process of UPT.
The downloaded files should be placed as follows. Otherwise, replace the default paths with your custom locations.
|- ADA-CM
| |- hicodet
| | |- hico_20160224_det
| |   |- annotations
| |   |- images
| |- vcoco
| | |- mscoco2014
| |   |- train2014
| |   |- val2014
: :
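Before launching a run, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the helper name `missing_dataset_dirs` is hypothetical, and the nesting (`annotations` and `images` inside `hico_20160224_det`, `train2014` and `val2014` inside `mscoco2014`) is an assumption based on the tree above.

```python
from pathlib import Path

# Expected dataset directories, relative to the repository root.
# The nesting is an assumption based on the tree in this README.
EXPECTED_DIRS = [
    "hicodet/hico_20160224_det/annotations",
    "hicodet/hico_20160224_det/images",
    "vcoco/mscoco2014/train2014",
    "vcoco/mscoco2014/val2014",
]


def missing_dataset_dirs(root="."):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the ADA-CM root; prints nothing when everything is present.
    for d in missing_dataset_dirs("."):
        print(f"missing: {d}")
```

If you keep the datasets elsewhere, pass that location as `root` or edit `EXPECTED_DIRS` to match your custom paths.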
Dependencies
- Follow the environment setup in UPT.
- Our code is built upon CLIP. Install the local package of CLIP:
cd CLIP && python setup.py develop && cd ..
- Download the CLIP weights to checkpoints/pretrained_clip.
|- ADA-CM
| |- checkpoints
| | |- pretrained_clip
| |   |- ViT-B-16.pt
| |   |- ViT-L-14-336px.pt
: :
- Download the DETR weights and put them in checkpoints/.
Dataset | DETR weights |
---|---|
HICO-DET | weights |
V-COCO | weights |
|- ADA-CM
| |- checkpoints
| | |- detr-r50-hicodet.pth
| | |- detr-r50-vcoco.pth
: : :
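An interrupted download can leave a checkpoint that exists but is truncated. As a rough sanity check, the sketch below reports the size of each checkpoint file named in this section. The helper name `checkpoint_sizes` is hypothetical, and the script only inspects the file system; it does not validate the weights themselves.

```python
from pathlib import Path

# Checkpoint files named in this README, relative to the repository root.
EXPECTED_CHECKPOINTS = [
    "checkpoints/pretrained_clip/ViT-B-16.pt",
    "checkpoints/pretrained_clip/ViT-L-14-336px.pt",
    "checkpoints/detr-r50-hicodet.pth",
    "checkpoints/detr-r50-vcoco.pth",
]


def checkpoint_sizes(root="."):
    """Map each expected checkpoint to its size in bytes, or None if absent."""
    root = Path(root)
    sizes = {}
    for rel in EXPECTED_CHECKPOINTS:
        path = root / rel
        sizes[rel] = path.stat().st_size if path.is_file() else None
    return sizes


if __name__ == "__main__":
    for rel, size in checkpoint_sizes(".").items():
        status = "missing" if size is None else f"{size / 2**20:.1f} MiB"
        print(f"{rel}: {status}")
```

A file that is present but only a few kilobytes in size is usually a sign that the download should be repeated.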
Pre-extracted Features
Download the pre-extracted features from HERE and the pre-extracted bboxes from HERE. The downloaded files have to be placed as follows.
|- ADA-CM
| |- hicodet_pkl_files
| | |- union_embeddings_cachemodel_crop_padding_zeros_vitb16.p
| | |- hicodet_union_embeddings_cachemodel_crop_padding_zeros_vit336.p
| | |- hicodet_train_bbox_R50.p
| | |- hicodet_test_bbox_R50.p
| |- vcoco_pkl_files
| | |- vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit16.p
| | |- vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit336.p
| | |- vcoco_train_bbox_R50.p
| | |- vcoco_test_bbox_R50.p
: :
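The `.p` extension suggests that these files are Python pickles; that is an assumption, and nothing here relies on their internal structure. The sketch below loads one file and summarizes its top-level object, which can be a quick way to confirm that a download is readable. `describe_pickle` is a hypothetical helper, not part of this repository. If the files contain PyTorch tensors, `torch` must be importable for loading to succeed. Only unpickle files from a source you trust, since unpickling can execute arbitrary code.

```python
import pickle


def describe_pickle(path, max_keys=5):
    """Load a pickle file and return a short summary of its top-level object."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["len"] = len(obj)
    if isinstance(obj, dict):
        # Show only the first few keys; the real files may be large.
        summary["sample_keys"] = list(obj)[:max_keys]
    return summary


if __name__ == "__main__":
    print(describe_pickle("hicodet_pkl_files/hicodet_test_bbox_R50.p"))
```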
Training-Free Mode
HICO-DET
python main_tip_ye.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/test --eval --post_process --use_multi_hot --logits_type HO+U+T --num_shot 8 --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt
V-COCO
Cache detection results for evaluation on V-COCO:
python main_tip_ye.py --world-size 1 --dataset vcoco --data-root vcoco/ --partitions trainval test --pretrained checkpoints/detr-r50-vcoco.pth --output-dir matlab/TF_vcoco/ --num-workers 4 --cache --post_process --dic_key verb --use_multi_hot --num_shot 8 --logits_type HO+U+T --file1 vcoco_pkl_files/vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit16.p
For V-COCO, we did not implement evaluation utilities and instead use the utilities provided by Gupta et al. Refer to these instructions for more details.
Fine-Tuning Mode
HICO-DET
Train on HICO-DET:
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt
Test on HICO-DET:
python main_tip_finetune.py --world-size 1 --pretrained checkpoints/detr-r50-hicodet.pth --output-dir checkpoints/hico --use_insadapter --num_classes 117 --use_multi_hot --file1 hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --eval --resume CKPT_PATH
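The HICO-DET training and test commands differ only in the trailing `--eval --resume CKPT_PATH`. A minimal sketch of assembling both from one shared argument list, so the two stay in sync when you change a path; the helper name `hicodet_command` is hypothetical, and every flag is copied from the commands above.

```python
import shlex

# Flags shared by the HICO-DET fine-tuning commands in this README.
HICODET_BASE = [
    "python", "main_tip_finetune.py",
    "--world-size", "1",
    "--pretrained", "checkpoints/detr-r50-hicodet.pth",
    "--output-dir", "checkpoints/hico",
    "--use_insadapter",
    "--num_classes", "117",
    "--use_multi_hot",
    "--file1",
    "hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p",
    "--clip_dir_vit", "checkpoints/pretrained_clip/ViT-B-16.pt",
]


def hicodet_command(resume=None):
    """Return the training command, or the test command if resume is given."""
    cmd = list(HICODET_BASE)
    if resume is not None:
        # Testing adds evaluation mode and the checkpoint to load.
        cmd += ["--eval", "--resume", resume]
    return cmd


if __name__ == "__main__":
    print(shlex.join(hicodet_command()))
    print(shlex.join(hicodet_command(resume="CKPT_PATH")))
```

The returned list can be passed directly to `subprocess.run` from the repository root, with `CKPT_PATH` replaced by your own checkpoint.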
V-COCO
Train on V-COCO:
python main_tip_finetune.py --world-size 1 --dataset vcoco --data-root vcoco/ --partitions trainval test --pretrained checkpoints/detr-r50-vcoco.pth --output-dir checkpoints/vcoco-injector-r50 --use_insadapter --num_classes 24 --use_multi_hot --file1 vcoco_pkl_files/vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt
Cache detection results for evaluation on V-COCO:
python main_tip_finetune.py --world-size 1 --dataset vcoco --data-root vcoco/ --partitions trainval test --pretrained checkpoints/detr-r50-vcoco.pth --output-dir checkpoints/vcoco-injector-r50 --use_insadapter --num_classes 24 --use_multi_hot --file1 vcoco_pkl_files/vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit16.p --clip_dir_vit checkpoints/pretrained_clip/ViT-B-16.pt --cache --resume CKPT_PATH
Model Zoo
Dataset | Backbone | mAP | Rare | Non-rare | Weights |
---|---|---|---|---|---|
HICO-DET | ResNet-50+ViT-B | 33.80 | 31.72 | 34.42 | weights |
HICO-DET | ResNet-50+ViT-L | 38.40 | 37.52 | 38.66 | weights |
Dataset | Backbone | Scenario 1 | Scenario 2 | Weights |
---|---|---|---|---|
V-COCO | ResNet-50+ViT-B | 56.12 | 61.45 | weights |
V-COCO | ResNet-50+ViT-L | 58.57 | 63.97 | weights |
Citation
If you find our paper and/or code helpful, please consider citing:
@inproceedings{ting2023hoi,
title={Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory},
author={Ting Lei and Fabian Caba and Qingchao Chen and Hailin Jin and Yuxin Peng and Yang Liu},
year={2023},
booktitle={ICCV},
organization={IEEE},
}
Acknowledgement
We thank the authors of UPT for open-sourcing their code.