Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations
Overview
This repository contains the implementation of Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations.
In this work, we develop a compositional model to recognize unseen human-object interactions based on spatial relations between humans and objects.
Prerequisites
To install all the dependency packages, please run:
pip install -r requirements.txt
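After installing, a quick optional sanity check (this assumes PyTorch is among the listed dependencies, since the experiment scripts below select a GPU via `--idx_GPU`):

```python
# Optional sanity check: confirm that PyTorch imports and a GPU is visible.
# Assumes PyTorch is listed in requirements.txt (the training scripts rely on it).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```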
Data Preparation
- Please download and extract the data into the `./data` folder. Each subfolder within `./data` includes details about the download links and what the files are used for.
- Please run the feature extraction scripts in the `./extract_feature` folder to extract features from the last convolutional layer of ResNet as region features for the attention mechanism (a hedged sketch of both extraction steps follows the commands below):
python ./extract_feature/hico_det/hico_det_extract_feature_map_ResNet_152_padding.py # HICO-DET region feature maps
python ./extract_feature/visual_genome/visual_genome_extract_feature_map_ResNet_152_padding.py # Visual Genome region feature maps
as well as word embeddings for zero-shot learning:
python ./extract_feature/hico_det/hico_extract_action_object_w2v.py # HICO-DET action/object word embeddings
python ./extract_feature/visual_genome/visual_genome_extract_action_object_w2v.py # Visual Genome action/object word embeddings
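The exact output layout of these scripts is not documented above, so the following is only a minimal, hedged sketch of what the feature-map extraction step typically looks like: take the last convolutional feature map of a torchvision ResNet-152 and store it in HDF5. The image list, output file name, HDF5 layout, and the resize step (the repository's scripts use padding instead) are illustrative assumptions, not the repository's code.

```python
# Hedged sketch: extract last-conv-layer ResNet-152 feature maps and save to HDF5.
# File names and dataset layout below are placeholders, not the repo's actual format.
import h5py
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Keep everything up to (and including) the last conv block; drop avgpool and fc.
resnet = models.resnet152(pretrained=True)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2]).to(device).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),   # the actual scripts pad the image instead of resizing
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image_paths = ['example.jpg']  # placeholder list of dataset images

with h5py.File('feature_map_ResNet_152.hdf5', 'w') as f, torch.no_grad():
    for i, path in enumerate(image_paths):
        x = preprocess(Image.open(path).convert('RGB')).unsqueeze(0).to(device)
        feat = backbone(x)  # (1, 2048, 7, 7): spatial grid of region features
        f.create_dataset(str(i), data=feat.squeeze(0).cpu().numpy())
```

Likewise, a hedged sketch of the word-embedding step, assuming gensim with pretrained word2vec vectors; the vector file, the action/object lists, and the output names are placeholders:

```python
# Hedged sketch: look up word2vec embeddings for action and object names.
# The vector file and word lists are placeholders for illustration only.
import numpy as np
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
actions, objects = ['ride', 'hold'], ['bicycle', 'cup']
action_emb = np.stack([wv[a] for a in actions])  # (num_actions, 300)
object_emb = np.stack([wv[o] for o in objects])  # (num_objects, 300)
np.save('action_w2v.npy', action_emb)
np.save('object_w2v.npy', object_emb)
```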
Experiments
- To train the cross-attention model on the HICO-DET and Visual Genome datasets under different training splits (1A/1A2B), please run the following commands (a brief note on the `--mll_k_*` flags follows the commands):
# HICO experiments
python ./experiments/hico_det/hico_det_pad_CrossAttention.py --partition train_1A --idx_GPU 1 --save_folder ./results/HICO_1A --mll_k_3 7 --mll_k_5 10 --loc_k 10 #1A setting
python ./experiments/hico_det/hico_det_pad_CrossAttention.py --partition train_1A2B --idx_GPU 2 --save_folder ./results/HICO_1A2B --mll_k_3 7 --mll_k_5 10 --loc_k 10 #1A2B setting
# Visual Genome experiments
python ./experiments/visual_genome_pad/VG_pad_CrossAttention.py --partition train_1A --idx_GPU 4 --save_folder ./results/VG_1A --mll_k_3 7 --mll_k_5 10 #1A setting
python ./experiments/visual_genome_pad/VG_pad_CrossAttention.py --partition train_1A2B --idx_GPU 5 --save_folder ./results/VG_1A2B --mll_k_3 7 --mll_k_5 10 #1A2B setting
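The `--mll_k_3` and `--mll_k_5` flags are not documented above; they appear to set the k values used by top-k multi-label metrics. Purely as a hedged illustration (this is not the repository's evaluation code), a generic precision/recall at k for multi-label predictions looks like:

```python
# Generic top-k multi-label precision/recall, shown only to illustrate the kind of
# metric the --mll_k_* flags appear to parameterize; not the repository's code.
import numpy as np

def precision_recall_at_k(scores, labels, k):
    """scores: (N, C) predicted scores; labels: (N, C) binary ground truth."""
    topk = np.argsort(-scores, axis=1)[:, :k]                # indices of the k highest scores
    hits = np.take_along_axis(labels, topk, axis=1).sum(1)   # correct labels among the top k
    precision = (hits / k).mean()
    recall = (hits / np.maximum(labels.sum(1), 1)).mean()
    return precision, recall
```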
Pretrained Models
For ease of reproducing the results, we provide pretrained models:
| Dataset | Setting | Model |
|---|---|---|
| HICO | 1A2B | download |
| HICO | 1A | download |
| Visual Genome | 1A2B | download |
| Visual Genome | 1A | download |
To evaluate a pretrained model, please run:
python ./experiments/hico_det/hico_det_pad_CrossAttention.py --partition train_1A2B --idx_GPU 0 --save_folder ./results/HICO_1A2B --mll_k_3 7 --mll_k_5 10 --loc_k 10 --load_model ./pretrained_model/model_final_HICO_1A2B.pt
where `./pretrained_model/model_final_HICO_1A2B.pt` is the path to the corresponding pretrained model.
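The `--load_model` flag restores the checkpoint inside the experiment script. For reference, a minimal hedged sketch of loading such a file directly with PyTorch (the checkpoint's internal layout is an assumption):

```python
# Hedged sketch: inspect a pretrained checkpoint with PyTorch.
# Whether the file holds a full model object or a state_dict is an assumption;
# a state_dict would be restored with model.load_state_dict(checkpoint).
import torch

checkpoint = torch.load('./pretrained_model/model_final_HICO_1A2B.pt', map_location='cpu')
print(type(checkpoint))
```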
Citation
If you find this project helpful, we would appreciate it if you cite our work:
@inproceedings{Huynh:ICCV21,
  author = {D.~Huynh and E.~Elhamifar},
  title = {Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations},
  booktitle = {International Conference on Computer Vision},
  year = {2021}}