Home

Awesome

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

By Tanmay Gupta, Alexander Schwing, and Derek Hoiem

<p align="center"> <img src="imgs/teaser_wide.png"> </p>

Content

Overview

This repository provides code to train and evaluate an HOI Detection model (implemented in Pytorch) that demonstrates strong performance on the challenging HICO-Det benchmark...that too without frills!

Why do we call it a no-frills model?

We make several simplifications over existing approaches while achieving better performance owing to our choice of factorization, direct encoding and scoring of layout, and improved training techniques. Here are the key simplifications:

Available on Arxiv: https://arxiv.org/abs/1811.05967

BibTex:

@article{gupta2018nofrills,
  title={No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques},
  author={Gupta, Tanmay and Schwing, Alexander and Hoiem, Derek},
  journal={arXiv preprint arXiv:1811.05967},
  year={2018}
}

References:

[1] Chao, Y., Liu, Y., Liu, X., Zeng, H., & Deng, J. (2018). Learning to Detect Human-Object Interactions. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 381-389.

[2] Gkioxari, G., Girshick, R.B., Dollár, P., & He, K. (2018). Detecting and Recognizing Human-Object Interactions. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8359-8367.

[3] Gao, C., Zou, Y., & Huang, J. (2018). iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection. BMVC.

[4] Qi, S., Wang, W., Jia, B., Shen, J., & Zhu, S. (2018). Learning Human-Object Interactions by Graph Parsing Neural Networks. ECCV.

Requirements

All dependencies will be installed in a python3 virtual environment.

Step 1: Create a python virtual environment

virtualenv -p python3.6 <path_to_new_virtual_env>

Step 2: Activate the environment

source <path_to_new_virtual_env>/bin/activate

Step 3: Install the dependencies

Here are the main requirements:

Most of these can be installed with pip. There might be other dependencies which could be installed when prompted. Please check pip_freeze_dump.txt for version numbers before proceeding to install. Note that this txt file lists a lot more packages than needed (all packages installed in my virtual environment).

Important: This repository supports Pytorch 0.3.1 and is not compatible with version 0.4 or higher because of various breaking changes introduced in these versions like unification of Tensor and Variable, Scalar Tensors etc. See Pytorch Migration Guide for details if you would like to port the implementation to a newer version of Pytorch.

Setup

We will be executing all commands inside the root directory (.../no_frills_hoi_det/) that was created when you cloned the repository.

To begin, we will create a directory in the root directory called data_symlinks that would contain symlinks to any data to be used or produced by our code. Specifically we will create 3 symlinks:

Creating these symlinks is useful if your hardware setup constrains where you keep your data. For example, if you want to store the dataset on the local drives, and code, processed files, and experiment data on the NFS to be shared across multiple servers

mkdir data_symlinks
cd data_symlinks
ln -s <path to hico_clean> ./hico_clean
ln -s <path to hico_processed> ./hico_processed
ln -s <path to hico_exp> ./hico_exp

If executed correctly, ls -l data_symlinks in the root directory should show something like:

hico_clean -> /data/tanmay/hico/hico_det_clean_20160224
hico_exp -> /home/nfs/tgupta6/Code/hoi_det_data/hico_exp
hico_processed -> /home/nfs/tgupta6/Code/hoi_det_data/hico_processed

Download the HICO-Det dataset

We will now download the required data from the HICO-Det website to hico_clean. Here are the links to all the files (version 0160224) you would need to download

Extract the images and annotations file which will be download as a tar.gz file using

tar xvzf <path to tar.gz file> -C <path to hico_clean directory>

Here -C flag specifies the target location where the files will be extracted.

After this step output of ls -l data_symlinks/hico_clean should look like

anno_bbox.mat
anno.mat
hico_list_hoi.txt
hico_list_obj.txt
hico_list_vb.txt
images
README
tools

Process HICO-Det files

The HICO-Det dataset consists of images and annotations stored in the form of .mat and .txt files. Run the following command to quickly convert this data into easy to understand json files which will be written to hico_processed directory

bash data/hico/process.sh

In addition, the process.sh performs the following functions:

The splits are needed for both training and evaluation. Class counts are needed only for evaluation to compute mAP of group of HOI classes created based on number of available training examples.

Run Object Detector (or download the detections we provide)

Download

Create your own

Step 1: Prepare data for running faster-rcnn

python -m exp.detect_coco_objects.run --exp exp_detect_coco_objects_in_hico

This creates faster_rcnn_im_in_out.json file in hico_exp/detect_coco_objects_in_hico which specifies paths to the images on which to run the detector and the paths to the target location where the results would be saved.

For each image with unique <global_id> the object detector writes the following to hico_processed/faster_rcnn_boxes:

Step 2: Run faster-rcnn

This step requires faster_rcnn_im_in_out.json file created in the previous step. I have created a fork of a popular Faster-RCNN pytorch implementation. This fork includes a script that takes the json file and writes the outputs to hico_processed in the required format. Please follow installation and execution instructions at https://github.com/BigRedT/pytorch-faster-rcnn

Once the faster-rcnn features are saved, we will write them all to a single hdf5 file using:

python -m exp.hoi_classifier.data.write_faster_rcnn_feats_to_hdf5

Step 3: Select candidate boxes for each object category from all predictions

For each image, Faster-RCNN predicts class scores (for 80 COCO classes) and box regression offsets (per class) for upto 300 regions. In this step, for each COCO class, we select upto 10 high confidence predictions per class after non-max suppression by running the following:

python -m exp.detect_coco_objects.run --exp exp_select_and_evaluate_confident_boxes_in_hico

This will create an hdf5 file called selected_coco_cls_dets.h5py in hico_exp/select_confident_boxes_in_hico directory. More details about the structure of this file can be found here.

The above command also performs a recall based evaluation of the object detections to see what fraction of ground truth human and object annotations are recalled by these predictions. These stats are written to the following files in the same directory:

As shown in the table below, our run resulted on average 6 human and 94 object (all categories except human) candidate detections per image. eval_stats_boxes_labels.json shows recall of the labelled detections (for example only the 6 human candidates would be used as predictions to compute human recall), where as eval_stats_boxes.json shows recall values if all 100 (6+94) detections were considered as predictions for every human/object category irrespective of the labels predicted by faster-rcnn. Let us refer to this case as unlabelled. The idea is that using labels predicted by the detector helps prune out candidates to be examined for any HOI category by limiting box-pair candidates to those constructed from detections for human and the object category involved in the said HOI category. The significant reduction in candidates (and hence reduced computation for our model and likely false positives) is at the expense of a reasonable and expected drop in object recall due to detections missed by the object detector and our selection scheme.

LabelledUnlabelled
Avg. Human Candidates6100 (=6+94)
Avg. Object Candidates1.2 (=94/79)100
Avg. Connection / Pair Candidates0.9 (=568/600)16.9 =(10122/600)
Human Recall86%89%
Object Recall70%86%
Connection / Pair Recall59%77%

Run Human Pose Detector (or download the poses we provide)

Download

Create your own

We will use OpenPose to extract human pose. We used the official implementation available at https://github.com/CMU-Perceptual-Computing-Lab/openpose. We used the latest version of the master branch of OpenPose available at the time (commit: f430a79). The branch has moved ahead by several commits since then and we haven't tested the latest version (though I would assume it works too).

To extract pose run the following replacing <subset> by train2015 and test2015.

./build/examples/openpose/openpose.bin \
    --image_dir <path to hico_clean>/images/<subset>/ \
    --face \
    --hand \
    --write_json <path to hico_processed>/human_pose/<subset> \
    --display 0

Train HOI classifier

Step 1: Generate HOI candidates from object candidates and cache Box and Pose features

We provide a simple bash script for this:

bash exp/hoi_classifier/scripts/preprocess.sh

This generates the following files in hico_exp/hoi_candidates directory:

Step 2: Train the model

Modify flags in exp/hoi_classifier/scripts/train.sh as required and run:

bash bash exp/hoi_classifier/scripts/train.sh <GPU ID>

<GPU ID> specifies the GPU to use for training the model.

This creates a directory called factors_rcnn_det_prob_appearance_boxes_and_object_label_human_pose in hico_exp/hoi_classifier. The name of the directory is automatically constructed based on the factors used in this model. The factors are enabled using appropriate flags in the train.sh file.

This directory is used to store the following:

View Tensorboard Logs

tensorboard --logdir=./data_symlinks/hico_exp/hoi_classifier

Time and Memory

Evaluate Model

Step 1: Select the model to evaluate

The model can be selected based on the validation loss logged in tensorboard and is usually around 30000 iterations. Let us call the selected iteration <MODEL NUM>

val_loss.png

Step 2: Make predictions for the test set

bash exp/hoi_classifier/scripts/eval.sh <GPU ID> <MODEL NUM>

This generates a pred_hoi_dets.hdf5 file. The structure of the file is detailed here

Step 3: Compute mAPs

This is done by the compute_map.sh script in exp/hico_eval directory. Update variable EXP_NAME to the one you want to evaluate and MODEL_NUM to the selected <MODEL NUM> and run

bash exp/hico_eval/compute_map.sh

EXP_NAME defaults to factors_rcnn_det_prob_appearance_boxes_and_object_label_human_pose which is the model trained with all factors. The APs for each HOI category and overall performance are stored in the experiment directory with a relative path that looks like mAP_eval/test_<MODEL NUM>/mAP.json

The mAP for the provided model for various category groups (based on number of training samples) are as follows:

ModelFullRareNon-Rare0-910-4950-99100-499500-9991000-9999
HO-RCNN [1]7.815.378.54------
InteractNet [2]9.947.1610.77------
GPNN [4]13.119.3414.23------
iCAN [3]14.8410.4516.15------
No-Frills17.0711.518.7411.511.6314.5721.8525.4241.54

Step 4: Visualize

Top Ranking Detections

To visualize top ranking detections (as shown below) for each of the 600 HOI categories:

bash exp/hoi_classifier/scripts/visualize.sh <MODEL NUM>

This generates vis/top_boxes_per_hoi_wo_inference/<hoi_id>_<verb>_<object>/ directory for each HOI inside the training experiment directory. Each such directory contains a few .png files and index.html that visualizes the .png files when opened in a browser.

Top ranking detections for ride bike (see imgs/019_ride_bicycle)

ride_bike_snapshot.png

Top ranking detections for lie on couch (see imgs/094_lie_on_couch)

lie_on_couch_snapshot.png

Performance distribution

To visualize spread of performance across different interactions with the same object run:

python -m exp.hoi_classifier.vis.vis_interaction_aps_per_object

interaction_aps_per_object_snapshot.png

An interactive version of the plot is available at imgs/interaction_aps_per_object.html.

To visualize spread of performance across different objects for the same interaction run:

python -m exp.hoi_classifier.vis.vis_object_aps_per_interaction

object_aps_per_interaction_snapshot.png

An interactive version of the plot is available at imgs/object_aps_per_interaction.html.

Pretrained model