

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions

This code is mainly based on our ICCV 2021 paper, "Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions".

Dataset webpage: https://shuangli-project.github.io/VHICO-Dataset/ <br> Project webpage: https://shuangli-project.github.io/weakly-supervised-human-object-detection-video/

This project addresses weakly supervised human-object interaction detection in videos. We introduce a contrastive, weakly supervised training loss that jointly associates spatiotemporal regions in a video with an action and object vocabulary and, as a form of self-supervision, encourages temporal continuity in the visual appearance of moving objects.
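To give a rough intuition for what a contrastive objective of this kind does, here is a minimal sketch of a generic InfoNCE-style loss. It is not the loss implemented in this repository or described in the paper. The function names, the temperature value, and the use of cosine similarity are all assumptions made for illustration, and the temporal continuity term is not covered.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative only, not the paper's loss).

    anchor:    e.g. an embedding of an action or object word (assumption)
    positive:  a region feature that should match the anchor
    negatives: region features that should not match the anchor
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    # The loss is small when the positive is much more similar to the
    # anchor than any of the negatives.
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    word = [1.0, 0.0]
    matching_region = [0.9, 0.1]
    other_regions = [[0.0, 1.0], [-1.0, 0.2]]
    print(contrastive_loss(word, matching_region, other_regions))
```

In this toy setting the loss shrinks as the matching region moves closer to the word embedding and grows as the other regions do, which is the behaviour a contrastive objective relies on.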

To train our model, we introduce a dataset comprising over 6.5k videos with human-object interaction annotations that have been semi-automatically curated from sentence captions associated with the videos.

Install packages

conda install pytorch=0.4.1 cuda90 -c pytorch
pip install cython
pip install numpy scipy pyyaml packaging pycocotools tensorboardX tqdm scikit-image gensim
pip install opencv-python
pip uninstall matplotlib
conda install -c conda-forge matplotlib
pip uninstall pillow
conda install -c anaconda pil
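
After installation, a quick import check can catch a missing package before a long run fails. The sketch below is not part of this repository. The helper name find_missing is hypothetical, and the module list assumes the usual import names for the packages above (for example, pyyaml imports as yaml, opencv-python as cv2, scikit-image as skimage, and Pillow/PIL as PIL).

```python
import importlib.util

# Import names (not pip/conda names) for the packages installed above.
# This mapping is an assumption based on the usual import names.
REQUIRED_MODULES = [
    "torch", "Cython", "numpy", "scipy", "yaml", "packaging",
    "pycocotools", "tensorboardX", "tqdm", "skimage", "gensim",
    "cv2", "matplotlib", "PIL",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules and does not import them, so it will not detect a broken build such as a PyTorch/CUDA mismatch.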

V-HICO Dataset


Because of licensing issues, please download the corresponding videos from the Moments in Time dataset. We use their extracted frames, which are in the folder named video_256_30fps. For more information about our dataset, please visit the dataset website.

<!-- ### Data labelling tool We used [LabelImg](https://github.com/tzutalin/labelImg) to annotate the human and object bounding boxes of video frames from the test set and the unseen test set. -->


Please download the human annotations and saved results first.

Please unzip the human annotations and put them in the data folder. <br> Please unzip the saved results and put them in the results folder.

Test set

mAP: python eval/eval_vhico.py --eval_subset test --EVAL_MAP 1
Recall: python eval/eval_vhico.py --eval_subset test --EVAL_MAP 0

Unseen test set

mAP: python eval/eval_vhico.py --eval_subset unseen --EVAL_MAP 1
Recall: python eval/eval_vhico.py --eval_subset unseen --EVAL_MAP 0
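
The four evaluation commands above differ only in the subset and the --EVAL_MAP value, so they can be generated in a loop. This is a convenience sketch and not a script shipped with the repository. The helper names build_eval_commands and run_all are hypothetical, and only the script path and the two flags shown above are used.

```python
import subprocess
import sys


def build_eval_commands(script="eval/eval_vhico.py"):
    """Build the four evaluation commands: (test, unseen) x (mAP, recall)."""
    commands = []
    for subset in ("test", "unseen"):
        for eval_map in (1, 0):  # 1 = mAP, 0 = recall, as in the commands above
            commands.append([
                sys.executable, script,
                "--eval_subset", subset,
                "--EVAL_MAP", str(eval_map),
            ])
    return commands


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands only. Call run_all(build_eval_commands())
    # from the repository root to actually execute them.
    for cmd in build_eval_commands():
        print(" ".join(cmd))
```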

Training and Testing

Prepare Data:

To train and test our model, run the following scripts:


sh scripts/train_rel_mit.sh 


sh test_rel_mit.sh