Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph
PyTorch implementation for learning an observation-gated Spatio-Temporal Energy Graph for video relationship reasoning on the Charades dataset.
Contact: Yao-Hung Hubert Tsai (yaohungt@cs.cmu.edu)
Paper
Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph<br> Yao-Hung Hubert Tsai, Santosh Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov and Ali Farhadi<br> Computer Vision and Pattern Recognition (CVPR), 2019.
Please cite our paper if you find the code, dataset, or experimental setting useful for your research.
@inproceedings{tsai2019GSTEG,
  title={Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph},
  author={Tsai, Yao-Hung Hubert and Divvala, Santosh and Morency, Louis-Philippe and Salakhutdinov, Ruslan and Farhadi, Ali},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}
Overview
Relationship Reasoning in Videos
<p align="center"> <img src='imgs/vidvrd.png' width="400px"/>Visual relationship reasoning in images (top) vs. videos (bottom): given a single image, it is ambiguous whether the monkey is creeping up or down the car. Using a video not only helps to unambiguously recognize a richer set of relations, but also to model temporal correlations across them (e.g., creep down and jump left).
Gated Spatio-Temporal Energy Graph
<p align="center"> <img src='imgs/GSTEG.png' width="1000px"/>An overview of our proposed Gated Spatio-Temporal Energy Graph. Given an input instance (a video clip), we predict the output relationships (e.g., {monkey, creep down, car}) by reasoning over a fully-connected spatio-temporal graph with nodes S (Subject), P (Predicate), and O (Object). Instead of assuming a non-gated (i.e., predefined or globally-learned) pairwise energy function, we explore the use of gated energy functions (i.e., conditioned on the specific visual observation).
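For intuition, below is a minimal PyTorch sketch of the gating idea. It is not the repository's exact module; the low-rank factorization, feature dimension, and label counts are illustrative assumptions. The point of contrast: a non-gated pairwise energy is a single learned parameter table shared by all inputs, whereas a gated energy table is predicted from the visual observation itself.

```python
import torch
import torch.nn as nn

class GatedPairwiseEnergy(nn.Module):
    """Observation-conditioned ("gated") pairwise energy table (illustrative sketch)."""
    def __init__(self, feat_dim, num_a, num_b, rank=32):
        super().__init__()
        # Low-rank factors keep the predicted (num_a x num_b) table cheap to produce.
        self.left = nn.Linear(feat_dim, num_a * rank)
        self.right = nn.Linear(feat_dim, rank * num_b)
        self.num_a, self.num_b, self.rank = num_a, num_b, rank

    def forward(self, feat):
        # feat: (batch, feat_dim) visual feature of one video segment
        L = self.left(feat).view(-1, self.num_a, self.rank)
        R = self.right(feat).view(-1, self.rank, self.num_b)
        # Non-gated counterpart: a single nn.Parameter table shared across observations.
        return torch.bmm(L, R)  # (batch, num_a, num_b) energy table

# Example: pairwise energies between Subject and Predicate labels
# (feature dimension and label counts here are arbitrary placeholders).
energy_sp = GatedPairwiseEnergy(feat_dim=1024, num_a=35, num_b=17)
psi = energy_sp(torch.randn(4, 1024))  # one (35 x 17) table per clip in the batch
```

In the full model, such pairwise energies feed into inference (e.g., message passing) over the fully-connected spatio-temporal graph of S, P, and O nodes across time.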
Usage
Prerequisites
- Python 3.6
- PyTorch and torchvision
Datasets
Pretrained Model
- Download the pretrained (on the Kinetics dataset) I3D model here. Note that I removed the last classifier layer and appended a new classifier layer for Charades.
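As a rough illustration of that classifier swap (not the repository's exact code): torchvision's r3d_18 stands in for I3D here only so the snippet runs, and the checkpoint path and output size are placeholders.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

num_charades_outputs = 157            # placeholder for the Charades output space

model = r3d_18()                      # stand-in video backbone (the repo uses I3D)
# state = torch.load('i3d_kinetics.pt', map_location='cpu')  # Kinetics-pretrained weights
# model.load_state_dict(state, strict=False)                 # old classifier weights are dropped
model.fc = nn.Linear(model.fc.in_features, num_charades_outputs)  # new classifier for Charades
```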
Run the Code
- Modify exp/GSTEG.py:
  - Create the cache directory.
  - Specify the locations of the data, the training/validation split, and the pretrained model (a hypothetical sketch follows below).
- Run the following command:

python3 exp/GSTEG.py
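For reference, here is a hypothetical sketch of the kind of path settings to edit in exp/GSTEG.py; the option names and file paths below are placeholders and may not match the actual script.

```python
# Hypothetical sketch only: keys and paths are placeholders, not the
# exact identifiers used in exp/GSTEG.py.
args = {
    'data': '/path/to/charades_rgb/',                   # location of the video data
    'train_file': '/path/to/charades_train_split.csv',  # training split
    'val_file': '/path/to/charades_val_split.csv',      # validation split
    'pretrained_weights': '/path/to/i3d_kinetics.pt',   # Kinetics-pretrained I3D
    'cache': '/path/to/cache/',                         # create this directory first
}
```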
Acknowledgement
A large portion of the code comes from the Temporal Fields, VidVRD, and ImageNet repositories.