Python 3.6

Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph

PyTorch implementation for learning an observation-Gated Spatio-Temporal Energy Graph for Video Relationship Reasoning on the Charades dataset.

Contact: Yao-Hung Hubert Tsai (yaohungt@cs.cmu.edu)

Paper

Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph<br> Yao-Hung Hubert Tsai, Santosh Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov and Ali Farhadi<br> Computer Vision and Pattern Recognition (CVPR), 2019.

Please cite our paper if you find the code, dataset, or the experimental setting useful for your research.

@inproceedings{tsai2019GSTEG,
  title={Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph},
  author={Tsai, Yao-Hung Hubert and Divvala, Santosh and Morency, Louis-Philippe and Salakhutdinov, Ruslan and Farhadi, Ali},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

Overview

Relationship Reasoning in Videos

<p align="center"> <img src='imgs/vidvrd.png' width="400px"/> </p>

Visual relationship reasoning in images (top) vs. videos (bottom): Given a single image, it is ambiguous whether the monkey is creeping up or down the car. Using a video not only helps to unambiguously recognize a richer set of relations, but also helps to model temporal correlations across them (e.g., creep down and jump left).

Gated Spatio-Temporal Energy Graph

<p align="center"> <img src='imgs/GSTEG.png' width="1000px"/> </p>

An overview of our proposed Gated Spatio-Temporal Energy Graph. Given an input instance (a video clip), we predict the output relationships (e.g., {monkey, creep down, car}) by reasoning over a fully-connected spatio-temporal graph with nodes S (Subject), P (Predicate), and O (Object). Instead of assuming a non-gated (i.e., predefined or globally-learned) pairwise energy function, we explore the use of gated energy functions (i.e., conditioned on the specific visual observation).
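The difference between a non-gated and a gated pairwise energy can be illustrated with a toy sketch. This is not the repository's implementation: the label counts, the feature dimension, the function names, and the linear form of the gate are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import random

random.seed(0)

NUM_S, NUM_P = 3, 4   # hypothetical numbers of subject and predicate labels
FEAT_DIM = 5          # hypothetical size of the visual feature vector


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of random weights in [-1, 1]."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Non-gated: a single globally learned table shared by every video.
W_global = rand_matrix(NUM_S, NUM_P)


def non_gated_energy(s, p, x):
    """Pairwise energy between subject s and predicate p; ignores the observation x."""
    return W_global[s][p]


# Gated: each (s, p) entry is computed from the observation x.
# A linear map of x is assumed here purely for illustration.
W_gate = [rand_matrix(NUM_P, FEAT_DIM) for _ in range(NUM_S)]


def gated_energy(s, p, x):
    """Pairwise energy between subject s and predicate p, conditioned on x."""
    return sum(w * xi for w, xi in zip(W_gate[s][p], x))


# Two different (made-up) visual observations.
x_a = [1.0, 0.0, 0.0, 0.0, 0.0]
x_b = [0.0, 1.0, 0.0, 0.0, 0.0]

# The non-gated energy is identical for both observations,
# while the gated energy changes with the observation.
same = non_gated_energy(0, 1, x_a) == non_gated_energy(0, 1, x_b)
differs = gated_energy(0, 1, x_a) != gated_energy(0, 1, x_b)
print(same, differs)
```

In this sketch the gated table is recomputed for every input, so the compatibility of a subject-predicate pair can depend on what is actually observed in the clip, whereas the non-gated table assigns the same score regardless of the input.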

Usage

Prerequisites

Datasets

Pretrained Model

Run the Code

  1. Modify exp/GSTEG.py
    • Create the cache directory
    • Specify the location of the data, training/validation split, and pretrained model.
  2. Run the following command:
python3 exp/GSTEG.py
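The setup in step 1 can be sketched as follows. This is only an illustration of creating a cache directory and checking that the required paths exist before launching a run; the function name, the dictionary keys, and every file and directory name below are hypothetical placeholders, not the actual options used in exp/GSTEG.py.

```python
import os
import tempfile


def prepare_paths(root):
    """Create the cache directory and report which required paths are missing.

    All keys and file names are hypothetical placeholders.
    """
    paths = {
        "cache_dir": os.path.join(root, "cache"),
        "data_dir": os.path.join(root, "data"),
        "train_split": os.path.join(root, "splits", "train.csv"),
        "val_split": os.path.join(root, "splits", "val.csv"),
        "pretrained_model": os.path.join(root, "pretrained", "model.pth"),
    }
    # The cache directory is created if it does not exist yet.
    os.makedirs(paths["cache_dir"], exist_ok=True)
    # Everything else must be supplied by the user; collect what is absent.
    missing = [
        name for name, path in paths.items()
        if name != "cache_dir" and not os.path.exists(path)
    ]
    return paths, missing


# Demonstration in a throwaway directory, so nothing real is touched.
with tempfile.TemporaryDirectory() as demo_root:
    demo_paths, demo_missing = prepare_paths(demo_root)
    print("cache created:", os.path.isdir(demo_paths["cache_dir"]))
    print("still missing:", demo_missing)
```

Checking the paths up front makes a misconfigured data or model location fail immediately rather than partway through a run.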

Acknowledgement

A large portion of the code comes from the Temporal Fields, VidVRD, and ImageNet repositories.