Group Contextualization for Video Recognition (CVPR 2022)

This is an official implementation of the paper "Group Contextualization for Video Recognition", which has been accepted by CVPR 2022. Paper link

<div align="center"> <img src="demo/model.jpg" width="700px"/> </div>

Updates

March 11, 2022

Content

Prerequisites

The code is built with the following libraries:

For video data pre-processing, you may need ffmpeg.

Data Preparation

For GC-TSN, GC-GST and GC-TSM, videos in all datasets (Kinetics-400, Something-Something V1 and V2, Diving48 and EGTEA Gaze+) must first be extracted into frames, following the TSN repo. For GC-TDN, data processing follows the backbone TDN work: the short edge of each video is resized to 320px, and the mp4 file is decoded directly during training/evaluation.
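As a concrete illustration, the sketch below shows both pre-processing styles via ffmpeg called from Python. It is a minimal sketch under assumed conventions (output layout, JPEG naming), not the repo's own pre-processing script:

```python
import os
import subprocess

def resize_short_edge(src_mp4: str, dst_mp4: str, short_edge: int = 320) -> None:
    """TDN-style pre-processing: rescale so the video's short edge is `short_edge` px."""
    # If width > height, fix height to short_edge and let width follow (-2 keeps it even).
    scale = (f"scale='if(gt(iw,ih),-2,{short_edge})'"
             f":'if(gt(iw,ih),{short_edge},-2)'")
    subprocess.run(["ffmpeg", "-i", src_mp4, "-vf", scale, dst_mp4], check=True)

def extract_frames(src_mp4: str, frame_dir: str) -> None:
    """TSN-style pre-processing: decode the whole video into JPEG frames."""
    os.makedirs(frame_dir, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", src_mp4, "-q:v", "2",
         os.path.join(frame_dir, "img_%05d.jpg")],  # frame naming is an assumption
        check=True,
    )
```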

Code

The GC-TSN, GC-TSM, GC-GST and GC-TDN code is based on the TSN, TSM, GST and TDN codebases, respectively.

Pretrained Models

Here we provide some of the pretrained models.

Kinetics-400

| Model | Frame * view * clip | Top-1 Acc. | Top-5 Acc. | Checkpoint |
| --- | --- | --- | --- | --- |
| GC-TSN ResNet50 | 8 * 1 * 10 | 75.2% | 92.1% | link |
| GC-TSM ResNet50 | 8 * 1 * 10 | 75.4% | 91.9% | link |
| GC-TSM ResNet50 | 16 * 1 * 10 | 76.7% | 92.9% | link |
| GC-TSM ResNet50 | 16 * 3 * 10 | 77.1% | 92.9% | - |
| GC-TDN ResNet50 | 8 * 3 * 10 | 77.3% | 93.2% | link |
| GC-TDN ResNet50 | 16 * 3 * 10 | 78.8% | 93.8% | link |
| GC-TDN ResNet50 | (8+16) * 3 * 10 | 79.6% | 94.1% | - |
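In the "Frame * view * clip" column, e.g. 8 * 1 * 10 means 8 frames per clip, 1 spatial crop (view) and 10 temporal clips sampled per video, with the prediction scores of all views/clips averaged. Below is a minimal sketch of that score fusion, assuming a hypothetical `model` that maps a batch of clips to class logits (not the repo's actual test code):

```python
import torch

@torch.no_grad()
def fuse_views(model, clips: torch.Tensor) -> torch.Tensor:
    """Average softmax scores over every view/clip sampled from one video.

    clips: (views * num_clips, C, T, H, W) stacked crops/clips of a single video.
    Returns a (num_classes,) score vector for the whole video.
    """
    scores = torch.softmax(model(clips), dim=1)  # (views * num_clips, num_classes)
    return scores.mean(dim=0)                    # fused prediction
```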

Something-Something

The Something-Something V1 and V2 datasets are highly temporal-related. Here, we report performance at 224×224 resolution.
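For reference, here is a minimal sketch of a 224×224 evaluation transform built from standard torchvision ops; this is an assumed, common TSN-style pipeline, not necessarily the repo's exact one:

```python
from torchvision import transforms

# Scale the short edge to 256, then take a 224x224 center crop per frame,
# with ImageNet normalization -- a common evaluation setup (assumption).
eval_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```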

Something-Something-V1

| Model | Frame * view * clip | Top-1 Acc. | Top-5 Acc. | Checkpoint |
| --- | --- | --- | --- | --- |
| GC-GST ResNet50 | 8 * 1 * 2 | 48.8% | 78.5% | link |
| GC-GST ResNet50 | 16 * 1 * 2 | 50.4% | 79.4% | link |
| GC-GST ResNet50 | (8+16) * 1 * 2 | 52.5% | 81.3% | - |
| GC-TSN ResNet50 | 8 * 1 * 2 | 49.7% | 78.2% | link |
| GC-TSN ResNet50 | 16 * 1 * 2 | 51.3% | 80.0% | link |
| GC-TSN ResNet50 | (8+16) * 1 * 2 | 53.7% | 81.8% | - |
| GC-TSM ResNet50 | 8 * 1 * 2 | 51.1% | 79.4% | link |
| GC-TSM ResNet50 | 16 * 1 * 2 | 53.1% | 81.2% | link |
| GC-TSM ResNet50 | (8+16) * 1 * 2 | 55.0% | 82.6% | - |
| GC-TSM ResNet50 | (8+16) * 3 * 2 | 55.3% | 82.7% | - |
| GC-TDN ResNet50 | 8 * 1 * 1 | 53.7% | 82.2% | link |
| GC-TDN ResNet50 | 16 * 1 * 1 | 55.0% | 82.3% | link |
| GC-TDN ResNet50 | (8+16) * 1 * 1 | 56.4% | 84.0% | - |

Something-Something-V2

| Model | Frame * view * clip | Top-1 Acc. | Top-5 Acc. | Checkpoint |
| --- | --- | --- | --- | --- |
| GC-GST ResNet50 | 8 * 1 * 2 | 61.9% | 87.8% | link |
| GC-GST ResNet50 | 16 * 1 * 2 | 63.3% | 88.5% | link |
| GC-GST ResNet50 | (8+16) * 1 * 2 | 65.0% | 89.5% | - |
| GC-TSN ResNet50 | 8 * 1 * 2 | 62.4% | 87.9% | link |
| GC-TSN ResNet50 | 16 * 1 * 2 | 64.8% | 89.4% | link |
| GC-TSN ResNet50 | (8+16) * 1 * 2 | 66.3% | 90.3% | - |
| GC-TSM ResNet50 | 8 * 1 * 2 | 63.0% | 88.4% | link |
| GC-TSM ResNet50 | 16 * 1 * 2 | 64.9% | 89.7% | link |
| GC-TSM ResNet50 | (8+16) * 1 * 2 | 66.7% | 90.6% | - |
| GC-TSM ResNet50 | (8+16) * 3 * 2 | 67.5% | 90.9% | - |
| GC-TDN ResNet50 | 8 * 1 * 1 | 64.9% | 89.7% | link |
| GC-TDN ResNet50 | 16 * 1 * 1 | 65.9% | 90.0% | link |
| GC-TDN ResNet50 | (8+16) * 1 * 1 | 67.8% | 91.2% | - |

Diving48

| Model | Frame * view * clip | Top-1 Acc. | Checkpoint |
| --- | --- | --- | --- |
| GC-GST ResNet50 | 16 * 1 * 1 | 82.5% | link |
| GC-TSN ResNet50 | 16 * 1 * 1 | 86.8% | link |
| GC-TSM ResNet50 | 16 * 1 * 1 | 87.2% | link |
| GC-TDN ResNet50 | 16 * 1 * 1 | 87.6% | link |

EGTEA Gaze+

| Model | Frame * view * clip | Split1 | Split2 | Split3 |
| --- | --- | --- | --- | --- |
| GC-GST ResNet50 | 8 * 1 * 1 | 65.5% | 61.6% | 60.6% |
| GC-TSN ResNet50 | 8 * 1 * 1 | 66.4% | 64.6% | 61.4% |
| GC-TSM ResNet50 | 8 * 1 * 1 | 66.5% | 66.1% | 62.6% |
| GC-TDN ResNet50 | 8 * 1 * 1 | 65.0% | 61.8% | 61.0% |

Train

For the different backbones, please use the corresponding training script, e.g., 'train_tsn.sh' for the TSN backbone.

Test

For the TSN/TSM/GST backbones, please use the test script "test_models_tsntsmgst_gc.py" by running 'sh bash_test_tsntsmgst_gc.sh'. Change the import "from ops_tsntsmgst.models_tsn import VideoNet" (line 19 of test_models_tsntsmgst_gc.py) to the corresponding model name, as shown below.
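For example (only the TSN import appears above; the TSM/GST module names below are assumptions following the same naming pattern):

```python
# line 19 of test_models_tsntsmgst_gc.py -- keep exactly one import active:
from ops_tsntsmgst.models_tsn import VideoNet    # GC-TSN (as shipped)
# from ops_tsntsmgst.models_tsm import VideoNet  # GC-TSM (assumed module name)
# from ops_tsntsmgst.models_gst import VideoNet  # GC-GST (assumed module name)
```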

For the TDN backbone, please use its official test file; see https://github.com/MCG-NJU/TDN.

Contributors

GC codes are jointly written and owned by Dr. Yanbin Hao and Dr. Hao Zhang.

Citing

@inproceedings{gc2022,
  title={Group Contextualization for Video Recognition},
  author={Hao, Yanbin and Zhang, Hao and Ngo, Chong-Wah and He, Xiangnan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022},
}

Acknowledgement

Thanks to the following GitHub projects: