GloRe

Implementation for: Graph-Based Global Reasoning Networks (CVPR19)

Software

Train & Evaluate

Train on Kinetics (single node):

./run_local.sh

Train on Kinetics (multiple nodes):

# please set up ./Host before running
./run_dist.sh

Evaluate the trained model on Kinetics:

cd test
# check $ROOT/test/*.txt for the testing log
python test-single-clip.py

Note:

Results

Image Recognition (ImageNet-1k)

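| Model | Method | Res3 | Res4 | Code & Model | Top-1 |
| --- | --- | --- | --- | --- | --- |
| ResNet50 | Baseline | | | link | 76.2 % |
| ResNet50 | w/ GloRe | | +3 | link | 78.4 % |
| ResNet50 | w/ GloRe | +2 | +3 | link | 78.2 % |
| SE-ResNet50 | Baseline | | | link | 77.2 % |
| SE-ResNet50 | w/ GloRe | | +3 | link | 78.7 % |
| ResNet200 | w/ GloRe | | +3 | link | 79.4 % |
| ResNet200 | w/ GloRe | +2 | +3 | link | 79.7 % |
| ResNeXt101 (32x4d) | w/ GloRe | +2 | +3 | link | 79.8 % |
| DPN-98 | w/ GloRe | +2 | +3 | link | 80.2 % |
| DPN-131 | w/ GloRe | +2 | +3 | link | 80.3 % |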

* We use pre-activation[1] and strided convolution[2] for all networks for simplicity and consistency.

Video Recognition (Kinetics-400)

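| Model | Input frames | Stride | Res3 | Res4 | Model | Clip Top-1 |
| --- | --- | --- | --- | --- | --- | --- |
| Res50 (3D) + Ours | 8 | 8 | +2 | +3 | link | 68.0 % |
| Res101 (3D) + Ours | 8 | 8 | +2 | +3 | link | 69.2 % |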

* ImageNet-1k pretrained models: R50(link), R101(link).
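
The "Clip Top-1" column can be read as top-1 accuracy computed from the prediction on one clip per video, which is what the name of `test-single-clip.py` suggests. The following is a minimal sketch of that metric under this assumption; `top1_accuracy` and the toy scores are hypothetical and are not taken from the repository's test script.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label.

    scores: list of per-class score lists, one list per clip (hypothetical input).
    labels: list of integer class indices, one per clip.
    """
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


if __name__ == "__main__":
    # Toy example: four clips, three classes.
    toy_scores = [
        [0.1, 0.7, 0.2],
        [0.6, 0.3, 0.1],
        [0.2, 0.2, 0.6],
        [0.5, 0.4, 0.1],
    ]
    toy_labels = [1, 0, 2, 1]
    print(top1_accuracy(toy_scores, toy_labels))  # 0.75
```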

Semantic Segmentation (Cityscapes)

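| Method | Backbone | Code & Model | IoU cla. | iIoU cla. | IoU cat. | iIoU cat. |
| --- | --- | --- | --- | --- | --- | --- |
| FCN + 1 GloRe unit | ResNet50 | link | 79.5% | 60.3% | 91.3% | 81.5% |
| FCN + 1 GloRe unit | ResNet101 | link | 80.9% | 62.2% | 91.5% | 82.1% |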

* All networks are evaluated on the Cityscapes test set by the testing server, without using the extra “coarse” training set.

Other Resources

ImageNet-1k Training/Validation List:

ImageNet-1k category name mapping table:

Kinetics Dataset:

Cityscapes Dataset:

FAQ

Where can I find the code for image classification and segmentation?

Do I need to convert the raw videos to specific format?

How can I make the training faster?

For example:

# convert to short_edge_length <= 288
ffmpeg -y -i "${SRC_VID}" -c:v mpeg4 -filter:v "scale=min(iw\,(288*iw)/min(iw\,ih)):-1" -b:v 640k -an "${DST_VID}"
# or, convert to short_edge_length <= 256
ffmpeg -y -i "${SRC_VID}" -c:v mpeg4 -filter:v "scale=min(iw\,(256*iw)/min(iw\,ih)):-1" -b:v 512k -an "${DST_VID}"
# or, convert to short_edge_length <= 160
ffmpeg -y -i "${SRC_VID}" -c:v mpeg4 -filter:v "scale=min(iw\,(160*iw)/min(iw\,ih)):-1" -b:v 240k -an "${DST_VID}"
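
To see what the `scale` expression in these commands does, the sketch below reproduces its arithmetic in Python and builds the same argument list for one file. It is an illustration only: `scaled_size`, `build_command`, and the file names are hypothetical helpers, and the integer rounding is an approximation of ffmpeg's own behaviour.

```python
import shlex


def scaled_size(iw, ih, short_edge):
    """Mirror scale=min(iw, short_edge*iw/min(iw, ih)):-1 for an iw x ih input."""
    # Width that brings the shorter side down to short_edge, never above the input width.
    width = min(iw, (short_edge * iw) / min(iw, ih))
    # "-1" tells ffmpeg to derive the height from the width, keeping the aspect ratio.
    height = width * ih / iw
    # Approximation of ffmpeg's integer handling; exact for the examples below.
    return int(width), round(height)


def build_command(src, dst, short_edge=288, bitrate="640k"):
    """Build the ffmpeg argument list used in the commands above (hypothetical helper)."""
    # The commas stay backslash-escaped, exactly as ffmpeg receives them from the shell.
    scale = f"scale=min(iw\\,({short_edge}*iw)/min(iw\\,ih)):-1"
    return ["ffmpeg", "-y", "-i", src, "-c:v", "mpeg4",
            "-filter:v", scale, "-b:v", bitrate, "-an", dst]


if __name__ == "__main__":
    print(scaled_size(1920, 1080, 288))  # (512, 288): landscape, short edge becomes 288
    print(scaled_size(720, 1280, 288))   # (288, 512): portrait, short edge becomes 288
    print(scaled_size(320, 240, 288))    # (320, 240): already small, left unchanged
    # Placeholder file names; pass the list to subprocess.run to actually convert.
    print(shlex.join(build_command("in.mp4", "out.mp4")))
```

The `min(iw, ...)` term is what prevents upscaling: a video whose short edge is already below the target keeps its original size.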

Reference

[1] He, Kaiming, et al. "Identity mappings in deep residual networks."
[2] https://github.com/facebook/fb.resnet.torch

Citation

@inproceedings{chen2019graph,
  title={Graph-based global reasoning networks},
  author={Chen, Yunpeng and Rohrbach, Marcus and Yan, Zhicheng and Yan, Shuicheng and Feng, Jiashi and Kalantidis, Yannis},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={433--442},
  year={2019}
}

License

The code and the models are MIT licensed, as found in the LICENSE file.