
Region-based Non-local operation for Video Classification [arXiv]

<div align="center"> <img src="visualization.jpg" width="800px" /> </div>

Citation

Please ★star this repo and cite the following arXiv paper if you find our RNL useful:

@article{huang2020region,
  title={Region-based Non-local Operation for Video Classification},
  author={Huang, Guoxi and Bors, Adrian G},
  journal={arXiv preprint arXiv:2007.09033},
  year={2020}
}

Prerequisites

Data Preparation

Please refer to the TSM repo for details on data preparation.

Pretrained Models

The accuracy may differ slightly from the paper, as we made some modifications to our models. For example, instead of the SE module reported in the paper, we use the channel-gate module from GCNet to model channel attention.
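The channel-gate idea borrowed from GCNet can be sketched in PyTorch roughly as below. This is an illustrative assumption, not the repo's exact implementation: the module name, reduction ratio, and layer choices are ours.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Sketch of a GCNet-style channel-gate block (illustrative, not the repo's code).

    It pools a global context vector with softmax spatial attention, transforms it
    with a bottleneck, and adds it back to every position as channel modulation.
    """
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.context = nn.Conv2d(channels, 1, kernel_size=1)  # spatial attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x):
        n, c, h, w = x.shape
        # softmax-weighted global pooling over all spatial positions
        attn = self.context(x).view(n, 1, h * w).softmax(dim=-1)    # (n, 1, hw)
        ctx = torch.bmm(x.view(n, c, h * w), attn.transpose(1, 2))  # (n, c, 1)
        ctx = ctx.view(n, c, 1, 1)
        return x + self.transform(ctx)  # broadcast add over h, w
```

The output has the same shape as the input, so the block can be dropped into a ResNet stage without changing the surrounding architecture.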

| method | n-frame | Kinetics Acc. | checkpoint |
| :----- | :-----: | :-----------: | :--------: |
| NL I3D-ResNet50 | 32 * 10 clips | 74.9% | - |
| RNL TSM-ResNet50 | 8 * 10 clips | 75.6% | link |
| RNL TSM-ResNet50 | 16 * 10 clips | 77.2% | link |
| RNL TSM-ResNet50 | (16+8) * 10 clips | 77.4% | - |

On Kinetics, the RNL TSM models outperform the NL I3D model while requiring less computation (fewer input frames).

| method | n-frame | Something-V1 Acc. | checkpoint |
| :----- | :-----: | :---------------: | :--------: |
| RNL TSM-ResNet50 | 8 * 2 clips | 49.5% | link |
| RNL TSM-ResNet50 | 16 * 2 clips | 51.0% | link |
| RNL TSM-ResNet50 | (8+16) * 2 clips | 52.7% | - |
| RNL TSM-ResNet101 | 8 * 2 clips | 50.8% | link |
| RNL 101 + RNL 50 | (8+16) * 2 clips | 54.1% | - |

Training

We provide several examples of training the RNL network with this repo:

python main.py --dataset kinetics  --dense_sample --dist-url 'tcp://localhost:6666' \
--dist-backend 'nccl' --multiprocessing-distributed --available_gpus 0,1,2,3 --world-size 1 \
--rank 0 --gd 20 --shift --shift_div=8 --shift_place=blockres --npb --lr 0.02 --wd 2e-4 \
--dropout 0.5 --num_segments 8 --batch_size 16 --batch_multiplier 4 --use_warmup --warmup_epochs 5 \
--lr_type cos --epochs 100 --non_local  --suffix 1

python main.py --dist-url 'tcp://localhost:6666' --dist-backend 'nccl' \
--multiprocessing-distributed --available_gpus 0,1,2,3 --world-size 1 --rank 0 \
--dataset something --gd 20 --shift --shift_div=8 --shift_place=blockres --npb \
--lr 0.02 --wd 1e-3 --dropout 0.8 --num_segments 8 --batch_size 16 --batch_multiplier 4 \
--use_warmup --warmup_epochs 1 --lr_type cos --epochs 50 --non_local  --suffix 1

# Note that the total batch size equals batch_size x batch_multiplier x world_size, and
# the learning rate should be scaled linearly with the total batch size. For example,
# with a total batch size of 128 you should set the learning rate to 0.04.
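The linear scaling rule in the comment above can be written out explicitly. The helper name is ours, not part of the repo; the base setting (batch 64 at lr 0.02) comes from the commands above.

```python
def scaled_lr(base_lr, base_batch, total_batch):
    """Linear learning-rate scaling: lr grows proportionally with the total batch size."""
    return base_lr * total_batch / base_batch

# base setting from the commands above: batch_size 16 x batch_multiplier 4 x world_size 1 = 64
total_batch = 16 * 4 * 1
print(scaled_lr(0.02, 64, total_batch))  # 0.02
print(scaled_lr(0.02, 64, 128))          # 0.04
```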

Test

For example, to test the downloaded pretrained models in the 8-frame setting, run the scripts below:


# test on kinetics
python test_models.py kinetics  \
--weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_cos_dense_nl_lr0.02_wd2.0e-04.pth.tar \
--test_segments=8 --batch_size=16 -j 25 --test_crops=3  --dense_sample --full_res

# test on Something
python test_models.py something \
--weights=pretrained/TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e50_cos_nl_h_8e-4.pth.tar \
--test_segments=8 --batch_size=2 -j 25 --test_crops=3  --twice_sample  --full_res
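As a rough guide to evaluation cost, each video above is scored as the average over test_crops x clips views. A tiny illustrative helper (not part of the repo) makes the view counts explicit:

```python
def total_views(test_crops, num_clips):
    # each video's prediction is averaged over test_crops x num_clips views
    return test_crops * num_clips

print(total_views(3, 10))  # Kinetics with --dense_sample: 30 views per video
print(total_views(3, 2))   # Something with --twice_sample: 6 views per video
```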

Other Info

References

This repository is built upon the following baseline implementations.

Contact

For any questions, please feel free to open an issue or contact:

Guoxi Huang: gh825@york.ac.uk