ReImplementation of Residual Attention Network for Image Classification

This is a Gluon implementation of the Residual Attention Network described in the paper arXiv:1704.06904.

<img src="data/figure2.png"/>

Requirements

Python 3.5+, MXNet 1.2.1+, MXBoard
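Installation is the usual pip route; the package names below are an assumption (for GPU support, pick the mxnet-cuXX build that matches your CUDA version):

```
pip install mxnet mxboard
```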

Inspiration

The code is inspired by the Gluon ResNet implementation and https://github.com/liudaizong/Residual-Attention-Network.

The mixup data augmentation follows the GluonCV implementation.
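For reference, a minimal sketch of mixup, not necessarily the repo's exact code: each batch is blended with a shuffled copy of itself using a Beta(alpha, alpha) weight, and the loss is computed against both label sets.

```python
import numpy as np
import mxnet as mx

def mixup_batch(data, label, alpha=0.2):
    """Blend a batch with a shuffled copy of itself.

    Returns the mixed images, both label sets, and the mixing weight
    lam ~ Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    index = mx.nd.array(np.random.permutation(data.shape[0]), ctx=data.context)
    mixed = lam * data + (1 - lam) * data.take(index)
    return mixed, label, label.take(index), lam

# Training then uses: loss = lam * CE(output, label_a) + (1 - lam) * CE(output, label_b)
```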

Train

A GPU is preferred.

Cifar10

To view the training process, TensorBoard is required:

```
tensorboard --logdir=./log/board/cifar10_201808311834 --host=0.0.0.0 --port=8888
```
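The accuracy and loss curves are logged with MXBoard; a minimal logging sketch (the directory name and the run_one_epoch helper are placeholders, not this repo's actual code):

```python
from mxboard import SummaryWriter

# Event files land in logdir; point tensorboard --logdir at the same folder.
sw = SummaryWriter(logdir='./log/board/demo', flush_secs=30)
for epoch in range(200):
    train_acc, train_loss, test_acc = run_one_epoch()  # placeholder training step
    sw.add_scalar(tag='train_accuracy', value=train_acc, global_step=epoch)
    sw.add_scalar(tag='test_accuracy', value=test_acc, global_step=epoch)
    sw.add_scalar(tag='train_loss', value=train_loss, global_step=epoch)
sw.close()
```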
| Results | Accuracy | Loss | Test Accuracy | Test Accuracy (using mixup) | Official Report |
| --- | --- | --- | --- | --- | --- |
| Attention56 | <img src="data/cifar10-attention56-accuracy.png"/> | <img src="data/cifar10-attention56-loss.png"/> | 0.9499 | 0.9581 | 0.9448 |
| Attention92 | <img src="data/cifar10-attention92-accuracy.png"/> | <img src="data/cifar10-attention92-loss.png"/> | 0.9524 | 0.9608 | 0.9501 |

The paper does not give the CIFAR-10 architecture, so I follow the implementation of https://github.com/tengshaofeng/ResidualAttentionNetwork-pytorch.
In an earlier version the feature map was down-sampled to 16x16 before stage 1, which only reached about 0.93 on the test set. Following Teng's implementation, the feature map entering stage 1 stays 32x32, which gives an accuracy improvement of about 2%.
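Concretely, the only change is in the stem; a hedged Gluon sketch (the channel width is illustrative, not the repo's exact value):

```python
from mxnet.gluon import nn

# CIFAR-10 stem: a single 3x3 stride-1 convolution, so the feature map
# entering stage 1 stays 32x32. (The earlier version down-sampled here,
# which cost roughly 2% test accuracy.)
stem = nn.HybridSequential()
stem.add(nn.Conv2D(32, kernel_size=3, strides=1, padding=1, use_bias=False),
         nn.BatchNorm(),
         nn.Activation('relu'))
```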

Tricks
I followed the tricks from Bag of Tricks for Image Classification (1812.01187). The pipeline trains res-att-net56 on eight 1080 Ti GPUs with NVIDIA DALI and BytePS, which lets a CIFAR-10 run finish in about one hour.
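Of those tricks, the warmup column in the table below refers to a linear learning-rate warmup; a plain-Python sketch of such a schedule (the step epochs and decay factor are assumptions, not values read from the repo):

```python
def learning_rate(epoch, base_lr=2.0, warmup_epochs=5, lr_steps=(80, 120), decay=0.1):
    """Linear warmup to base_lr over the first warmup_epochs epochs,
    then multiply by `decay` at every epoch listed in lr_steps."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / float(warmup_epochs)
    lr = base_lr
    for step in lr_steps:
        if epoch >= step:
            lr *= decay
    return lr
```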

The results below can be reproduced by running scripts/train_bps.sh.

| Layers | batch_size | lr | warmup | mix_up | alpha | epsilon | max_accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 56 | 1024 | 2.0 | 5 | False | 0.2 | - | 0.956631 |
| 56 | 1024 | 1.5 | 5 | False | 0.2 | - | 0.955929 |
| 56 | 1024 | 1.0 | 5 | False | 0.2 | - | 0.954527 |
| 56 | 1024 | 2.0 | 5 | True | 0.2 | - | 0.959135 |
| 56 | 1024 | 2.0 | 5 | True | 0.4 | - | 0.962240 |
| 56 | 1024 | 2.0 | 5 | True | 1.0 | - | 0.962941 |
| 56 | 1024 | 2.0 | 5 | True | 0.2 | 0.1 | 0.961839 |
| 56 | 1024 | 2.0 | 5 | True | 0.2 | 0.01 | 0.961038 |
| 56 | 1024 | 2.0 | 10 | True | 0.2 | 0.1 | 0.959635 |
| 56 | 1024 | 2.0 | 10 | True | 1.0 | 0.1 | 0.963742 |
| 56 | 1024 | 1.6 | 10 | True | 1.0 | - | 0.965044 |
| 56 | 1024 | 2.0 | 10 | True | 0.2 | 0.01 | 0.959235 |
| 56 | 512 | 2.0 | 5 | False | 0.2 | - | 0.955729 |
| 56 | 512 | 1.5 | 5 | False | 0.2 | - | 0.956831 |
| 56 | 512 | 1.0 | 5 | False | 0.2 | - | 0.955812 |
| 56 | 512 | 2.0 | 10 | True | 0.2 | 0.1 | 0.958534 |
| 56 | 512 | 2.0 | 10 | True | 1.0 | 0.1 | 0.957833 |
| 56 | 512 | 2.0 | 10 | True | 0.2 | 0.01 | 0.955829 |
| 56 | 512 | 1.0 | 5 | True | 0.4 | - | 0.962182 |
| 56 | 512 | 1.0 | 5 | True | 1.0 | - | 0.964670 |
| 56 | 512 | 1.0 | 5 | True | 1.0 | 0.1 | 0.963642 |
| 56 | 512 | 1.0 | 5 | True | 1.0 | 0.01 | 0.963674 |
| 56 | 512 | 1.0 | 10 | True | 1.0 | 0.1 | 0.962139 |
| 56 | 512 | 0.8 | 10 | True | 1.0 | 0.1 | 0.964271 |
| 56 | 512 | 0.4 | 10 | True | 1.0 | 0.1 | 0.963442 |
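The epsilon column appears to be the label-smoothing factor from the bag-of-tricks paper ("-" means no smoothing). A minimal sketch of how smoothed targets are built, not the repo's exact code:

```python
import numpy as np

def smooth_labels(labels, num_classes=10, epsilon=0.1):
    """Convert integer class labels into smoothed one-hot targets:
    the true class gets 1 - epsilon, the others share epsilon equally."""
    targets = np.full((len(labels), num_classes), epsilon / (num_classes - 1))
    targets[np.arange(len(labels)), labels] = 1.0 - epsilon
    return targets
```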


ImageNet

Emmmm.... ImageNet training is still a TODO.

Usage

Training on CIFAR-10 is launched with cifar10_train.py, for example:

```
python3 cifar10_train.py --num-layers 92 --num-gpus 1 --workers 2 --batch-size 64 --epochs 200 --lr-steps 80,120 --mix-up 1 --alpha 1.0
```

The network implementation can easily be applied to other tasks; the supported depths are defined by attention_net_spec:

```python
# Depth -> ([p, t, r], attention modules per stage). Per the paper, p is the
# number of pre/post-processing residual units, t the trunk-branch residual
# units, and r the mask-branch residual units between adjacent pooling layers.
attention_net_spec = {56: ([1, 2, 1], [1, 1, 1]),
                      92: ([1, 2, 1], [1, 2, 3]),
                      128: ([1, 2, 1], [3, 3, 3]),
                      164: ([1, 2, 1], [4, 4, 4]),
                      236: ([1, 2, 1], [6, 6, 6]),
                      452: ([2, 4, 3], [6, 6, 6])}
```
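A hypothetical sketch of how the spec can be consumed to build a deeper variant; attention_net here stands in for the repo's actual builder and is not its real signature:

```python
def build_attention_net(num_layers, classes=10):
    # Look up the (p, t, r) hyper-parameters and per-stage module counts
    # for the requested depth, then hand them to the network builder.
    if num_layers not in attention_net_spec:
        raise ValueError('unsupported depth: %d' % num_layers)
    (p, t, r), modules_per_stage = attention_net_spec[num_layers]
    return attention_net(modules_per_stage, p=p, t=t, r=r, classes=classes)  # hypothetical builder
```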

References

  1. Residual Attention Network for Image Classification, arXiv:1704.06904
  2. MXNet Documentation and Tutorials, https://zh.gluon.ai/
  3. GluonCV Classification Model Zoo, https://gluon-cv.mxnet.io/model_zoo/classification.html