Inception-ResNet-v2 models trained from scratch with Torch

Out of personal interest, my friend Sunghun Kang (shuni@kaist.ac.kr) and I trained Inception-ResNet-v2 (http://arxiv.org/abs/1602.07261) from scratch in Torch, based on Facebook's training scripts (https://github.com/facebook/fb.resnet.torch).

I uploaded the Torch model definition and the training script I used as a PR here.

Because of the limited computational resources we had, we tried only a few training conditions. For anyone interested in reproducing the performance reported in the paper, I have added some notes on what we learned through our trials; they might be helpful.

Requirements

  1. See, https://github.com/facebook/fb.resnet.torch/pull/64

Settings

  1. SGD with momentum = 0.4737, batch size = 32 x 2 = 64, and step-style learning rate scheduling with stepsize = 12800 and gamma = 0.96
  2. SGD with momentum = 0.9, batch size = 32 x 2 = 64, and step-style learning rate scheduling with stepsize = 25600 and gamma = 0.96
  3. SGD with momentum = 0.9, batch size = 32 x 2 = 64, and step-style learning rate scheduling with stepsize = 51200 and gamma = 0.96
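
The step-style schedule above can be sketched as follows (a minimal Python sketch; the base learning rate is not stated here, so 0.1 below is a hypothetical value, and the actual training uses the Torch scripts):

```python
def step_lr(base_lr, iteration, stepsize, gamma):
    """Caffe-style "step" learning rate policy: decay the base
    learning rate by a factor of gamma every `stepsize` iterations."""
    return base_lr * gamma ** (iteration // stepsize)

# e.g. setting 1 (stepsize = 12800, gamma = 0.96) with a
# hypothetical base learning rate of 0.1
lrs = [step_lr(0.1, it, 12800, 0.96) for it in (0, 12800, 128000)]
```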

Results

  1. Single-crop (299x299) validation error on ImageNet (center 299x299 crop from the image resized to 328x328):

| Network               | Top-1 error | Top-5 error |
| --------------------- | ----------- | ----------- |
| Setting 1             | 24.407      | 7.407       |
| Setting 2             | N/A         | N/A         |
| Setting 3             | N/A         | N/A         |

  2. Training curves on ImageNet (solid lines: 1-crop top-1 error; dashed lines: 1-crop top-5 error). You can plot these yourself using tools/plot_log.py:

![Training curves](https://github.com/lim0606/torch-inception-resnet-v2/blob/master/figures/b64_s12800_i1801710.png)
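
The single-crop geometry above (resize to 328x328, then take the centered 299x299 crop) works out as follows (a minimal sketch using Pillow-style (left, top, right, bottom) box coordinates; the actual evaluation uses the Torch transforms in the training scripts):

```python
def center_crop_box(resized_size, crop_size):
    """Box of a centered square crop, as (left, top, right, bottom)."""
    offset = (resized_size - crop_size) // 2
    return (offset, offset, offset + crop_size, offset + crop_size)

# resize the image to 328x328, then take the centered 299x299 crop
box = center_crop_box(328, 299)  # (14, 14, 313, 313)
```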

Notes

  1. There seem to be typos in the paper (http://arxiv.org/abs/1602.07261):

  2. In Figure 17, the number of features in the last 1x1 convolution of the residual path (1154) does not match that of the output of the Reduction-A layer (1152); therefore, I changed the number of features in the last 1x1 conv to 1152.

  3. In Figure 18, the number of features of the 3x3 conv in the second 1x1 conv -> 3x3 conv path (288), and of the 3x3 convs in the last path (1x1 conv -> 3x3 conv -> 3x3 conv) (288 and 320), do not match the number of features in the following Inception-ResNet-C layer (2048); therefore, I changed them based on the model in https://gist.github.com/revilokeb/ab1809954f69d6d707be0c301947b69e.

  4. As mentioned in the Inception-v4 paper (Section 3.3), scaling down the residual path, i.e. multiplying the activations of its last linear convolution by a small scalar (around 0.1), seems very important for avoiding activation explosions in residual layers.

  5. I used a custom learning rate schedule because 1) the batch size was not provided in the Inception-v4 paper and 2) this schedule has worked well for different types of ImageNet classifier models.
  6. The momentum for SGD was set to 0.4737, the equivalent value in Torch-style SGD of the 0.9 momentum in Caffe-style SGD (see https://github.com/KaimingHe/deep-residual-networks).
  7. Comparing the loss curve I obtained with the one in the paper, the effective batch size in the original paper seems to be 32.
  8. While my PR includes create-imagenet-lmdb.lua for LMDB usage, I did not train the models with LMDB; I just used the provided ImageNet dataset code. See https://github.com/lim0606/torch-inception-resnet-v2/issues/1.
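
The residual scaling from note 4 amounts to the following (a toy Python sketch with scalar activations standing in for feature maps; the real model applies the scale to the output of the last linear 1x1 conv of each residual branch):

```python
def add_scaled_residual(shortcut, residual, scale=0.1):
    """Add the residual branch to the shortcut after scaling it down
    by a small constant (~0.1), per Section 3.3 of the Inception-v4
    paper; without this, activations in deep residual stacks can
    grow without bound during early training."""
    return shortcut + scale * residual

print(add_scaled_residual(1.0, 10.0))       # 2.0 (scaled)
print(add_scaled_residual(1.0, 10.0, 1.0))  # 11.0 (unscaled)
```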

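On the momentum conversion mentioned in the notes: the value 0.4737 is numerically consistent with mu / (1 + mu) for mu = 0.9. Treat this formula as an observation, not the official derivation; the exact relation depends on how each framework folds the learning rate into the momentum buffer (see the linked repo's disclaimer):

```python
def caffe_to_torch_momentum(mu):
    """Candidate conversion from Caffe-style to Torch-style SGD
    momentum; mu / (1 + mu) reproduces the 0.4737 used above for
    Caffe's 0.9. This is inferred numerically, not derived."""
    return mu / (1.0 + mu)

print(round(caffe_to_torch_momentum(0.9), 4))  # 0.4737
```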
Models

  1. For setting 1, see Google Drive

  2. For setting 2, training is in progress

  3. For setting 3, training is in progress

References

  1. http://arxiv.org/abs/1602.07261
  2. https://github.com/facebook/fb.resnet.torch
  3. https://github.com/revilokeb/inception_resnetv2_caffe
  4. https://www.reddit.com/r/MachineLearning/comments/47asuj/160207261_inceptionv4_inceptionresnet_and_the/?
  5. https://github.com/beniz/deepdetect/issues/89