nnlr

Add layer-wise learning rate schemes to Torch. At the moment, it works with nn and nngraph modules, and the only supported optimization algorithm is optim's SGD implementation.

Usage

nnlr adds the following methods to nn.Module:

module:learningRate('weight', 0.1)
module:learningRate('bias', 0.2)
module:weightDecay('weight', 1)
module:weightDecay('bias', 0)

The learningRate and weightDecay methods set the module's relative learning rate and weight decay, respectively. These relative values are multiplied by the network's base values. For example, with the calls above, if the learning rate for the network is 0.05, then the weight learning rate of the module will be 0.005, and the bias learning rate 0.01.
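The arithmetic can be checked in plain Lua. This is only a sketch of the scaling rule described above; `effectiveRate` is a hypothetical helper and not part of nnlr.

```lua
-- Hypothetical helper (not part of nnlr): the effective rate is the
-- network's base learning rate scaled by the module's relative rate.
local function effectiveRate(baseLearningRate, relativeRate)
  return baseLearningRate * relativeRate
end

local base = 0.05
local weightRate = effectiveRate(base, 0.1) -- approximately 0.005
local biasRate = effectiveRate(base, 0.2)   -- approximately 0.01
print(weightRate, biasRate)
```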

All of these methods are optional. If the relative learning rate or weight decay is not set for a module, it will default to 1. Additionally, each method returns the original module, allowing for chaining.
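To make the defaulting and chaining behaviour concrete, here is a toy model in plain Lua. It uses neither nnlr nor Torch: `FakeModule`, `newModule`, `relativeLearningRate` and `relativeWeightDecay` are hypothetical stand-ins that mimic the behaviour described above, not the package's internals.

```lua
-- Toy stand-in for a module; NOT nnlr's implementation.
local FakeModule = {}
FakeModule.__index = FakeModule

local function newModule()
  return setmetatable({ rates = {}, decays = {} }, FakeModule)
end

-- Setters return self so that calls can be chained.
function FakeModule:learningRate(name, value)
  self.rates[name] = value
  return self
end

function FakeModule:weightDecay(name, value)
  self.decays[name] = value
  return self
end

-- Values that were never set fall back to a relative factor of 1.
function FakeModule:relativeLearningRate(name)
  return self.rates[name] or 1
end

function FakeModule:relativeWeightDecay(name)
  return self.decays[name] or 1
end

local m = newModule()
  :learningRate('weight', 0.1)
  :weightDecay('bias', 0)

print(m:relativeLearningRate('weight')) -- set explicitly
print(m:relativeLearningRate('bias'))   -- never set, so the default applies
print(m:relativeWeightDecay('bias'))    -- set explicitly to 0
```

Because 0 is a truthy value in Lua, an explicit 0 is kept and not replaced by the default.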

Rather than supplying a scalar learning rate and weight decay to the optimization function, supply the following vectors:

local learningRates, weightDecays = module:getOptimConfig(baseLearningRate, baseWeightDecay)

The SGD config table should then be of the form:

{
  learningRates = learningRates,
  weightDecays = weightDecays,
  learningRate = baseLearningRate,
  -- ...
}

Note that the config table uses the keys learningRates and weightDecays (plural).
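To see what the vectors are for, here is a simplified per-parameter SGD step in plain Lua. It is a sketch under assumptions, not the code of optim.sgd: momentum and learning rate decay are left out, Lua tables stand in for tensors, and `sgdStep` is a hypothetical name. It assumes that each entry of learningRates multiplies the scalar learningRate, and that each entry of weightDecays is used directly as the decay coefficient for that parameter.

```lua
-- Simplified per-parameter SGD step (hypothetical helper).
-- x, dfdx, learningRates and weightDecays are arrays of equal length.
local function sgdStep(x, dfdx, config)
  for i = 1, #x do
    -- weight decay adds a pull towards zero to the gradient
    local g = dfdx[i] + config.weightDecays[i] * x[i]
    -- the per-parameter factor scales the scalar learning rate
    x[i] = x[i] - config.learningRate * config.learningRates[i] * g
  end
  return x
end

local x = sgdStep(
  { 1.0, 1.0, 1.0 },   -- parameters
  { 0.5, 0.5, 0.5 },   -- gradients
  {
    learningRate = 0.1,
    learningRates = { 0, 0.1, 1 },  -- frozen, slowed down, default
    weightDecays = { 0, 0, 0 },
  }
)
print(x[1], x[2], x[3])
```

In this sketch a factor of 0 leaves the parameter untouched, which corresponds to freezing a layer, while a factor of 1 gives the plain SGD update.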

(The API is inspired by the nninit package. These two packages should work well in conjunction.)

Installation

luarocks install nnlr

Example

require 'nn'
require 'optim'
require 'nnlr'

-- Network
local net = nn.Sequential()

-------
-- This layer is locked down. No learning happens
-------
-- Conv 1
net:add(nn.SpatialConvolution(1, 32, 5, 5, 1, 1, 2, 2)
  :learningRate('weight', 0)
  :learningRate('bias', 0)
  :weightDecay('weight', 0)
  :weightDecay('bias', 0)
)
net:add(nn.SpatialBatchNormalization(32))
net:add(nn.ReLU())
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))

-------
-- This layer has a lower learning rate than all the
-- other layers.
-------
-- Conv 2
net:add(nn.SpatialConvolution(32, 48, 5, 5, 1, 1, 1, 1)
  :learningRate('weight', 0.1)
  :learningRate('bias', 0.2)
  -- we don't supply a weightDecay value for 'weight' --- rather we
  -- choose to use the default value
  :weightDecay('bias', 0)
)
net:add(nn.SpatialBatchNormalization(48))
net:add(nn.ReLU())
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))
net:add(nn.View(-1):setNumInputDims(3))

-------
-- The following layers use the default learning rate
-- and weight decay. No learningRate or weightDecay
-- call necessary.
-------
-- Full 3
net:add(nn.Linear(2352, 100))
net:add(nn.BatchNormalization(100))
net:add(nn.ReLU())
-- Full 4
net:add(nn.Linear(100, 100))
net:add(nn.BatchNormalization(100))
net:add(nn.ReLU())
-- Full 5
net:add(nn.Linear(100, 10))
net:add(nn.LogSoftMax())

-------
-- Here we get the learningRates and weightDecays
-- vectors required for optimization
-------
local baseLearningRate = 0.1
local baseWeightDecay = 0.0001
local learningRates, weightDecays = net:getOptimConfig(baseLearningRate, baseWeightDecay)

-------
-- Train the network...
-------

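local weight, grad = net:getParameters()

-------
-- We use the learningRates and weightDecays vectors here
-- in place of scalar values. The table is created once, outside
-- the training loop, because optim.sgd stores its momentum state
-- in it when no separate state table is passed.
-------
local sgdConfig = {
  learningRates = learningRates,
  weightDecays = weightDecays,
  learningRate = baseLearningRate,
  momentum = 0.9,
}

-- ... some training loop ...
  -- `criterion`, `input` and `target` are placeholders for your own
  -- loss function and data; they are not defined in this example.
  local feval = function()
    grad:zero()
    local output = net:forward(input)
    local loss = criterion:forward(output, target)
    net:backward(input, criterion:backward(output, target))
    return loss, grad
  end

  optim.sgd(feval, weight, sgdConfig)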

-- ...