Improving training of deep neural networks via Singular Value Bounding
This is the code release for the Singular Value Bounding (SVB) and Bounded Batch Normalization (BBN) methods proposed in the CVPR2017 paper "Improving training of deep neural networks via Singular Value Bounding", authored by Kui Jia, Dacheng Tao, Shenghua Gao, and Xiangmin Xu.
This work investigates solution properties of neural networks that can potentially lead to good performance. Inspired by orthogonal weight initialization, we propose to constrain the solutions of weight matrices in the orthogonal feasible set during the whole process of network training.
We achieve this by a simple yet effective method called Singular Value Bounding (SVB). In SVB, all singular values of each weight matrix are simply bounded in a narrow band around the value of 1. Based on the same motivation, we also propose Bounded Batch Normalization (BBN), which improves Batch Normalization by removing its potential risk of ill-conditioned layer transform.
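The bounding step described above can be illustrated with a small NumPy sketch. This is not the released Torch code: the function name `singular_value_bounding` is hypothetical, and the lower bound 1/(1 + \epsilon) is an assumption about how the "narrow band around the value of 1" is defined (see Algorithm 1 of the paper for the authoritative rule).

```python
import numpy as np

def singular_value_bounding(weight, eps=0.05):
    """Clip all singular values of a 2-D weight matrix into [1/(1+eps), 1+eps].

    Hypothetical helper for illustration; the band's lower limit is an assumption.
    """
    # Thin SVD: weight == u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Bound every singular value to the band around 1
    s_bounded = np.clip(s, 1.0 / (1.0 + eps), 1.0 + eps)
    # Rebuild the matrix from the bounded spectrum (u * s scales the columns of u)
    return (u * s_bounded) @ vt

# Example: a random 16 x 72 matrix, e.g. a conv kernel flattened to 2-D
# (how kernels are flattened in the released code is not shown here).
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8 * 3 * 3))
w_bounded = singular_value_bounding(w, eps=0.05)

# Singular values of the result now lie inside the band
s_after = np.linalg.svd(w_bounded, compute_uv=False)
print(s_after.min(), s_after.max())
```

In training, such a step would be applied to each weight matrix periodically between SGD updates rather than after every update, which keeps the cost of the SVD small relative to the rest of training.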
We present both theoretical and empirical results to justify our proposed methods. In particular, we achieve the state-of-the-art results of 3.06% error rate on CIFAR10 and 16.90% on CIFAR100, using off-the-shelf network architectures (Wide ResNets).
Project page: http://www.aperture-lab.net/research/svb/
Results
Controlled studies on CIFAR10 using 20-layer (left) and 38-layer (right) ConvNets (VGG)
Validation curves on CIFAR10 using two ConvNets of 20 and 38 weight layers, respectively. Blue lines are results by SGD with momentum. Red lines are results by SVB at different values of \epsilon (0.01, 0.05, 0.2, 0.5, 1) in Algorithm 1 of the paper. Black lines are results using both SVB (fixing \epsilon = 0.05) and BBN at different values of \tilde{\epsilon} (0.01, 0.05, 0.2, 0.5, 1) in Algorithm 2 of the paper. The left two figures are from the 20-layer ConvNet, and the right two are from the 38-layer ConvNet.
Ablation studies on CIFAR10 using a 68-layer ResNet
Training methods | Error rate (%) |
---|---|
SGD with momentum + BN | 6.10 (6.22 +/- 0.14) |
SVB + BN | 5.65 (5.79 +/- 0.10) |
SVB + BBN | 5.37 (5.49 +/- 0.11) |
Ablation studies on CIFAR10, using a pre-activation ResNet with 68 weight layers of 3 x 3 convolutional filters. Results are in the format of best (mean +/- std) over 5 runs. Standard data augmentation (4 pixels zero-padding plus horizontal flipping) is used.
Results on CIFAR10 and CIFAR100 using Wide ResNets
Methods | CIFAR10 | CIFAR100 | # layers | # params |
---|---|---|---|---|
Wide ResNet W/O SVB+BBN | 3.78 | 19.92 | 28 | 36.5M |
Wide ResNet WITH SVB+BBN | 3.24 | 17.47 | 28 | 36.5M |
Wider ResNet W/O SVB+BBN | 3.64 | 19.25 | 28 | 94.2M |
Wider ResNet WITH SVB+BBN | 3.06 | 16.90 | 28 | 94.2M |
Wide ResNet and Wider ResNet in the table above respectively refer to the architectures of “WRN-28-10” and “WRN-28-16” as in Wide Residual Networks. Standard data augmentation (4 pixels zero-padding plus horizontal flipping) is used.
Preliminary results on ImageNet
Training methods | Top-1 error (%) | Top-5 error (%) |
---|---|---|
Our Inception-ResNet | 21.61 | 5.91 |
Our Inception-ResNet WITH SVB+BN | 21.20 | 5.57 |
Results of single-model (Inception-ResNet) and single-crop testing on the ImageNet validation set.
Usage
Installation
The code depends on the Torch library. Please install Torch first.
Support of datasets
CIFAR10
CIFAR100 (coming soon)
ImageNet (coming soon)
One may refer to the fb.resnet.torch package for how to obtain and pre-process these datasets. The dataset to use is specified in the file optsArgParse.lua.
Support of network architectures
ResNet (pre-activation)
Wide ResNets
Inception-ResNet (coming soon)
DenseNet (coming soon)
ResNeXt (coming soon)
Training
We take a pre-activation version of ResNet as an example to explain how to train a deep network using the SVB and BBN methods.
Run the following at the command line when SVB and BBN are not used (i.e., training is based on standard SGD with momentum):
th main.lua -cudnnSetting deterministic -netType PreActResNet -ensembleID 1 -BN true -nBaseRecur 11 -kWRN 1 -lrDecayMethod exp -lrBase 0.5 -lrEnd 0.001 -batchSize 128 -nEpoch 160 -nLRDecayStage 80 -weightDecay 0.0001 -svBFlag false -bnsBFlag false
Run the following at the command line when SVB is turned on:
th main.lua -cudnnSetting deterministic -netType PreActResNet -ensembleID 1 -BN true -nBaseRecur 11 -kWRN 1 -lrDecayMethod exp -lrBase 0.5 -lrEnd 0.001 -batchSize 128 -nEpoch 160 -nLRDecayStage 80 -weightDecay 0.0001 -svBFlag true -svBFactor 1.5 -svBIter 391 -bnsBFlag false
Run the following at the command line when both SVB and BBN are turned on:
th main.lua -cudnnSetting deterministic -netType PreActResNet -ensembleID 1 -BN true -nBaseRecur 11 -kWRN 1 -lrDecayMethod exp -lrBase 0.5 -lrEnd 0.001 -batchSize 128 -nEpoch 160 -nLRDecayStage 80 -weightDecay 0.0001 -svBFlag true -svBFactor 1.5 -svBIter 391 -bnsBFlag true -bnsBFactor 2 -bnsBType BBN
Set svBFactor to 1 + \epsilon (with \epsilon as in Algorithm 1 of the paper) and bnsBFactor to 1 + \tilde{\epsilon} (with \tilde{\epsilon} as in Algorithm 2). Setting kWRN > 1 turns the network architecture into a Wide ResNet. Please refer to the file optsArgParse.lua for the settings of other hyperparameters.
One may also set bnsBType to rel to get even better performance.
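As a quick sanity check on the flag values used in the commands above, the arithmetic can be sketched in a few lines of Python. The helper `band` is hypothetical, and the lower limit 1/factor is an assumption about how the band is defined. Reading -svBIter 391 as "once per epoch" is also only an inference: 391 equals the number of mini-batches in one pass over the 50,000 CIFAR10 training images at batch size 128, but the README does not state that this is the intent.

```python
import math

def band(factor):
    """Return the (lower, upper) bounds implied by a factor of 1 + epsilon.

    Hypothetical helper; the lower bound 1/factor is an assumption.
    """
    return (1.0 / factor, factor)

# -svBFactor 1.5 corresponds to epsilon = 0.5
svb_factor = 1.5
eps = svb_factor - 1.0
svb_lower, svb_upper = band(svb_factor)   # roughly (0.667, 1.5)

# -bnsBFactor 2 corresponds to tilde-epsilon = 1
bns_factor = 2.0
eps_tilde = bns_factor - 1.0
bns_lower, bns_upper = band(bns_factor)   # (0.5, 2.0)

# Mini-batches per CIFAR10 epoch at -batchSize 128
iters_per_epoch = math.ceil(50000 / 128)  # 391

print(eps, (svb_lower, svb_upper))
print(eps_tilde, (bns_lower, bns_upper))
print(iters_per_epoch)
```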
Use of SVB and BBN in your own code
The SVB and BBN methods are implemented in the file cnnTrain.lua via the functions cnnTrain:fcConvWeightReguViaSVB() and cnnTrain:BNScalingRegu(), respectively. One may refer to cnnTrain.lua and main.lua for how these two functions are used.
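For readers porting the idea to another framework, the BBN step might look roughly like the NumPy sketch below. It is not a translation of cnnTrain:BNScalingRegu(). The function name and arguments are hypothetical, and the rule itself is an assumption based on Algorithm 2 of the paper: each channel's effective scale gamma_i / sigma_i is bounded relative to the mean scale of the layer. It also assumes the mean scale is positive.

```python
import numpy as np

def bounded_bn_scales(gamma, running_var, eps_tilde=1.0, bn_eps=1e-5):
    """Bound BN scales so that (gamma_i / sigma_i) / alpha lies in
    [1/(1+eps_tilde), 1+eps_tilde], where alpha is the mean of gamma_i / sigma_i.

    Hypothetical helper for illustration; assumes alpha > 0.
    """
    sigma = np.sqrt(running_var + bn_eps)   # per-channel standard deviation
    ratio = gamma / sigma                   # effective per-channel scaling of BN
    alpha = ratio.mean()                    # mean scaling of the layer
    upper = 1.0 + eps_tilde
    lower = 1.0 / upper
    # Clip each channel's scaling relative to the layer mean
    rel_bounded = np.clip(ratio / alpha, lower, upper)
    # Map the bounded relative scaling back to a gamma value
    return alpha * sigma * rel_bounded

# Example: one very small and one very large scale in a 4-channel BN layer
gamma = np.array([0.1, 1.0, 1.0, 5.0])
running_var = np.ones(4)
gamma_new = bounded_bn_scales(gamma, running_var, eps_tilde=1.0)
print(gamma_new)
```

Only the outlying channels are changed in this example; scales already inside the band are returned unchanged up to floating-point error.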
Contact
kuijia At scut.edu.cn