# ResFGB

**Remark:** The code has been updated since the ICML version, which corresponds to the commit of May 25, 2018.
This is a Theano (>= 1.0.0) implementation of *Functional Gradient Boosting based on Residual Network Perception* (ICML 2018).
ResFGB is a functional gradient boosting method that learns a ResNet-like deep neural network for non-linear classification problems. The model consists of a linear classifier, such as logistic regression or a support vector machine, on top of a feature extractor. In each iteration the two components are trained by alternating optimization: the linear classifier is trained on the samples mapped through the current feature extractor, and the extractor is then updated by stacking a ResNet-type layer that moves the samples in the direction of increasing linear separability. The final result is a highly non-linear classifier in the form of a residual network.
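To make the procedure concrete, here is a minimal numpy sketch of the loop described above. It is an illustration, not the package's implementation: the resblock is replaced by a direct shift of the training features (in ResFGB, a resblock `g` is instead fitted so that the stacked map `z -> z + g(z)` reproduces this shift on unseen points as well), and the logistic loss, toy data, and step sizes are all assumptions made for the example.

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_linear(Z, Y, n_class, lr=0.1, epochs=200):
    # Multinomial logistic regression trained by plain gradient descent.
    W = np.zeros((Z.shape[1], n_class))
    onehot = np.eye(n_class)[Y]
    for _ in range(epochs):
        P = softmax(Z @ W)
        W -= lr * Z.T @ (P - onehot) / len(Z)
    return W

def functional_gradient(Z, Y, W, n_class):
    # Gradient of the logistic loss with respect to the features Z themselves.
    onehot = np.eye(n_class)[Y]
    return (softmax(Z @ W) - onehot) @ W.T

rng = np.random.RandomState(0)
Z = rng.randn(200, 5)                                  # toy features
Y = (Z[:, 0] + 0.3 * rng.randn(200) > 0).astype(int)   # toy binary labels

for t in range(10):
    W = fit_linear(Z, Y, n_class=2)                     # step 1: train the linear classifier
    Z -= 0.1 * functional_gradient(Z, Y, W, n_class=2)  # step 2: move samples along the
                                                        # negative functional gradient
```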
## Usage
A minimal usage example is shown below.
Note: `(X, Y)` is the training data, `(Xv, Yv)` the validation data, and `(Xt, Yt)` the test data; all are numpy arrays. `n_data` is the number of training examples, `input_dim` the dimension of the input space, and `n_class` the number of classes. The label set must be an integer sequence starting from zero.
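For concreteness, arrays of this form can be generated as follows (the `float32`/`int32` dtypes are an assumption based on common Theano conventions, not a documented requirement of the package):

```python
import numpy as np

rng = np.random.RandomState(0)
n_data, input_dim, n_class = 1000, 20, 3

# Random stand-ins for real data: float features, integer labels in {0, ..., n_class - 1}.
X  = rng.randn(n_data, input_dim).astype(np.float32)
Y  = rng.randint(0, n_class, size=n_data).astype(np.int32)
Xv = rng.randn(200, input_dim).astype(np.float32)
Yv = rng.randint(0, n_class, size=200).astype(np.int32)
Xt = rng.randn(200, input_dim).astype(np.float32)
Yt = rng.randint(0, n_class, size=200).astype(np.int32)
```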
```python
from resfgb.models import ResFGB, get_hyperparams

hparams = get_hyperparams(n_data, input_dim, n_class)
model = ResFGB(**hparams)

best_iters, _, _ = model.fit(X, Y, Xv, Yv, use_best_iter=True)

train_loss, train_acc = model.evaluate(X, Y)
print('train_loss: {0}, train_acc: {1}'.format(train_loss, train_acc))

test_loss, test_acc = model.evaluate(Xt, Yt)
print('test_loss : {0}, test_acc : {1}'.format(test_loss, test_acc))
```
See `examples/sample_resfgb.py` for more details.
## Hyperparameters
The hyperparameters of ResFGB fall into three groups: those for learning the linear classifier, those for learning the multi-layer network used as a resblock, and those for the functional gradient method itself. They are listed below; 'default' denotes the value set by the function `resfgb.models.get_hyperparams`. `input_dim` and `n_class` stand for the dimension of the input space and the number of classes, respectively. A sketch showing how to override these defaults follows the lists.
### For the linear model
- `shape` [default=`(input_dim, n_class)`]: Shape of the linear model; this should not be changed.
- `wr` [default=`1/n_data`]: L2-regularization parameter.
- `bias` [default=`True`]: Whether to include a bias term.
- `eta` [default=`1e-2`]: Learning rate for Nesterov's momentum method.
- `momentum` [default=`0.9`]: Momentum parameter for Nesterov's momentum method.
- `minibatch_size` [default=`100`]: Minibatch size used to compute stochastic gradients.
- `max_epoch` [default=`100`]: Number of epochs for training the linear model.
- `tune_eta` [default=`True`]: Whether to tune the learning rate automatically.
- `scale` [default=`1.0`]: Positive factor by which the tuned learning rate is multiplied.
- `eval_iters` [default=`1000`]: Number of iterations per trial when tuning the learning rate.
- `early_stop` [default=`10`]: Training stops when the training loss has not improved for this many epochs.
### For the resblock
- `shape` [default=`(input_dim, 100, 100, 100, 100, input_dim)`]: Shape of the multi-layer perceptron; the dimensions of the input and last layers must be set to `input_dim`.
- `wr` [default=`1/n_data`]: L2-regularization parameter.
- `eta` [default=`1e-2`]: Learning rate for Nesterov's momentum method.
- `momentum` [default=`0.9`]: Momentum parameter for Nesterov's momentum method.
- `minibatch_size` [default=`100`]: Minibatch size used to compute stochastic gradients.
- `max_epoch` [default=`50`]: Number of epochs for training the resblock.
- `tune_eta` [default=`True`]: Whether to tune the learning rate automatically.
- `scale` [default=`1.0`]: Positive factor by which the tuned learning rate is multiplied.
- `eval_iters` [default=`1000`]: Number of iterations per trial when tuning the learning rate.
- `early_stop` [default=`10`]: Training stops when the training loss has not improved for this many epochs.
### For the functional gradient method
- `model_type` [default=`'logistic'`]: Type of the linear model: `'logistic'` or `'smooth_hinge'`.
- `model_hparams` [default=`model_hparams`]: Dictionary of hyperparameters for the linear model.
- `resblock_hparams` [default=`resblock_hparams`]: Dictionary of hyperparameters for the resblock.
- `fg_eta` [default=`1e-1`]: Learning rate used in the functional gradient method.
- `max_iters` [default=`30`]: Number of iterations of the functional gradient method, which corresponds to the depth of the resulting network.
- `seed` [default=`1`]: Random seed used in the method.
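As a sketch of how these defaults might be overridden before building the model (this assumes the dictionary returned by `get_hyperparams` uses exactly the keys listed above, including the nested `model_hparams` and `resblock_hparams` dictionaries; the particular values are arbitrary):

```python
from resfgb.models import ResFGB, get_hyperparams

hparams = get_hyperparams(n_data, input_dim, n_class)

hparams['model_type'] = 'smooth_hinge'  # smooth hinge loss instead of logistic
hparams['fg_eta']     = 5e-2            # smaller functional-gradient step size
hparams['max_iters']  = 50              # deeper network: up to 50 resblocks

# Nested hyperparameter dictionaries for the two sub-learners.
hparams['model_hparams']['wr']       = 1e-4                              # stronger L2 regularization
hparams['resblock_hparams']['shape'] = (input_dim, 256, 256, input_dim)  # wider, shallower resblocks

model = ResFGB(**hparams)
```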