This code is very old and it no longer works with current versions of TensorFlow.

For TensorFlow, there is an implementation in TensorFlow Addons.
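As a quick illustration, here is a minimal Keras sketch, assuming the tensorflow-addons package is installed and compatible with your TensorFlow version (the optimizer is exposed there as tfa.optimizers.COCOB):

# Minimal sketch (assumes tensorflow and tensorflow-addons are installed).
import tensorflow as tf
import tensorflow_addons as tfa

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1000, activation="relu"),
    tf.keras.layers.Dense(10),
])
# Note: no learning rate to pass, COCOB sets its own step sizes.
model.compile(
    optimizer=tfa.optimizers.COCOB(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)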

For PyTorch, you can find COCOB together with other parameter-free algorithms in a library I maintain.
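For reference, here is a hedged PyTorch sketch; the package and class names below (parameterfree, COCOB) are assumptions about that library and may differ from the actual API, so check its documentation:

# Hedged sketch: the import path below is an assumption, not a confirmed API.
import torch
import torch.nn.functional as F
from parameterfree import COCOB  # assumed package/class names

model = torch.nn.Linear(784, 10)
optimizer = COCOB(model.parameters())  # again, no learning rate argument

x = torch.randn(100, 784)
y = torch.randint(0, 10, (100,))
loss = F.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()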

COCOB

TensorFlow implementation of COCOB from the paper

Backprop without Learning Rates Through Coin Betting
Francesco Orabona and Tatiana Tommasi
https://arxiv.org/abs/1705.07795

Description

COntinuous COin Betting (COCOB) is a novel algorithm for stochastic subgradient descent (SGD) that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and obtain a learning-rate-free procedure for deep networks.

How do we reduce SGD to coin betting?
Betting on a coin works in this way: start with $1 and bet some money w<sub>t</sub> on the outcome of a coin toss g<sub>t</sub>, which can be +1 or -1. Similarly, in the optimization world, we want to minimize a 1D convex function f(w), and the gradients g<sub>t</sub> that we can receive are only +1 and -1. Thus, we can treat the gradients g<sub>t</sub> as the outcomes of the coin tosses, and the value of the parameter w<sub>t</sub> as the money bet in each round.
If we make a lot of money with our bets, it means we are very good at predicting the gradients; in optimization terms, it means we converge quickly to the minimum of our function. In more detail, the average of our bets converges to the minimizer at a rate that depends on the dual of the growth rate of our money.
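To make the reduction concrete, here is a toy Python sketch (this is the classic Krichevsky-Trofimov bettor, not the data-dependent strategy from the paper): it minimizes f(w) = |w - 10|, whose subgradients are in {-1, 0, +1}, by betting a fraction of the current wealth on the next negative subgradient; the running average of the bets approaches the minimizer.

# Toy illustration of the coin-betting reduction in 1D (not COCOB itself).
def subgradient(w, target=10.0):
    return (w > target) - (w < target)  # sign(w - target) in {-1, 0, +1}

wealth = 1.0          # start with $1
outcome_sum = 0.0     # running sum of coin outcomes (negative subgradients)
bet_sum = 0.0         # running sum of bets, to report the average iterate

T = 10000
for t in range(1, T + 1):
    bet = (outcome_sum / t) * wealth   # KT betting fraction times current wealth
    outcome = -subgradient(bet)        # coin outcome = negative subgradient at the bet
    wealth += outcome * bet            # win or lose exactly what we bet
    outcome_sum += outcome
    bet_sum += bet

print("average iterate:", bet_sum / T)  # approaches the minimizer, 10.0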

Why use this reduction?
Because algorithms to bet on coins are already known, and they are optimal, parameter-free, and very intuitive. So, through the reduction, we get an optimal parameter-free algorithm almost for free. Our paper extends a known betting strategy to a data-dependent one.

Is this magical?
No: you could get the same results by running many copies of SGD in parallel with different learning rates and then combining them with an aggregation algorithm on top. This would lead to exactly the same convergence rate we get with COCOB, but the advantage here is that you only need to run one algorithm!

We refer the interested reader to the paper for many more details.
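For the curious, below is a rough per-coordinate sketch of the COCOB-Backprop update described in the paper, written from memory in NumPy; treat the paper (and the implementations linked above) as the authoritative reference, including the default value of alpha.

# Rough sketch of a COCOB-Backprop-style update; see the paper for the
# authoritative pseudocode. All state arrays have the same shape as the weights.
import numpy as np

def cocob_init(w0, eps=1e-8):
    return {
        "w0": w0.copy(),                  # initial weights
        "w": w0.copy(),                   # current weights
        "L": np.full_like(w0, eps),       # running max of |gradient|
        "G": np.zeros_like(w0),           # running sum of |gradient|
        "reward": np.zeros_like(w0),      # money "won" so far, clipped at zero
        "theta": np.zeros_like(w0),       # running sum of negative gradients
    }

def cocob_step(state, grad, alpha=100.0):
    g = -grad                             # the "coin outcome" is the negative gradient
    state["L"] = np.maximum(state["L"], np.abs(g))
    state["G"] += np.abs(g)
    state["reward"] = np.maximum(state["reward"] + (state["w"] - state["w0"]) * g, 0.0)
    state["theta"] += g
    denom = state["L"] * np.maximum(state["G"] + state["L"], alpha * state["L"])
    state["w"] = state["w0"] + state["theta"] / denom * (state["L"] + state["reward"])
    return state["w"]

Note how no learning rate appears: the effective step size comes entirely from the accumulated statistics L, G, reward, and theta.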

Code & Usage

Here you can find the scripts needed to reproduce the experiment on the MNIST data with a fully connected 2-layer network (1000 hidden units per layer, ReLU activations, mini-batch size of 100), as reported in Figure 2 (top row) of our paper.

To run the code, simply cd into the mnist directory and run

python mnist_fully_connected.py

It will create a data directory where the MNIST data are downloaded and saved. If the automatic download fails, you can also download the data manually with

wget https://storage.googleapis.com/cvdf-datasets/mnist/train-images-idx3-ubyte.gz
wget http://storage.googleapis.com/cvdf-datasets/mnist/train-labels-idx1-ubyte.gz
wget http://storage.googleapis.com/cvdf-datasets/mnist/t10k-images-idx3-ubyte.gz
wget http://storage.googleapis.com/cvdf-datasets/mnist/t10k-labels-idx1-ubyte.gz

While running, the code prints the training cost and test error for each epoch:

epoch 0, training cost 7.38284, test error 0.885 
epoch 1, training cost 0.0914842, test error 0.0413998 
epoch 2, training cost 0.0321911, test error 0.0226998
...
epoch 37, training cost 5.72368e-05, test error 0.0164999 
epoch 38, training cost 5.53296e-05, test error 0.0163999 
epoch 39, training cost 5.34391e-05, test error 0.0164999

History