Home

Awesome

YellowFin Build Status

YellowFin is an auto-tuning optimizer based on momentum SGD which requires no manual specification of learning rate and momentum. It measures the objective landscape on-the-fly and tunes momentum as well as learning rate using local quadratic approximation.

The implementation here can be a drop-in replacement for any optimizer in PyTorch. It supports step and zero_grad functions like any PyTorch optimizer after from yellowfin import YFOptimizer. We also provide interface to manually set the learning rate schedule at every iteration for finer control (see Detailed Guideline Section).

For more technical details, please refer to our paper YellowFin and the Art of Momentum Tuning.

For more usage details, please refer to the inline documentation of tuner_utils/yellowfin.py. Example usage can be found here for ResNext on CIFAR10 and Tied LSTM on PTB.

YellowFin is under active development. Many members of the community have kindly submitted issues and pull requests. We are incorporating fixes and smoothing things out. As a result the repository code is in flux. Please make sure you use the latest version and submit any issues you might have!

Updates

[2017.07.03] Fixed a gradient clipping bug. Please pull our latest master branch to make gradient clipping great again in YellowFin.

[2017.07.28] Switched to logrithmic smoothing to accelerate adaptation to curvature range trends.

[2017.08.01] Added optional feature to enforce non-increasing value of lr * gradient norm for stablity in some rare cases.

[2017.08.05] Added feature to correct estimation bias from sparse gradient.

[2017.08.16] Replace numpy root solver with closed form solution using Vieta's substitution for cubic eqaution. It solves the stability issue of the numpy root solver.

[2017.10.29] Major fixe for stability. We added eps to protect fractions in our code, as well as an adaptive clipping feature to properly deal with exploding gradient (manual clipping is still supported as described in the detailed instruction below).

Setup instructions for experiments

Please clone the master branch and follow the instructions to run YellowFin on ResNext for CIFAR10 and tied LSTM on Penn Treebank for language modeling. The models are adapted from ResNext repo and PyTorch example tied LSTM repo respectively. Thanks to the researchers for developing the models. For more experiments on more convolutional and recurrent neural networks, please refer to our Tensorflow implementation of YellowFin.

Note YellowFin is tested with PyTorch v0.2.0 for compatibility. It is tested under Python 2.7.

Run CIFAR10 ResNext experiments

The experiments on 110 layer ResNet with CIFAR10 and 164 layer ResNet with CIFAR100 can be launched using

cd pytorch-cifar
python main.py --logdir=path_to_logs --opt_method=YF

Run Penn Treebank tied LSTM experiments

The experiments on multiple-layer LSTM on Penn Treebank can be launched using

cd word_language_model
python main.py --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied --opt_method=YF --logdir=path_to_logs --cuda

For more experiments, please refer to our YellowFin Tensorflow Repo.

Detailed guidelines

<!---* **Sparsity**: Gradient norm, curvature estimations etc., when calculated with sparse gradient, are biased to larger values than the counterpart from the dense gradient on the full dataset. The bias can be illustrated using the following example: the norm of vectors (1.0, 0.0), (0.0, 1.0) and the norm of their average (0.5, 0.5). The norm of the latter is sqrt(sparsity (i.e. 0.5 here) ) * the norm of the former. The sparsity debias feature is useful when the model is very sparse, e.g. LSTM with word embedding. For non-sparse models, e.g. CNN, turning this feature off could slightly speedup.---> <!--## Additional experiments to test the repo We use the [ResNext on CIFAR10](https://github.com/JianGoForIt/YellowFin_Pytorch/blob/master/pytorch-cifar/main.py#L91) and [Tied LSTM on PTB](https://github.com/JianGoForIt/YellowFin_Pytorch/blob/master/word_language_model/main.py#L191) to test the PyTorch implementation here. For more on experimental results, please refer to our [paper](https://arxiv.org/abs/1706.03471).--> <!--![ResNext](plots/resnext_test_acc.png)--> <!--![Tied LSTM](plots/tied_ptb_test_perp.png)-->

Citation

If you use YellowFin in your paper, please cite the paper:

@article{zhang2017yellowfin,
  title={YellowFin and the Art of Momentum Tuning},
  author={Zhang, Jian and Mitliagkas, Ioannis and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:1706.03471},
  year={2017}
}

Acknowledgement

We thank Olexa Bilaniuk, Andrew Drozdov, Paroma Varma, Bryan He, as well as github user @elPistolero @esvhd for the help in contributing to and testing the codebase.

Implementation for other platforms

For Tensorflow users, we implemented YellowFin Tensorflow Repo.

<!---For MXNet users, Github user [StargazerZhu](https://github.com/StargazerZhu) has already implemented a Theano version here: [YellowFin MXNet Repo](https://github.com/StargazerZhu/YellowFin_MXNet).---> <!---For Theano users, Github user [botev](https://github.com/botev) has already implemented a Theano version here: [YellowFin Theano Repo](https://gist.github.com/botev/f8b32c00eafee222e47393f7f0747666).--->

We thank the contributors for YellowFin in different deep learning frameworks.