Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification
This repository contains the code for HTD introduced in the following paper:
Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification (Accepted to WACV 2019)
Introduction
The learning rate schedule is a critical component of deep neural network training. Several schedules and methods have been proposed, including step decay, adaptive methods, cosine schedules, and cyclical schedules. This paper proposes a new scheduling method, named hyperbolic-tangent decay (HTD). We run experiments on several benchmarks: ResNet, Wide ResNet, and DenseNet on the CIFAR-10 and CIFAR-100 datasets, LSTM on the PAMAP2 dataset, and ResNet on the ImageNet and Fashion-MNIST datasets. In our experiments, HTD outperforms the step decay and cosine schedules in nearly all cases, while requiring fewer hyperparameters than step decay and being more flexible than the cosine schedule.
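For reference, a minimal sketch of the HTD schedule as described in the paper, eta_t = eta_0 / 2 * (1 - tanh(L + (U - L) * t / T)); the function name and default bounds below are illustrative, not the repository's actual implementation:

```python
import math

def htd_learning_rate(base_lr, epoch, total_epochs, lower=-4.0, upper=4.0):
    """Hyperbolic-tangent decay (HTD).

    eta_t = eta_0 / 2 * (1 - tanh(L + (U - L) * t / T)),
    where L and U bound the tanh input (here -4 and 4, matching the
    --tanh_begin / --tanh_end values used in the usage example below).
    """
    progress = epoch / float(total_epochs)
    return base_lr / 2.0 * (1.0 - math.tanh(lower + (upper - lower) * progress))
```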
Usage
- (To train on the CIFAR datasets only) Install TensorFlow and Keras.
- (To train on ImageNet) Install Torch and the required dependencies such as cuDNN. See the instructions here for a step-by-step guide.
- Clone this repo:
git clone https://github.com/BIGBALLON/HTD.git
├─ our_Net % Our CIFAR dataset training code
├─ fb.resnet.torch % [facebook/fb.resnet.torch]
└─ DenseNet % [liuzhuang13/DenseNet]
See the following examples. To train ResNet on CIFAR-10 with the step decay scheduler, simply run:
python train.py --batch_size 128 \
--epochs 200 \
--data_set cifar10 \
--learning_rate_method step_decay \
--network resnet \
--log_path ./logs \
--network_depth 5
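For comparison, step decay keeps the learning rate piecewise constant and divides it at fixed milestones; a minimal sketch, where the milestones and decay factor are assumptions and not the repository's defaults:

```python
def step_decay_learning_rate(base_lr, epoch, milestones=(100, 150), gamma=0.1):
    """Multiply the learning rate by `gamma` once at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```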
To use another learning rate scheduler (cos or tanh), change the --learning_rate_method flag:
python train.py --batch_size 128 \
--epochs 200 \
--data_set cifar10 \
--learning_rate_method tanh \
--network resnet \
--log_path ./logs \
--network_depth 5 \
--tanh_begin -4.0 \
--tanh_end 4.0
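Here --tanh_begin and --tanh_end set the lower and upper bounds of the tanh input. Below is an illustrative sketch of wiring such a schedule into Keras via a LearningRateScheduler callback; the constants and names are assumptions, not taken from train.py:

```python
import math
import tensorflow as tf

BASE_LR = 0.1            # assumed initial learning rate
TOTAL_EPOCHS = 200
TANH_BEGIN, TANH_END = -4.0, 4.0   # correspond to --tanh_begin / --tanh_end

def tanh_schedule(epoch, lr):
    """Return the HTD learning rate for the given epoch."""
    progress = epoch / float(TOTAL_EPOCHS)
    return BASE_LR / 2.0 * (1.0 - math.tanh(TANH_BEGIN + (TANH_END - TANH_BEGIN) * progress))

lr_callback = tf.keras.callbacks.LearningRateScheduler(tanh_schedule)
# model.fit(x_train, y_train, epochs=TOTAL_EPOCHS, callbacks=[lr_callback])
```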
Results on CIFAR
The table below shows the results of HTD on the CIFAR datasets. The best results are marked in blue.
An asterisk (*) indicates results taken directly from the original paper.
Results on ImageNet
The Torch models are trained under the same settings as fb.resnet.torch. The best results are marked in blue.
An asterisk (*) indicates results taken directly from the original paper.
Contact
fm.bigballon at gmail.com
byshiue at gmail.com
If you use our code, please consider citing the paper as follows:
@inproceedings{hsueh2019stochastic,
title={Stochastic Gradient Descent with Hyperbolic-Tangent Decay on Classification},
author={Hsueh, Bo-Yang and Li, Wei and Wu, I-Chen},
booktitle={2019 IEEE Winter Conference on Applications of Computer Vision (WACV)},
pages={435--442},
year={2019},
organization={IEEE}
}
Please feel free to contact us if you have any questions, suggestions, or anything you would like to discuss!