# Self-Tuning Networks
This repository contains the code used for the paper Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions (ICLR 2019).
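At a high level, an STN alternates between two steps: the network weights are fit on training data as a structured function of the hyperparameters (a "best-response" approximation), and the hyperparameters themselves are updated by gradient descent on the validation loss through that function. The toy sketch below is **not** the repository's implementation; it tunes a single hypothetical hyperparameter (the log of an L2 penalty) with a plain affine best-response on a linear model, purely to illustrate the alternating scheme.

```python
# Illustrative toy only: alternating "inner" updates of a structured
# best-response and "outer" hyperparameter updates on validation loss.
import torch

torch.manual_seed(0)

# Toy regression data split into train / validation.
x_tr, y_tr = torch.randn(256, 5), torch.randn(256, 1)
x_va, y_va = torch.randn(64, 5), torch.randn(64, 1)

# Structured best-response: weights are an affine function of the hyperparameter.
w0 = torch.zeros(5, 1, requires_grad=True)    # base weights
w1 = torch.zeros(5, 1, requires_grad=True)    # hyperparameter-dependent direction
lam = torch.tensor(-2.0, requires_grad=True)  # log of an L2 penalty strength

inner_opt = torch.optim.SGD([w0, w1], lr=1e-2)
outer_opt = torch.optim.SGD([lam], lr=1e-2)

def weights(l):
    return w0 + l * w1  # best-response approximation w(lambda)

def train_loss(l):
    pred = x_tr @ weights(l)
    return ((pred - y_tr) ** 2).mean() + torch.exp(l) * weights(l).pow(2).sum()

def val_loss(l):
    pred = x_va @ weights(l)
    return ((pred - y_va) ** 2).mean()

for step in range(500):
    # Inner step: fit the best-response on a perturbed hyperparameter.
    perturbed = lam.detach() + 0.1 * torch.randn(())
    inner_opt.zero_grad()
    train_loss(perturbed).backward()
    inner_opt.step()

    # Outer step: move the hyperparameter along the validation gradient,
    # which flows through the best-response weights(lam).
    outer_opt.zero_grad()
    val_loss(lam).backward()
    outer_opt.step()

print("tuned log L2 strength:", lam.item())
```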
## Requirements
- Python 3.6.x
- PyTorch 0.4.1
## Setup
The following is an example of how to create an environment with the appropriate versions of the dependencies:
```bash
conda create -n stn-env python=3.6
source activate stn-env
conda install pytorch=0.4.1 cuda80 -c pytorch
conda install torchvision -c pytorch
pip install -r requirements.txt
```
## Experiments
### CNN Experiments
The CNN code in this repository is built on the Cutout codebase.
These commands should be run from inside the `cnn` folder.
To train a Self-Tuning CNN:
```bash
python hypertrain.py --tune_all --tune_scales --entropy_weight=1e-3 --save
```
To train a baseline CNN:
```bash
python train_basic.py
```
### LSTM Experiments
The LSTM code in this repository is built on the AWD-LSTM codebase.
The commands for the LSTM experiments should be run from inside the `lstm` folder.
First, download the PTB dataset:
```bash
./getdata.sh
```
#### Schedule Experiments
The commands in this section can be used to obtain results for Table 1 in the paper.
- Using a fixed value for output dropout discovered by grid search:

  ```bash
  python train_basic.py --dropouto=0.68 --prefix=dropouto
  ```
- Gaussian-perturbed output dropout rate, with std=0.05 (a rough sketch of both perturbation schedules appears after this list):

  ```bash
  python train_basic.py --dropouto=0.68 --prefix=dropouto_gauss --perturb_type=gaussian --perturb_std=0.05 --perturb_dropouto
  ```
- Sinusoid-perturbed output dropout rate, with amplitude=0.1 and period=1200 minibatches:

  ```bash
  python train_basic.py --dropouto=0.68 --prefix=dropouto_sin --perturb_type=sinusoid --amplitude=0.1 --sinusoid_period=1200 --perturb_dropouto
  ```
- STN-tuned output dropout:

  ```bash
  python train.py --dropouto=0.05 --tune_dropouto --save_dir=dropouto_stn
  ```
- Train from scratch following the STN schedule for output dropout (replace the path in `--load_schedule` with the one generated by the STN command above):

  ```bash
  python train_basic.py --load_schedule=logs/dropouto_stn/2019-06-15/epoch.csv
  ```
- Train from scratch with the final output dropout value from STN training:

  ```bash
  python train_basic.py --dropouto=0.78 --prefix=dropouto_final
  ```
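The perturbation baselines above vary the dropout rate around its base value during training. A hedged sketch of how those per-minibatch schedules might be computed is shown below; the exact clipping and parameterization inside `train_basic.py` are assumptions here.

```python
import math
import random

def perturbed_dropout(base, step, perturb_type, std=0.05, amplitude=0.1, period=1200):
    """Illustrative per-minibatch dropout rate; not the repository's exact code."""
    if perturb_type == "gaussian":
        value = base + random.gauss(0.0, std)
    elif perturb_type == "sinusoid":
        value = base + amplitude * math.sin(2 * math.pi * step / period)
    else:
        value = base
    return min(max(value, 0.0), 1.0)  # keep the rate a valid probability

# Example: the sinusoid schedule from the command above, at minibatch 300.
print(perturbed_dropout(0.68, 300, "sinusoid"))
```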
#### Schedules with Different Hyperparameter Initializations
- The following commands find STN schedules starting with different initial dropout values (in {0.05, 0.3, 0.5, 0.7, 0.9}):

  ```bash
  python train.py --dropouto=0.05 --tune_dropouto --save_dir=dropouto_schedule_init05
  python train.py --dropouto=0.3 --tune_dropouto --save_dir=dropouto_schedule_init30
  python train.py --dropouto=0.5 --tune_dropouto --save_dir=dropouto_schedule_init50
  python train.py --dropouto=0.7 --tune_dropouto --save_dir=dropouto_schedule_init70
  python train.py --dropouto=0.9 --tune_dropouto --save_dir=dropouto_schedule_init90
  ```
- To plot the schedules, first modify the variables `log_dir_init05`, `log_dir_init30`, `log_dir_init50`, `log_dir_init70`, and `log_dir_init90` in `save_dropouto_schedule_plot.py` to point to the appropriate directories created by the commands above, and then run the following (a rough sketch of this step follows the list):

  ```bash
  python save_dropouto_schedule_plot.py
  ```
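For orientation, the plotting step boils down to reading each run's per-epoch log and plotting the output dropout value over epochs. The sketch below assumes each run writes an `epoch.csv` with `epoch` and `dropouto` columns and uses placeholder paths; the actual column names and paths handled by `save_dropouto_schedule_plot.py` may differ.

```python
import csv
import matplotlib.pyplot as plt

# Placeholder paths: point these at the directories produced by the runs above.
log_files = {
    "init 0.05": "logs/dropouto_schedule_init05/epoch.csv",
    "init 0.30": "logs/dropouto_schedule_init30/epoch.csv",
}

for label, path in log_files.items():
    with open(path) as f:
        rows = list(csv.DictReader(f))
    epochs = [float(r["epoch"]) for r in rows]
    rates = [float(r["dropouto"]) for r in rows]
    plt.plot(epochs, rates, label=label)

plt.xlabel("epoch")
plt.ylabel("output dropout rate")
plt.legend()
plt.savefig("dropouto_schedules.png")
```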
#### Tuning Multiple LSTM Hyperparameters
Run the following command to tune the input/hidden/output/embedding dropout, weight DropConnect, and the coefficients of activation regularization (alpha) and temporal activation regularization (beta):
```bash
python train.py --seed=3 --tune_all --save_dir=st-lstm
```
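For context, the alpha and beta coefficients tuned by this command weight the activation regularization (AR) and temporal activation regularization (TAR) terms added to the training loss. The snippet below sketches those terms in the usual AWD-LSTM form; the repository's exact implementation may differ slightly.

```python
import torch

def ar_tar_penalty(hidden, alpha, beta):
    """hidden: (seq_len, batch, hidden_size) LSTM outputs (illustrative shapes)."""
    ar = alpha * hidden.pow(2).mean()                      # activation regularization
    tar = beta * (hidden[1:] - hidden[:-1]).pow(2).mean()  # temporal activation regularization
    return ar + tar

penalty = ar_tar_penalty(torch.randn(35, 20, 650), alpha=2.0, beta=1.0)
print(penalty.item())
```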
## Project Structure
```
.
├── README.md
├── cnn
│   ├── datasets
│   │   ├── __init__.py
│   │   ├── cifar.py
│   │   └── loaders.py
│   ├── hypermodels
│   │   ├── __init__.py
│   │   ├── alexnet.py
│   │   ├── hyperconv2d.py
│   │   ├── hyperlinear.py
│   │   └── small.py
│   ├── hypertrain.py
│   ├── logger.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── alexnet.py
│   │   └── small.py
│   ├── train_basic.py
│   └── util
│       ├── __init__.py
│       ├── cutout.py
│       ├── dropout.py
│       └── hyperparameter.py
├── lstm
│   ├── data.py
│   ├── embed_regularize.py
│   ├── getdata.sh
│   ├── hyperlstm.py
│   ├── locked_dropout.py
│   ├── logger.py
│   ├── model_basic.py
│   ├── save_dropouto_schedule_plot.py
│   ├── train.py
│   ├── train_basic.py
│   ├── utils.py
│   └── weight_drop.py
├── requirements.txt
└── stn_utils
    ├── __init__.py
    └── hyperparameter.py

7 directories, 34 files
```
## Code Contributors
- Matthew MacKay
- Paul Vicol
- Jon Lorraine
## Citation
If you use this code, please cite:
- Matthew MacKay, Paul Vicol, Jonathan Lorraine, David Duvenaud and Roger Grosse. Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions. International Conference on Learning Representations (ICLR), 2019.
```bibtex
@inproceedings{STN2019,
  title={Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions},
  author={Matthew MacKay and Paul Vicol and Jonathan Lorraine and David Duvenaud and Roger Grosse},
  booktitle={{International Conference on Learning Representations (ICLR)}},
  year={2019}
}
```