# Self-Tuning Networks
This repository contains the code used for the paper Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions (ICLR 2019).
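At a high level, an STN alternates between two steps: the network weights are fit on training data as a structured function of the hyperparameters (a "best-response" approximation), and the hyperparameters themselves are updated by gradient descent on the validation loss through that function. The toy sketch below is **not** the repository's implementation; it tunes a single hypothetical hyperparameter (the log of an L2 penalty) with a plain affine best-response on a linear model, purely to illustrate the alternating scheme.

```python
# Illustrative toy only: alternating "inner" updates of a structured
# best-response and "outer" hyperparameter updates on validation loss.
import torch

torch.manual_seed(0)

# Toy regression data split into train / validation.
x_tr, y_tr = torch.randn(256, 5), torch.randn(256, 1)
x_va, y_va = torch.randn(64, 5), torch.randn(64, 1)

# Structured best-response: weights are an affine function of the hyperparameter.
w0 = torch.zeros(5, 1, requires_grad=True)    # base weights
w1 = torch.zeros(5, 1, requires_grad=True)    # hyperparameter-dependent direction
lam = torch.tensor(-2.0, requires_grad=True)  # log of an L2 penalty strength

inner_opt = torch.optim.SGD([w0, w1], lr=1e-2)
outer_opt = torch.optim.SGD([lam], lr=1e-2)

def weights(l):
    return w0 + l * w1  # best-response approximation w(lambda)

def train_loss(l):
    pred = x_tr @ weights(l)
    return ((pred - y_tr) ** 2).mean() + torch.exp(l) * weights(l).pow(2).sum()

def val_loss(l):
    pred = x_va @ weights(l)
    return ((pred - y_va) ** 2).mean()

for step in range(500):
    # Inner step: fit the best-response on a perturbed hyperparameter.
    perturbed = lam.detach() + 0.1 * torch.randn(())
    inner_opt.zero_grad()
    train_loss(perturbed).backward()
    inner_opt.step()

    # Outer step: move the hyperparameter along the validation gradient,
    # which flows through the best-response weights(lam).
    outer_opt.zero_grad()
    val_loss(lam).backward()
    outer_opt.step()

print("tuned log L2 strength:", lam.item())
```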
## Requirements
- Python 3.6.x
- PyTorch 0.4.1
## Setup
The following is an example of how to create an environment with the appropriate versions of the dependencies:
```bash
conda create -n stn-env python=3.6
source activate stn-env
conda install pytorch=0.4.1 cuda80 -c pytorch
conda install torchvision -c pytorch
pip install -r requirements.txt
```
## Experiments
### CNN Experiments
The CNN code in this repository is built on the Cutout codebase.
These commands should be run from inside the `cnn` folder.
To train a Self-Tuning CNN:
```bash
python hypertrain.py --tune_all --tune_scales --entropy_weight=1e-3 --save
```
To train a baseline CNN:
```bash
python train_basic.py
```
### LSTM Experiments
The LSTM code in this repository is built on the AWD-LSTM codebase.
The commands for the LSTM experiments should be run from inside the `lstm` folder.
First, download the PTB dataset:
```bash
./getdata.sh
```
#### Schedule Experiments
The commands in this section can be used to obtain results for Table 1 in the paper.
- Using a fixed value for output dropout discovered by grid search:

  ```bash
  python train_basic.py --dropouto=0.68 --prefix=dropouto
  ```
- Gaussian-perturbed output dropout rate, with std=0.05 (a rough sketch of both perturbation schedules appears after this list):

  ```bash
  python train_basic.py --dropouto=0.68 --prefix=dropouto_gauss --perturb_type=gaussian --perturb_std=0.05 --perturb_dropouto
  ```
- Sinusoid-perturbed output dropout rate, with amplitude=0.1 and period=1200 minibatches:

  ```bash
  python train_basic.py --dropouto=0.68 --prefix=dropouto_sin --perturb_type=sinusoid --amplitude=0.1 --sinusoid_period=1200 --perturb_dropouto
  ```
- STN-tuned output dropout:

  ```bash
  python train.py --dropouto=0.05 --tune_dropouto --save_dir=dropouto_stn
  ```
- Train from scratch following the STN schedule for output dropout (replace the path in `--load_schedule` with the one generated by the STN command above):

  ```bash
  python train_basic.py --load_schedule=logs/dropouto_stn/2019-06-15/epoch.csv
  ```
- Train from scratch with the final output dropout value from STN training:

  ```bash
  python train_basic.py --dropouto=0.78 --prefix=dropouto_final
  ```
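The perturbation baselines above vary the dropout rate around its base value during training. A hedged sketch of how those per-minibatch schedules might be computed is shown below; the exact clipping and parameterization inside `train_basic.py` are assumptions here.

```python
import math
import random

def perturbed_dropout(base, step, perturb_type, std=0.05, amplitude=0.1, period=1200):
    """Illustrative per-minibatch dropout rate; not the repository's exact code."""
    if perturb_type == "gaussian":
        value = base + random.gauss(0.0, std)
    elif perturb_type == "sinusoid":
        value = base + amplitude * math.sin(2 * math.pi * step / period)
    else:
        value = base
    return min(max(value, 0.0), 1.0)  # keep the rate a valid probability

# Example: the sinusoid schedule from the command above, at minibatch 300.
print(perturbed_dropout(0.68, 300, "sinusoid"))
```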
#### Schedules with Different Hyperparameter Initializations
- The following commands find STN schedules starting with different initial dropout values (in {0.05, 0.3, 0.5, 0.7, 0.9}):

  ```bash
  python train.py --dropouto=0.05 --tune_dropouto --save_dir=dropouto_schedule_init05
  python train.py --dropouto=0.3 --tune_dropouto --save_dir=dropouto_schedule_init30
  python train.py --dropouto=0.5 --tune_dropouto --save_dir=dropouto_schedule_init50
  python train.py --dropouto=0.7 --tune_dropouto --save_dir=dropouto_schedule_init70
  python train.py --dropouto=0.9 --tune_dropouto --save_dir=dropouto_schedule_init90
  ```
- To plot the schedules, first modify the variables `log_dir_init05`, `log_dir_init30`, `log_dir_init50`, `log_dir_init70`, and `log_dir_init90` in `save_dropouto_schedule_plot.py` to point to the appropriate directories created by the commands above, and then run the following (a rough sketch of this step follows the list):

  ```bash
  python save_dropouto_schedule_plot.py
  ```
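For orientation, the plotting step boils down to reading each run's per-epoch log and plotting the output dropout value over epochs. The sketch below assumes each run writes an `epoch.csv` with `epoch` and `dropouto` columns and uses placeholder paths; the actual column names and paths handled by `save_dropouto_schedule_plot.py` may differ.

```python
import csv
import matplotlib.pyplot as plt

# Placeholder paths: point these at the directories produced by the runs above.
log_files = {
    "init 0.05": "logs/dropouto_schedule_init05/epoch.csv",
    "init 0.30": "logs/dropouto_schedule_init30/epoch.csv",
}

for label, path in log_files.items():
    with open(path) as f:
        rows = list(csv.DictReader(f))
    epochs = [float(r["epoch"]) for r in rows]
    rates = [float(r["dropouto"]) for r in rows]
    plt.plot(epochs, rates, label=label)

plt.xlabel("epoch")
plt.ylabel("output dropout rate")
plt.legend()
plt.savefig("dropouto_schedules.png")
```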
#### Tuning Multiple LSTM Hyperparameters
Run the following command to tune the input/hidden/output/embedding dropout, weight DropConnect, and the coefficients of activation regularization (alpha) and temporal activation regularization (beta):
```bash
python train.py --seed=3 --tune_all --save_dir=st-lstm
```
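For context, the alpha and beta coefficients tuned by this command weight the activation regularization (AR) and temporal activation regularization (TAR) terms added to the training loss. The snippet below sketches those terms in the usual AWD-LSTM form; the repository's exact implementation may differ slightly.

```python
import torch

def ar_tar_penalty(hidden, alpha, beta):
    """hidden: (seq_len, batch, hidden_size) LSTM outputs (illustrative shapes)."""
    ar = alpha * hidden.pow(2).mean()                      # activation regularization
    tar = beta * (hidden[1:] - hidden[:-1]).pow(2).mean()  # temporal activation regularization
    return ar + tar

penalty = ar_tar_penalty(torch.randn(35, 20, 650), alpha=2.0, beta=1.0)
print(penalty.item())
```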
## Project Structure
```
.
├── README.md
├── cnn
│   ├── datasets
│   │   ├── __init__.py
│   │   ├── cifar.py
│   │   └── loaders.py
│   ├── hypermodels
│   │   ├── __init__.py
│   │   ├── alexnet.py
│   │   ├── hyperconv2d.py
│   │   ├── hyperlinear.py
│   │   └── small.py
│   ├── hypertrain.py
│   ├── logger.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── alexnet.py
│   │   └── small.py
│   ├── train_basic.py
│   └── util
│       ├── __init__.py
│       ├── cutout.py
│       ├── dropout.py
│       └── hyperparameter.py
├── lstm
│   ├── data.py
│   ├── embed_regularize.py
│   ├── getdata.sh
│   ├── hyperlstm.py
│   ├── locked_dropout.py
│   ├── logger.py
│   ├── model_basic.py
│   ├── save_dropouto_schedule_plot.py
│   ├── train.py
│   ├── train_basic.py
│   ├── utils.py
│   └── weight_drop.py
├── requirements.txt
└── stn_utils
    ├── __init__.py
    └── hyperparameter.py

7 directories, 34 files
```
## Code Contributors
- Matthew MacKay
- Paul Vicol
- Jon Lorraine
## Citation
If you use this code, please cite:
- Matthew MacKay, Paul Vicol, Jonathan Lorraine, David Duvenaud and Roger Grosse. Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions. International Conference on Learning Representations (ICLR), 2019.
```bibtex
@inproceedings{STN2019,
  title={Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions},
  author={Matthew MacKay and Paul Vicol and Jonathan Lorraine and David Duvenaud and Roger Grosse},
  booktitle={{International Conference on Learning Representations (ICLR)}},
  year={2019}
}
```