Selfish Sparse RNN Training

This repository is the official implementation of the ICML 2021 paper "Selfish Sparse RNN Training".

Requirements

Our code builds heavily on the awesome sparse training library from Sparse Momentum.

The library requires PyTorch v1.0.1 and CUDA v9.0.

You can install PyTorch via Anaconda or pip; see PyTorch/get-started for further information.

Training

We provide the training code for Selfish stacked-LSTM, Selfish RHN, and Selfish ONLSTM.

To train the Selfish stacked-LSTM from the paper on the PTB dataset with a GPU, run this command:

python main.py --sparse --sparse_init uniform --optimizer sgd --model LSTM --cuda --growth random --death magnitude --redistribution none --nonmono 5 --batch_size 20 --bptt 35 --lr 40 --clip 0.25 --seed 1111 --emsize 1500 --nhid 1500 --nlayers 2 --death-rate 0.8 --dropout 0.65 --density 0.33 --epochs 100

To train the Selfish RHN from the paper on the PTB dataset with a GPU, run this command:

python main.py --sparse --sparse_init uniform --optimizer sgd --model RHN --cuda --tied --couple --seed 42 --nlayers 1 --growth random --death magnitude --redistribution none --density 0.472 --death-rate 0.5 --clip 0.25 --lr 15 --epochs 500 --dropout 0.65 --dropouth 0.25 --dropouti 0.65 --dropoute 0.2 --emsize 830 --nhid 830

To train the Selfish ONLSTM from the paper on the PTB dataset with a GPU, run these two commands:

cd ONLSTM
python main_ONLSTM.py --sparse --sparse_init uniform --optimizer sgd --growth random --death magnitude --redistribution none --density 0.45 --death-rate 0.5 --batch_size 20 --dropout 0.45 --dropouth 0.3 --dropouti 0.5 --nonmono 5 --wdrop 0.45 --chunk_size 10 --seed 141 --epoch 1000

Options:
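
The training commands above all pass --death magnitude, --growth random, and --death-rate. As a rough illustration of what such a prune-and-regrow update can look like, here is a minimal sketch in plain Python. It is not the repository's implementation: the function name prune_and_regrow, the flat weight list, and the choice to regrow only at positions that were inactive before the update are assumptions made for this example.

```python
import random


def prune_and_regrow(weights, mask, death_rate, rng):
    """One illustrative update: drop the smallest active weights, regrow as many at random."""
    active = [i for i, m in enumerate(mask) if m == 1]
    n_change = int(death_rate * len(active))

    # Magnitude death: deactivate the n_change active weights closest to zero.
    dropped = sorted(active, key=lambda i: abs(weights[i]))[:n_change]
    for i in dropped:
        mask[i] = 0
        weights[i] = 0.0

    # Random growth (assumption of this sketch): reactivate n_change positions
    # that were inactive before this update, chosen uniformly at random.
    dropped_set = set(dropped)
    candidates = [i for i, m in enumerate(mask) if m == 0 and i not in dropped_set]
    grown = rng.sample(candidates, min(n_change, len(candidates)))
    for i in grown:
        mask[i] = 1  # the newly grown weight starts at zero
    return dropped, grown


# Toy layer with 8 weights, 5 of them active.
weights = [0.9, -0.05, 0.4, 0.0, 0.0, -0.7, 0.02, 0.0]
mask = [1, 1, 1, 0, 0, 1, 1, 0]
dropped, grown = prune_and_regrow(weights, mask, death_rate=0.5, rng=random.Random(0))
print("dropped:", sorted(dropped), "grown:", sorted(grown), "active:", sum(mask))
```

The number of active weights is the same before and after the update, so the overall density set by --density is preserved in this sketch.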

Evaluation

To evaluate the pre-trained Selfish stacked-LSTM model on PTB, run:

python main.py --sparse --evaluate model_path --optimizer sgd --model LSTM --cuda --growth random --death magnitude --redistribution none --nonmono 5 --batch_size 20 --bptt 35 --lr 40 --clip 0.25 --seed 5 --emsize 1500 --nhid 1500 --nlayers 2 --death-rate 0.7 --dropout 0.65 --density 0.33 --epochs 100

To evaluate the pre-trained model, replace model_path with the path to your model and keep all other hyper-parameters the same as in the training command.

Pre-trained Models

You can download the pretrained Selfish stacked-LSTM models here:

This model reaches a test perplexity of 71.65 on the PTB dataset at a sparsity of 0.67. To evaluate this pre-trained model, run:

python main.py --sparse --evaluate model_path --optimizer sgd --model LSTM --cuda --growth random --death magnitude --redistribution none --nonmono 5 --batch_size 20 --bptt 35 --lr 40 --clip 0.25 --seed 5 --emsize 1500 --nhid 1500 --nlayers 2 --death-rate 0.7 --dropout 0.65 --density 0.33 --epochs 100

"model_path" is the path where you saved this model.

Results

Our models achieve the following performance:

Selfish stacked-LSTM, RHN and ONLSTM on PTB dataset:

| Model name | Sparsity | Validation perplexity | Test perplexity |
| --- | --- | --- | --- |
| Selfish stacked-LSTM | 0.67 | 73.79 | 71.65 |
| Selfish RHN | 0.53 | 62.10 | 60.35 |
| Selfish ONLSTM_1000 | 0.55 | 58.17 ± 0.06 | 56.31 ± 0.10 |
| Selfish ONLSTM_1300 | 0.55 | 57.67 ± 0.03 | 55.82 ± 0.11 |
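
The sparsity values in the PTB table line up with the --density flags in the training commands if sparsity is taken to be 1 - density. That definition is an assumption of this short check, not something stated in the commands themselves:

```python
# Densities passed via --density in the training commands above.
densities = {
    "Selfish stacked-LSTM": 0.33,
    "Selfish RHN": 0.472,
    "Selfish ONLSTM": 0.45,
}

# Assumed definition: sparsity = 1 - density, rounded to two decimals as in the table.
sparsities = {name: round(1.0 - d, 2) for name, d in densities.items()}

for name, s in sparsities.items():
    print(f"{name}: sparsity = {s:.2f}")
```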

Selfish AWD-LSTM-MoS on Wikitext-2 dataset:

| Model name | Sparsity | Validation perplexity | Test perplexity |
| --- | --- | --- | --- |
| Selfish AWD-LSTM-MoS without finetuning | 0.55 | 65.96 | 63.05 |

Apply Selfish-RNN to your own architectures

Applying Selfish-RNN to other models is simple and takes the following steps:

(1) Create the masks:

decay = CosineDecay(args.death_rate, args.epochs * len(train_data) // args.bptt)
mask = Masking(optimizer, death_rate=args.death_rate, death_mode=args.death, death_rate_decay=decay, growth_mode=args.growth, redistribution_mode=args.redistribution, model=args.model)
mask.add_module(model, sparse_init=args.sparse_init, density=args.density)

(2) Replace optimizer.step() with mask.step() in the training function.
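
To show why the optimizer call is swapped for the mask call, here is a self-contained toy sketch in plain Python. ToySGD and ToyMask are hypothetical stand-ins written for this example, not the repository's classes. The sketch assumes that mask.step() performs the usual optimizer update, resets pruned weights to zero, and anneals the death rate with a cosine schedule down to 0 over the total number of steps; the actual Masking and CosineDecay classes may behave differently in detail.

```python
import math


class ToySGD:
    """Hypothetical stand-in for an optimizer: plain SGD over a flat weight list."""

    def __init__(self, weights, lr):
        self.weights = weights
        self.lr = lr
        self.grads = [0.0] * len(weights)

    def step(self):
        for i, g in enumerate(self.grads):
            self.weights[i] -= self.lr * g


class ToyMask:
    """Hypothetical stand-in for the mask object (not the repository's Masking class)."""

    def __init__(self, optimizer, mask, death_rate, total_steps):
        self.optimizer = optimizer
        self.mask = mask  # 1 = weight kept, 0 = weight pruned
        self.initial_death_rate = death_rate
        self.death_rate = death_rate
        self.total_steps = total_steps
        self.steps = 0
        self.apply_mask()

    def apply_mask(self):
        w = self.optimizer.weights
        for i, keep in enumerate(self.mask):
            w[i] *= keep

    def step(self):
        self.optimizer.step()  # the usual dense update
        self.apply_mask()      # pruned weights are forced back to zero
        self.steps += 1
        # Assumed schedule: cosine annealing from the initial death rate to 0.
        progress = min(self.steps, self.total_steps) / self.total_steps
        self.death_rate = 0.5 * self.initial_death_rate * (1 + math.cos(math.pi * progress))


weights = [0.5, -0.3, 0.8, 0.1]
optimizer = ToySGD(weights, lr=0.1)
mask = ToyMask(optimizer, mask=[1, 0, 1, 0], death_rate=0.8, total_steps=10)

for _ in range(10):
    optimizer.grads = [1.0, 1.0, 1.0, 1.0]  # placeholder gradients
    mask.step()                              # instead of optimizer.step()

print(weights, mask.death_rate)
```

Calling optimizer.step() directly in this toy would let the gradient move the pruned weights away from zero, which is why the training function calls mask.step() instead.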