# PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
This PyTorch package implements *PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance* (ICML 2022).
## Setup environment

```bash
conda create -n pruning python=3.7
conda activate pruning
pip install -r requirement.txt
```
## Usage of PLATON
- Import the `Pruner` class:

```python
from Pruner import Pruner
```
- Initialize the pruner:

```python
PLATON = Pruner(model, args=args, pruner_name="PLATON", total_step=t_total,
                mask_param_name=['attention.self', 'attention.output.dense',
                                 'output.dense', 'intermediate.dense'])
```
  - `model`: the model to be pruned.
  - `args.initial_threshold`: the initial remaining ratio $r^{(0)}$.
  - `args.final_threshold`: the final remaining ratio $r^{(T)}$; the remaining ratio is annealed from $r^{(0)}$ to $r^{(T)}$ during training (see the schedule sketch below).
  - The number of initial warmup steps for pruning equals `initial_warmup` $\times$ `warmup_steps`.
  - The number of final warmup steps for pruning equals `final_warmup` $\times$ `warmup_steps`.
  - `args.beta1`: $\beta_1$ for PLATON.
  - `args.beta2`: $\beta_2$ for PLATON.
  - `args.deltaT`: the length of the local averaging window.
  - `mask_param_name`: a list of substrings matched against parameter names to select the parameters to prune.
- After each call to `optimizer.step()`, add the following line to update $\overline{I}$ and $\overline{U}$ and to prune the model iteratively:

```python
threshold, mask_threshold = PLATON.update_and_pruning(model, global_step)
```
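The remaining ratio is annealed from $r^{(0)}$ to $r^{(T)}$ between the initial and final pruning warmup phases. As a rough illustration only, here is a minimal sketch of a cubic annealing schedule of the kind commonly used for gradual pruning; the exact curve implemented in `Pruner` may differ.

```python
def remaining_ratio(step, total_step, r0, rT,
                    initial_warmup, final_warmup, warmup_steps):
    """Illustrative cubic schedule; an assumption, not necessarily Pruner's exact curve."""
    start = initial_warmup * warmup_steps            # keep all weights up to this step
    end = total_step - final_warmup * warmup_steps   # hold r_T from this step onward
    if step <= start:
        return r0
    if step >= end:
        return rT
    progress = (step - start) / (end - start)
    return rT + (r0 - rT) * (1.0 - progress) ** 3
```

Putting the pieces together, a typical fine-tuning loop might look like the sketch below. The model, dataloader, optimizer, and `args` are placeholders for whatever your training script already defines; only `Pruner` and `update_and_pruning` come from this package.

```python
from Pruner import Pruner

# Placeholders supplied by your own training script.
model = ...             # the transformer model to prune
train_dataloader = ...  # your training dataloader
optimizer = ...         # e.g. AdamW over model.parameters()
args = ...              # argparse namespace with the arguments described above

t_total = len(train_dataloader) * args.epochs   # total number of update steps
PLATON = Pruner(model, args=args, pruner_name="PLATON", total_step=t_total,
                mask_param_name=['attention.self', 'attention.output.dense',
                                 'output.dense', 'intermediate.dense'])

global_step = 0
for epoch in range(args.epochs):
    for batch in train_dataloader:
        loss = model(**batch).loss   # assumes a Hugging Face-style forward pass
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        # Update the smoothed importance and uncertainty statistics and prune
        # the model down to the remaining ratio scheduled for this step.
        threshold, mask_threshold = PLATON.update_and_pruning(model, global_step)
        global_step += 1
```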
## GLUE benchmark
Check the folder `GLUE` for more details about reproducing the GLUE results.
An example of iterative pruning for BERT-base on MNLI:
```bash
python train.py \
    --initial_threshold 1. --final_threshold 0.20 \
    --warmup_steps 5400 --initial_warmup 1 --final_warmup 5 \
    --beta1 0.85 --beta2 0.95 --deltaT 10 \
    --data_dir data/canonical_data/bert-base-uncased \
    --train_datasets mnli --test_datasets mnli_matched,mnli_mismatched \
    --init_checkpoint mt_dnn_models/bert_model_base_uncased.pt \
    --batch_size 32 --batch_size_eval 256 \
    --optimizer adamax --learning_rate 8e-5 \
    --epochs 8 --seed 7 \
    --log_per_updates 100 --eval_per_updates 6000 \
    --output_dir log/run --log_file log.log --tensorboard
```
Please see `GLUE/scripts` for more GLUE examples.
## Question Answering Task
Check the folder `SQuAD` for more details about reproducing the SQuAD results.
An example of iterative pruning for BERT-base on SQuAD v1.1:
```bash
python run_squad.py --pruner_name PLATON \
    --initial_threshold 1 --final_threshold 0.10 \
    --warmup_steps 5400 --initial_warmup 1 --final_warmup 5 \
    --beta1 0.85 --beta2 0.95 --deltaT 10 \
    --num_train_epochs 10 --seed 9 --learning_rate 3e-5 \
    --per_gpu_train_batch_size 16 --per_gpu_eval_batch_size 256 \
    --do_train --do_eval --do_lower_case \
    --model_type bert --model_name_or_path bert-base-uncased \
    --logging_steps 300 --eval_steps 3000 --save_steps 100000 \
    --data_dir data/squad \
    --output_dir log/bert-base-uncased/PLATON/ --overwrite_output_dir
```
## Citation
```bibtex
@inproceedings{zhang2022platon,
  title={PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance},
  author={Zhang, Qingru and Zuo, Simiao and Liang, Chen and Bukharin, Alexander and He, Pengcheng and Chen, Weizhu and Zhao, Tuo},
  booktitle={International Conference on Machine Learning},
  pages={26809--26823},
  year={2022},
  organization={PMLR}
}
```