# Fast and Robust Early-Exiting (EMNLP 2023)
<a href="https://arxiv.org/abs/2310.05424"><img src="https://img.shields.io/badge/Paper-arXiv:2310.05424-Green"></a> <a href=#bibtex><img src="https://img.shields.io/badge/Paper-BibTex-yellow"></a>
<p align="center"> <img width="1394" src="https://github.com/raymin0223/fast_robust_early_exit/assets/50742281/0aba3284-951c-4342-af1f-16dc70030654"> </p>

**Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding**
Sangmin Bae$^*$,
Jongwoo Ko$^*$,
Hwanjun Song$^\dagger$,
Se-Young Yun$^\dagger$<br/>
$^*$ equal contribution, $^\dagger$ corresponding author
- Early-Exiting dynamically allocates computation paths based on the complexity of generation for each token.
- The conventional framework fails to show an actual speedup due to the large number of exit points and the state copying mechanism.
- We propose FREE, which consists of (1) a shallow-deep module, (2) synchronized parallel decoding, and (3) an adaptive threshold estimator.
- In contrast to conventional approaches, FREE achieves a larger inference speedup on extensive generation tasks.
## 🚀 Updates
- Implement CALM and FREE on decoder-only models
- (24.02.08) Release finetuned checkpoints
- (24.01.26) Won the 🥈 Silver Award at the Samsung Humantech Paper Awards
## Requirements

Install the necessary packages with:

```bash
$ pip install -r requirements.txt
```
## Experiments
We experimented with four summarization tasks, one question answering task, and one machine translation task.
Please see the scripts and run the corresponding shell file to train or evaluate on each dataset; an illustrative example is given after the command pattern below.

```bash
$ bash run_[TASK_NAME]_[DATASET_NAME].sh
```
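For instance, a hedged example of launching one of these scripts; the concrete task and dataset names here are illustrative assumptions, so check the repository's shell files for the exact filenames:

```bash
# List the provided shell files, then launch the one matching your task/dataset.
$ ls *.sh
# e.g., a summarization run on SAMSum (script name is an assumption, not verbatim from the repo):
$ bash run_summarization_samsum.sh
```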
## Methods
You can run three early-exiting methods: Static-Exiting, CALM, and our FREE.
Here are some important arguments to consider.
Please refer to additional_args for more details.
Training for FREE (a combined example command follows the list):
- `--output_hidden_states_decoder True`: return hidden_states from intermediate layers
- `--intermediate_loss_fn shallowdeep_kd_dyna`: use a dynamic distillation loss between the shallow and deep models
- `--shallow_exit_layer [int]`: set the number of layers for the shallow model
- `--distill_layer_alpha [float]`: distillation interpolation hyperparameter between CE and KL divergence losses
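A minimal sketch of how these training flags might be combined, assuming a HuggingFace-style `run_summarization.py` entry point with example model, dataset, and hyperparameter values that are not taken verbatim from the repo's shell files:

```bash
# Sketch only: entry point, model, dataset, and hyperparameter values are assumptions;
# the four FREE-specific flags are the ones documented above.
$ python run_summarization.py \
    --model_name_or_path t5-large \
    --dataset_name samsum \
    --do_train \
    --output_dir ./save/free_t5_large_samsum \
    --output_hidden_states_decoder True \
    --intermediate_loss_fn shallowdeep_kd_dyna \
    --shallow_exit_layer 6 \
    --distill_layer_alpha 0.5
```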
Training for CALM and Static-Exiting (a combined example command follows the list):
- `--output_hidden_states_decoder True`: return hidden_states from intermediate layers
- `--intermediate_loss_fn weighted_ce`: use a weighted average loss across all layers
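Likewise, a hedged sketch for CALM / Static-Exiting training, with the same caveat that everything other than the two flags above is an assumption:

```bash
# Sketch only: non-flag arguments are assumptions.
$ python run_summarization.py \
    --model_name_or_path t5-large \
    --dataset_name samsum \
    --do_train \
    --output_dir ./save/calm_t5_large_samsum \
    --output_hidden_states_decoder True \
    --intermediate_loss_fn weighted_ce
```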
Evaluation for FREE (a combined example command follows the list):
- `--deploy_scenario True`: this should always be True to use deploying_[MODEL_NAME].py for FREE or CALM
- `--use_shallow_deep True`: use the shallow-deep module
- `--shallow_exit_layer [int]`: set the number of layers for the shallow model
- `--shallow2deep_conf_type softmax`: set the confidence measure to softmax values
- `--shallow2deep_conf_threshold [float]`: threshold value to decide whether to exit at the shallow model
- `--use_adapt_threshold True`: use the adaptive threshold estimator, where the initial threshold is set to shallow2deep_conf_threshold
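A minimal evaluation sketch combining the FREE flags, again assuming the `run_summarization.py` entry point and example paths, dataset, and threshold values:

```bash
# Sketch only: paths, dataset, and threshold values are assumptions.
$ python run_summarization.py \
    --model_name_or_path ./save/free_t5_large_samsum \
    --dataset_name samsum \
    --do_eval \
    --predict_with_generate \
    --deploy_scenario True \
    --use_shallow_deep True \
    --shallow_exit_layer 6 \
    --shallow2deep_conf_type softmax \
    --shallow2deep_conf_threshold 0.9 \
    --use_adapt_threshold True
```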
Evaluation for CALM (a combined example command follows the list):
- `--deploy_scenario True`: this should always be True to use deploying_[MODEL_NAME].py for FREE or CALM
- `--use_early_exit True`: use the conventional early-exiting framework
- `--exit_conf_type softmax`: set the confidence measure to softmax values
- `--exit_conf_threshold [float]`: threshold value to decide whether to exit
- `--exit_min_layer [int]`: the minimum number of layers to forward before deciding to exit
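A corresponding hedged sketch for CALM evaluation, with the same assumptions about everything except the documented flags:

```bash
# Sketch only: paths, dataset, and threshold values are assumptions.
$ python run_summarization.py \
    --model_name_or_path ./save/calm_t5_large_samsum \
    --dataset_name samsum \
    --do_eval \
    --predict_with_generate \
    --deploy_scenario True \
    --use_early_exit True \
    --exit_conf_type softmax \
    --exit_conf_threshold 0.9 \
    --exit_min_layer 4
```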
Evaluation for Static-Exiting (a minimal example follows the list):
- `--static_exit_layer [int]`: set how many layers to use for prediction
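And a hedged sketch for Static-Exiting evaluation, where every token exits at the same fixed layer; non-flag arguments are assumptions:

```bash
# Sketch only: paths and dataset are assumptions.
$ python run_summarization.py \
    --model_name_or_path ./save/calm_t5_large_samsum \
    --dataset_name samsum \
    --do_eval \
    --predict_with_generate \
    --static_exit_layer 6
```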
## Results
FREE demonstrated robust performance and a larger AUC across various datasets and backbone models, including T5-large and T5-3B.
<p align="center"> <img width="1194" src="https://github.com/raymin0223/fast_robust_early_exit/assets/50742281/d87b9d8c-f774-4111-808d-10df97539b42"> </p>

### Human-like Summarization Evaluation
We conducted two human-like evaluation methods, Likert scale scoring and pairwise comparison (refer to this paper).
After preparing the input files with the provided ipynb notebook, run `bash gpt_eval.sh`
with your own OpenAI API_KEY (a minimal sketch follows).
Then, you can obtain the results by running the last cell of the notebook.
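A minimal sketch of that step, assuming gpt_eval.sh reads the key from the OPENAI_API_KEY environment variable (check the script for how the key is actually consumed):

```bash
# Assumption: the script picks up the key from the environment.
$ export OPENAI_API_KEY="sk-..."   # your own key
$ bash gpt_eval.sh
```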
## Checkpoints
We share the finetuned checkpoints on Google Drive.
Note that you must download `tokenizer.json`
for each model individually from HuggingFace to run them without errors (refer to Issue #3); a hedged download example is shown below.
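For example, tokenizer.json can be fetched directly from the HuggingFace Hub; the t5-large repo id and the target directory below are assumptions, so substitute the backbone and checkpoint folder that match your download:

```bash
# Assumption: the checkpoint was finetuned from t5-large; adjust the repo id and directory as needed.
$ wget https://huggingface.co/t5-large/resolve/main/tokenizer.json -P ./free_t5_large_samsum/
```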
## BibTeX
If you find this repo useful for your research, please consider citing our paper:
```bibtex
@inproceedings{DBLP:conf/emnlp/BaeKSY23,
author = {Sangmin Bae and
Jongwoo Ko and
Hwanjun Song and
Se{-}Young Yun},
editor = {Houda Bouamor and
Juan Pino and
Kalika Bali},
title = {Fast and Robust Early-Exiting Framework for Autoregressive Language
Models with Synchronized Parallel Decoding},
booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2023, Singapore, December 6-10, 2023},
pages = {5910--5924},
publisher = {Association for Computational Linguistics},
year = {2023},
url = {https://doi.org/10.18653/v1/2023.emnlp-main.362},
doi = {10.18653/V1/2023.EMNLP-MAIN.362},
timestamp = {Fri, 12 Apr 2024 13:11:38 +0200},
biburl = {https://dblp.org/rec/conf/emnlp/BaeKSY23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
## Contact
- Sangmin Bae: bsmn0223@kaist.ac.kr
- Jongwoo Ko: jongwoo.ko@kaist.ac.kr