Home

Awesome

Speculative Decoding with Big Little Decoder (BiLD)

This repo implements Speculative Decoding with Big Little Decoder (BiLD) on top of the HuggingFace framework.

Check out the paper for more details.

image

What is Big Little Decoder?

Big Little Decoder is a simple framework that enables faster generative inference. It can dramatically accelerate text generation by ~2x, without compromising performance on a variety of text generation scenarios. Furthermore, it is a simple plug-and-play solution that requires no training or architecture redesign.

Here's the key underlying idea:

  1. BiLD offloads the majority of simple word decisions to a smaller model, and only switches the control back to the larger model when needed.
  2. The small model "fallbacks" to the large model, when it runs into a hard-to-predict word.
  3. In case the small model makes a misstep, the larger model can "rollback" the predictions to correct the error
  4. This collaborative text generation combines the small model's fast autoregressive execution with the large model's accurate and efficient non-autoregressive execution!

Running BiLD for Machine Translation

Prerequisite

You need to prepare your own large and small models. You can either use HuggingFace's pretrained models or finetune them on your target tasks. Please refer to the HuggingFace's official instructions for more detail on loading and/or finetuning pretrained models.

Evaluation

We provide a script that evaluates BiLD on machine translation tasks: examples/pytorch/run_bild_translation.py.

BiLD evaluation command:

CUDA_VISIBLE_DEVICES=0 python run_bild_translation.py --model bild --small [small_model_path] --large [large_model_path] \
    --dataset_name iwslt2017 --dataset_config iwslt2017-de-en --source_lang de --target_lang en --bild_rollback [RB] --bild_fallback [FB]

We also provide a command for running the baseline model:

CUDA_VISIBLE_DEVICES=0 python run_bild_translation.py --model [model_path] \
    --dataset_name iwslt2017 --dataset_config iwslt2017-de-en --source_lang de --target_lang en 

Pretrained Checkpoints

We provide finetuned checkpoints that were used for the evaluations in our paper.

DatasetModelLink
IWSLT-2017-De-EnmT5-smalllink
IWSLT-2017-De-EnmT5-small (aligned)link
IWSLT-2017-De-EnmT5-largelink
WMT-2014-De-EnmT5-smalllink
WMT-2014-De-EnmT5-small (aligned)link
WMT-2014-De-EnmT5-largelink
XSUMT5-smalllink
XSUMT5-small (aligned)link
XSUMT5-largelink
CNNDMT5-smalllink
CNNDMT5-small (aligned)link
CNNDMT5-largelink