# Multi-Candidate Speculative Decoding

## Code Release
See here.
## Data Release
For the Alpaca dataset, we use exactly the same data source as SpecInfer.
For the WMT dataset, we follow the process of SpecInfer: we randomly sample 1000 samples from the test set and wrap each source sentence using the following template:

```
Translate the input English sentence into German.
Input: {source sentence}
Output:
```
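As a concrete illustration, the sampling and wrapping steps above can be sketched in a few lines of Python. Only the template itself comes from this README; the function name, the fixed random seed, and the toy input list are assumptions for the sketch:

```python
import random

# Template taken verbatim from the README; {source} is the English sentence.
TEMPLATE = (
    "Translate the input English sentence into German.\n"
    "Input: {source}\n"
    "Output:"
)

def build_wmt_prompts(test_sentences, n=1000, seed=0):
    """Randomly sample n source sentences and wrap each in the template."""
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    sampled = rng.sample(test_sentences, n)
    return [TEMPLATE.format(source=s) for s in sampled]

# Usage with a toy two-sentence "test set":
prompts = build_wmt_prompts(["Hello world.", "Good morning."], n=1)
print(prompts[0])
```

In the actual data pipeline, `test_sentences` would be the WMT test-set source side and `n` would be 1000, matching the SpecInfer setup described above.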
## Model Release
We release our fine-tuned draft models on Hugging Face; see Vicuna-68M and Vicuna-160M. They are fine-tuned from LLaMA-68M and LLaMA-160M, respectively, on ShareGPT data. The training setup follows FastChat.