GliDe with a CaPE ICML 2024
Official code for "GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding".
The codebase is currently a bit ugly, and I will try to rebuild it.
TODO
- Triton-based Tree Attention
- Copy-based Tree KV Cache
- Training with a clean codebase (remove the ugly DeepSpeed code)