Rectified Linear Attention
This repo contains a PyTorch implementation of Sparse Attention with Linear Units (ReLA). This is not the official repo, so some details may vary from the paper.
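The core change is small enough to sketch inline: the softmax in scaled dot-product attention is replaced with ReLU, and since the resulting weights no longer sum to one, the paper applies RMSNorm to the attention output (the gated variant adds a learned gate on top, omitted here). Below is a minimal PyTorch sketch of that idea; the class names, default sizes, and the exact placement of RMSNorm are assumptions for illustration, not necessarily this repo's code.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Root-mean-square layer norm, used by the paper to re-normalize
    # the output of the unnormalized (ReLU) attention.
    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.norm(dim=-1, keepdim=True) * (x.shape[-1] ** -0.5)
        return self.scale * x / (rms + self.eps)

class ReLA(nn.Module):
    # Rectified Linear Attention: softmax -> ReLU, so attention weights
    # are sparse (exact zeros) and unnormalized; RMSNorm stabilizes the output.
    def __init__(self, dim, heads=8, dim_head=64):
        super().__init__()
        inner = heads * dim_head
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.to_qkv = nn.Linear(dim, inner * 3, bias=False)
        self.norm = RMSNorm(dim_head)
        self.to_out = nn.Linear(inner, dim)

    def forward(self, x):
        b, n, _ = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in qkv)
        attn = torch.relu(q @ k.transpose(-2, -1) * self.scale)  # sparse weights
        out = self.norm(attn @ v)  # per-head RMSNorm on the aggregated values
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)
```

As a usage example, `ReLA(dim=512)(x)` on a tensor `x` of shape `(batch, seq_len, 512)` returns a tensor of the same shape, so it can stand in wherever a standard multi-head attention block would.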
Citation:
```bibtex
@misc{zhang2021sparse,
    title={Sparse Attention with Linear Units},
    author={Biao Zhang and Ivan Titov and Rico Sennrich},
    year={2021},
    eprint={2104.07012},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
References:
- Transformer components and the initial attention code are from lucidrains' vit-pytorch
- The RMSNorm code is from this repo.