Rectified Linear Attention
This repo contains a PyTorch implementation of Sparse Attention with Linear Units (ReLA). This is not the official repo, so some details may vary from the paper.
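The core change is small enough to sketch inline: the softmax in scaled dot-product attention is replaced with ReLU, and since the resulting weights no longer sum to one, the paper applies RMSNorm to the attention output (the gated variant adds a learned gate on top, omitted here). Below is a minimal PyTorch sketch of that idea; the class names, default sizes, and the exact placement of RMSNorm are assumptions for illustration, not necessarily this repo's code.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Root-mean-square layer norm, used by the paper to re-normalize
    # the output of the unnormalized (ReLU) attention.
    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.norm(dim=-1, keepdim=True) * (x.shape[-1] ** -0.5)
        return self.scale * x / (rms + self.eps)

class ReLA(nn.Module):
    # Rectified Linear Attention: softmax -> ReLU, so attention weights
    # are sparse (exact zeros) and unnormalized; RMSNorm stabilizes the output.
    def __init__(self, dim, heads=8, dim_head=64):
        super().__init__()
        inner = heads * dim_head
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.to_qkv = nn.Linear(dim, inner * 3, bias=False)
        self.norm = RMSNorm(dim_head)
        self.to_out = nn.Linear(inner, dim)

    def forward(self, x):
        b, n, _ = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in qkv)
        attn = torch.relu(q @ k.transpose(-2, -1) * self.scale)  # sparse weights
        out = self.norm(attn @ v)  # per-head RMSNorm on the aggregated values
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)
```

As a usage example, `ReLA(dim=512)(x)` on a tensor `x` of shape `(batch, seq_len, 512)` returns a tensor of the same shape, so it can stand in wherever a standard multi-head attention block would.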
Citation:
```bibtex
@misc{zhang2021sparse,
    title={Sparse Attention with Linear Units},
    author={Biao Zhang and Ivan Titov and Rico Sennrich},
    year={2021},
    eprint={2104.07012},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
References:
- Transformer components and the initial attention code are from lucidrains' vit-pytorch
- The RMSNorm code is from this repo.