<div align="center"> <h1>Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention</h1> </div>

🎉 This is the implementation of the EMNLP 2024 main conference paper: Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention.

## Preparation

### Prepare Data

In the paper and code, we use the WikiText-103 and the Pile datasets, both of which are openly available on the Internet.
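As a quick sanity check that the corpus is reachable, the snippet below downloads WikiText-103 with the Hugging Face `datasets` library. This is only one way to obtain the data and assumes the `wikitext-103-raw-v1` configuration; the preprocessing used in this repository may differ.

```python
# Minimal sketch: fetch WikiText-103 via the Hugging Face `datasets` library.
# This only verifies the corpus is downloadable; the repository's own
# preprocessing (tokenization, binarization, file layout) may differ.
from datasets import load_dataset

wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")
print(wikitext)                              # splits: train / validation / test
print(wikitext["train"][10]["text"][:200])   # peek at one training line
```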

### Modify Path

Before running the code, please replace the data and work path definitions with your own paths.
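The exact files and variable names to edit depend on the repository's scripts and configuration. As a purely hypothetical illustration (the names below are placeholders we introduce, not identifiers from this codebase), such a definition typically looks like:

```python
# Hypothetical placeholders only -- the actual variable names and files in the
# repository differ. Point these at your local copies before training.
DATA_PATH = "/path/to/wikitext-103"    # where the preprocessed corpus lives
WORK_PATH = "/path/to/work_dir"        # where checkpoints and logs are written
```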

## LPA

You can apply LPA by running the following commands:

```bash
cd scripts
# model setting 1
bash lpa_train_setting1.sh
# model setting 2
bash lpa_train_setting2.sh
```

We explain some of the arguments as follows:

The arguments `low_attn` / `low_ffn` control whether the low-dimensional module is applied in the attention / FFN layers. For the LPA model, we set `low_attn` to 'yes' and `low_ffn` to 'no'. For the original Transformer (our main baseline in the paper), we set both `low_attn` and `low_ffn` to 'no'.
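For intuition about what `low_attn` switches on, below is a minimal PyTorch sketch of an attention layer whose query/key/value projections are factorized through a low-dimensional space of size `r`. It is our own illustration of the general idea (the dimension names, the rank `r`, and the module structure are assumptions), not the implementation in this repository's architecture files.

```python
# Illustrative sketch only (not this repository's implementation): a single
# multi-head attention layer whose Q/K/V projections are factorized through a
# low-dimensional space of size r, shrinking their parameter count from
# 3*d*d to 6*d*r (a saving whenever r < d/2).
import math
import torch
import torch.nn as nn


def low_dim_proj(d_model: int, r: int) -> nn.Module:
    """Factorized projection: d_model -> r -> d_model."""
    return nn.Sequential(nn.Linear(d_model, r, bias=False),
                         nn.Linear(r, d_model, bias=False))


class LowDimProjectedAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, r: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = low_dim_proj(d_model, r)
        self.k_proj = low_dim_proj(d_model, r)
        self.v_proj = low_dim_proj(d_model, r)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape

        def split(h: torch.Tensor) -> torch.Tensor:
            # (b, t, d) -> (b, n_heads, t, d_head)
            return h.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        out = torch.softmax(scores, dim=-1) @ v          # (b, n_heads, t, d_head)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, d))


# Quick shape check
x = torch.randn(2, 16, 512)
layer = LowDimProjectedAttention(d_model=512, n_heads=8, r=128)
print(layer(x).shape)  # torch.Size([2, 16, 512])
```

With these example sizes (d_model=512, r=128), the factorized Q/K/V projections use about 6·512·128 ≈ 0.39M parameters instead of 3·512·512 ≈ 0.79M for standard full-rank projections.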

## Statement

Part of the code in LPA/architecture/lpa_setting2.py is adapted from the implementation of the LLaMA model in Hugging Face Transformers.

## Bugs or questions?

If you have any questions related to the code or the paper, please contact Xingtai Lv (lvxt24@mails.tsinghua.edu.cn) or open an issue.

## Citation

If you find our work useful, please use the following citation:

```bibtex
@misc{lv2024scalableefficienttraininglarge,
      title={Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention},
      author={Xingtai Lv and Ning Ding and Kaiyan Zhang and Ermo Hua and Ganqu Cui and Bowen Zhou},
      year={2024},
      eprint={2411.02063},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
}
```