<!-- markdownlint-disable first-line-h1 --> <!-- markdownlint-disable html --> <div align="center"> <h1> TransNormerLLM -- A Faster and Better LLM </h1> </div> <p align="center"> πŸ€— <a href="https://huggingface.co/OpenNLPLab/" target="_blank">Hugging Face</a> β€’ πŸ€– <a href="https://modelscope.cn/models/OpenNLPLab/TransNormerLLM-7B" target="_blank">Model Scope</a> β€’ πŸ’¬ <a href="https://discord.gg/A8UrpM6A4" target="_blank">Discord</a> β€’ πŸ’¬ <a href="./images/contact_me_qr.png" target="_blank">WeChat</a> β€’ πŸ”’ <a href="https://github.com/LaaZa/AutoGPTQ/tree/TransNormer" target="_blank">GPTQ</a> </p> <div align="center">


<h4 align="center"> <p> <b>English</b> | <a href="https://github.com/OpenNLPLab/TransNormerLLM/blob/main/README_CN.md">δΈ­ζ–‡</a> <p> </h4> </div>

Introduction

We are re-inventing the Large Language Model (LLM). This is the official implementation of TransNormerLLM. Our open-sourced weights of TransNormerLLM are now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly.

Our release contains the TransNormerLLM model implementation, the open-source weights, and the starter code for Supervised Fine-tuning (SFT). We show examples of how to load TransNormerLLM models, run SFT, and perform inference with them.

Released Weights

The specific released versions and download links are shown below:

| Param | Base Models |
| :---- | :---------- |
| 385M  | πŸ€— TransNormerLLM-385M |
| 1B    | πŸ€— TransNormerLLM-1B |
| 7B    | πŸ€— TransNormerLLM-7B |

Benchmark Results

To validate TransNormerLLM, we tested our 385M, 1B, and 7B models on Commonsense Reasoning tasks, MMLU, CMMLU, and C-Eval. For comparison, we selected several open-source models as competitors, including Transformer-based models such as OPT, Pythia, BLOOM, GPT-Neo, GPT-J, MPT, Falcon, LLaMA1/2, OpenLLaMA v1/v2, Baichuan 1/2, ChatGLM 1/2, and the non-Transformer model RWKV. It can be observed that, compared to these models, TransNormerLLM remains highly competitive.

Commonsense Reasoning We report BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA and their average. We report 0-shot results for all benchmarks using LM-Eval-Harness. All of our models achieve competitive performance compared to existing state-of-the-art LLMs, showcasing a remarkable ability to comprehend and apply commonsense reasoning.

Aggregated Benchmarks We report the overall results for MMLU, CMMLU, C-Eval. Official scripts were used for evaluating MMLU, CMMLU, and C-Eval, with all evaluation results being conducted with a 5-shot setup. In comparison to top-tier open-source models available in the industry, our models have demonstrated matched performance in both English and Chinese benchmarks.

General Domain

In the general domain, we conducted 5-shot tests on MMLU, CMMLU, and C-Eval.

Model Results

Performance Comparison on Commonsense Reasoning and Aggregated Benchmarks. For a fair comparison, we report competing methods' results reproduced by us using their released models. PS: parameter size (billion). T: tokens (trillion). HS: HellaSwag. WG: WinoGrande.

| Model | PS | T | BoolQ | PIQA | HS | WG | ARC-e | ARC-c | OBQA | MMLU | CMMLU | C-Eval |
| :---- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| OPT | 0.35 | 0.30 | 57.74 | 64.58 | 36.69 | 52.49 | 44.02 | 23.89 | 28.20 | 26.02 | 25.34 | 25.71 |
| Pythia | 0.40 | 0.30 | 60.40 | 67.08 | 40.52 | 53.59 | 51.81 | 24.15 | 29.40 | 25.99 | 25.16 | 24.81 |
| BLOOM | 0.56 | 0.35 | 55.14 | 64.09 | 36.97 | 52.80 | 47.35 | 23.98 | 28.20 | 24.80 | 25.35 | 27.14 |
| RWKV | 0.43 | - | - | 67.52 | 40.90 | 51.14 | 52.86 | 25.17 | 32.40 | 24.85 | - | - |
| Ours | 0.39 | 1.0 | 62.14 | 66.70 | 46.27 | 54.46 | 55.43 | 27.99 | 32.40 | 25.90 | 25.05 | 25.24 |
| GPT-Neo | 1.3 | 0.3 | 61.99 | 71.11 | 48.93 | 54.93 | 56.19 | 25.85 | 33.60 | 24.82 | 26.03 | 23.94 |
| OPT | 1.3 | 0.3 | 57.77 | 71.71 | 53.70 | 59.35 | 57.24 | 29.69 | 33.20 | 24.96 | 24.97 | 25.32 |
| Pythia | 1.4 | 0.3 | 60.73 | 70.67 | 47.18 | 53.51 | 56.99 | 26.88 | 31.40 | 26.55 | 25.13 | 24.25 |
| BLOOM | 1.1 | 0.35 | 59.08 | 67.14 | 42.98 | 54.93 | 51.47 | 25.68 | 29.40 | 27.30 | 25.09 | 26.50 |
| RWKV | 1.5 | - | - | 72.36 | 52.48 | 54.62 | 60.48 | 29.44 | 34.00 | 25.77 | - | - |
| Falcon | 1.0 | 0.35 | 61.38 | 75.14 | 61.50 | 60.30 | 63.38 | 32.17 | 35.60 | 25.28 | 24.88 | 25.66 |
| Ours | 1.0 | 1.2 | 63.27 | 72.09 | 56.49 | 60.38 | 63.68 | 35.24 | 36.60 | 27.10 | 25.88 | 26.01 |
| GPT-J | 6.9 | 0.3 | 65.44 | 75.41 | 66.25 | 64.09 | 66.92 | 36.60 | 38.20 | 25.40 | 26.47 | 23.39 |
| OPT | 6.7 | 0.3 | 66.18 | 76.22 | 67.21 | 65.19 | 65.66 | 34.64 | 37.20 | 24.57 | 25.36 | 25.32 |
| Pythia | 6.9 | 0.3 | 63.46 | 75.14 | 63.92 | 60.77 | 67.34 | 35.41 | 37.00 | 24.64 | 25.56 | 26.40 |
| BLOOM | 7.1 | 0.35 | 62.91 | 72.69 | 62.33 | 64.01 | 65.11 | 33.45 | 35.80 | 26.25 | 24.97 | 24.25 |
| RWKV | 7.4 | - | - | 76.06 | 65.51 | 61.01 | 67.80 | 37.46 | 40.20 | 24.96 | - | - |
| MPT | 6.9 | 1.0 | 73.88 | 79.43 | 76.25 | 68.27 | 74.79 | 41.72 | 42.20 | 30.80 | 25.99 | 24.06 |
| Falcon | 7.2 | 1.5 | 73.73 | 79.38 | 76.3 | 67.17 | 74.62 | 43.60 | 43.80 | 27.79 | 25.73 | 22.92 |
| Baichuan1 | 7.0 | 1.2 | 70.09 | 76.01 | 70.06 | 64.09 | 71.72 | 40.53 | 38.20 | 42.30 | 44.43 | 42.80 |
| Baichuan2 | 7.0 | 2.6 | 72.72 | 76.50 | 72.17 | 68.35 | 75.17 | 42.32 | 39.60 | 54.16 | 57.07 | 54.00 |
| ChatGLM1 | 6.7 | 1.0 | 74.74 | 68.88 | 45.57 | 52.25 | 48.78 | 31.66 | 36.80 | 40.63 | 37.48 | 40.23 |
| ChatGLM2 | 7.1 | 1.4 | 77.65 | 69.37 | 50.51 | 57.62 | 59.13 | 34.30 | 37.00 | 45.46 | 48.80 | 52.55 |
| OpenLLaMAv1 | 6.7 | 1.0 | 70.43 | 75.68 | 69.23 | 66.69 | 71.17 | 38.57 | 39.00 | 30.49 | 25.40 | 26.09 |
| OpenLLaMAv2 | 6.7 | 1.0 | 72.20 | 78.84 | 74.51 | 65.67 | 72.39 | 41.30 | 41.00 | 41.29 | 29.58 | 30.01 |
| LLaMA1 | 6.7 | 1.0 | 76.50 | 79.80 | 76.10 | 70.10 | 72.80 | 47.60 | 57.20 | 35.10 | 25.62 | 25.72 |
| LLaMA2 | 6.7 | 2.0 | 77.68 | 78.07 | 76.02 | 68.98 | 76.30 | 46.33 | 44.20 | 45.30 | 32.96 | 33.20 |
| Ours | 6.8 | 1.4 | 75.87 | 80.09 | 75.21 | 66.06 | 75.42 | 44.40 | 63.40 | 43.10 | 47.99 | 43.18 |
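As an illustration of how the commonsense averages are formed, the figure is the plain mean of the per-task scores. The snippet below computes it over the seven commonsense columns shown for our 7B model; this is simple arithmetic over the table, not a separately reported number:

```python
# Scores copied from the "Ours 6.8B" row of the table above:
# BoolQ, PIQA, HS, WG, ARC-e, ARC-c, OBQA.
scores = [75.87, 80.09, 75.21, 66.06, 75.42, 44.40, 63.40]

# The commonsense average is the unweighted mean of the task scores.
average = sum(scores) / len(scores)
print(f"{average:.2f}")  # 68.64
```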

Inference and Deployment

The model weights, source code, and configuration needed for inference have been released on Hugging Face. Download links can be found in the table. Below, we demonstrate various inference methods using TransNormerLLM-1B as an example. The program will automatically download the required resources from Hugging Face.

Dependency Installation

```shell
pip install -r requirements.txt
```

Notice

If you encounter errors related to Triton, please set the following environment variables:

```shell
export use_triton=False
```

Python Code Inference

Demonstration of Base Model Inference

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("OpenNLPLab/TransNormerLLM-1B", trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("OpenNLPLab/TransNormerLLM-1B", device_map="auto", trust_remote_code=True)
```

In the above code snippet, the model is loaded with `device_map="auto"`, which uses all available GPUs. If you need to restrict which devices are used, set an environment variable such as `export CUDA_VISIBLE_DEVICES=0,1` (to use GPUs 0 and 1) before launching.
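If you prefer to pin devices from Python rather than the shell, the same restriction can be applied programmatically. This is only a sketch of the standard CUDA mechanism, not code from this repository:

```python
import os

# Must be set before CUDA is initialized, i.e. before importing torch
# or loading the model; mirrors `export CUDA_VISIBLE_DEVICES=0,1`.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
```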

Fine-tuning the Model

Dependency Installation

```shell
git clone https://github.com/OpenNLPLab/TransNormerLLM.git
cd TransNormerLLM/fine-tune
pip install -r requirements.txt
```

Training

Below, we provide an example of fine-tuning TransNormerLLM-1B on a single machine with DeepSpeed ZeRO-3.

Training data: `alpaca_data.json`. This sample was drawn from the 52,002-entry alpaca_data.json and reformatted. Its main purpose is to demonstrate how to run SFT on our model; effectiveness is not guaranteed.
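For reference, Alpaca-style training files are JSON lists of records with `instruction`/`input`/`output` keys. The sketch below shows loading and validating such a file; the example record is illustrative, not taken from the actual data:

```python
import json

# Illustrative record in Alpaca format; alpaca_data.json is a list of such dicts.
sample = [
    {
        "instruction": "Give three tips for staying healthy.",
        "input": "",
        "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep well.",
    }
]

# Round-trip through JSON, as the training script would read it from disk.
data = json.loads(json.dumps(sample))

# Every record should carry the three expected keys.
assert all({"instruction", "input", "output"} <= set(record) for record in data)
print(len(data))  # 1
```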

```shell
torchrun \
    --nproc_per_node=8 \
    train.py \
    --model_name_or_path OpenNLPLab/TransNormerLLM-1B \
    --data_path ./alpaca_data.json \
    --output_dir output \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --bf16 true \
    --adam_beta1 0.9 \
    --adam_beta2 0.95 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5000 \
    --save_total_limit 30 \
    --learning_rate 1e-4 \
    --weight_decay 0.1 \
    --warmup_ratio 0.1 \
    --lr_scheduler_type "cosine" \
    --deepspeed 'configs/zero3.json' \
    --logging_steps 1 \
    --dataloader_num_workers 24 \
    --ddp_find_unused_parameters false \
    --tf32 true
```
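The `--deepspeed 'configs/zero3.json'` flag points at a DeepSpeed ZeRO-3 configuration shipped with the repository. As an illustration only (not the repository's actual file), a typical ZeRO-3 config with bf16 looks like this:

```json
{
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

The `"auto"` values let the HuggingFace `Trainer` fill in quantities that are already specified on the command line, avoiding conflicting settings.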

Community and Ecosystem

πŸ“’πŸ“’πŸ“’ We will continuously update the support for TransNormerLLM from the community and ecosystem here πŸ˜€πŸ˜€πŸ˜€

Disclaimer, License and Citation

Disclaimer

We hereby declare that our team has not developed any applications based on TransNormerLLM models on any platform, including iOS, Android, or the web. We strongly urge all users not to use TransNormerLLM models for any activities that endanger national or social security or violate the law. We also ask users not to use TransNormerLLM models in Internet services that have not undergone appropriate security review and filing. We hope all users will abide by these principles and ensure that technology develops in a regulated and lawful environment.

We have done our best to ensure the compliance of the data used during model training. However, despite our considerable efforts, unforeseen issues may still arise due to the complexity of the model and data. Therefore, we assume no responsibility for any problems arising from the use of the TransNormerLLM open-source models, including but not limited to data security issues, public-opinion risks, or any risks caused by the model being misled, abused, disseminated, or improperly exploited.

License

Community usage of the TransNormerLLM model requires adherence to the Apache 2.0 license and the Community License for the TransNormerLLM Model. The TransNormerLLM model supports commercial use. If you plan to use the TransNormerLLM model or its derivatives for commercial purposes, please ensure that your entity meets the following conditions:

  1. The Daily Active Users (DAU) of your or your affiliate's service or product is less than 1 million.
  2. Neither you nor your affiliates are software service providers or cloud service providers.
  3. You and your affiliates may not sublicense or otherwise grant the commercial license given to you to any third party without TransNormerLLM's permission.

Upon meeting the above conditions, you need to submit the application materials required by the TransNormerLLM Model Community License Agreement to the following contact email: opennlplab@gmail.com. Once approved, TransNormerLLM will grant you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable commercial copyright license.

Acknowledgments

Our project is developed based on the following open source projects:

Citation

If you wish to cite our work, please use the following reference:

```bibtex
@misc{qin2024transnormerllm,
      title={TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer},
      author={Zhen Qin and Dong Li and Weigao Sun and Weixuan Sun and Xuyang Shen and Xiaodong Han and Yunshen Wei and Baohong Lv and Xiao Luo and Yu Qiao and Yiran Zhong},
      year={2024},
      eprint={2307.14995},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{qin2024lightning,
      title={Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models},
      author={Zhen Qin and Weigao Sun and Dong Li and Xuyang Shen and Weixuan Sun and Yiran Zhong},
      year={2024},
      eprint={2401.04658},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```