bert4torch


Documentation | Torch4keras | Examples | build_MiniLLM_from_scratch | bert4vector

Table of Contents

1. Installation

Install the stable release:

```shell
pip install bert4torch
```

Install the latest version:

```shell
pip install git+https://github.com/Tongjilibo/bert4torch
```

2. Features

| Feature | bert4torch | transformers | Notes |
|---------|------------|--------------|-------|
| Training progress bar | ✅ | ✅ | The progress bar prints the loss and any user-defined metrics |
| Distributed training (dp/ddp) | ✅ | ✅ | Uses torch's built-in dp/ddp |
| Assorted callbacks | ✅ | ✅ | Logging/tensorboard/early stopping/wandb, etc. |
| LLM inference with stream/batch output | ✅ | ✅ | Shared across models, no per-model scripts to maintain |
| LLM fine-tuning | ✅ | ✅ | lora relies on the peft library; pv2 is built in |
| Rich bag of tricks | ✅ | ❌ | Plug-and-play tricks such as adversarial training |
| Concise, readable code with room for customization | ✅ | ❌ | High code reuse; keras-style training code |
| Repository maintenance capacity/influence/usage/compatibility | ❌ | ✅ | This repository is currently maintained by one person |
| One-click LLM deployment | ✅ | ❌ | |
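
As a rough illustration of the keras-style training flow the table refers to, here is a minimal, hedged sketch: the paths are placeholders, and `compile`/`fit` mirror the project's documented style rather than an exact verified signature.

```python
import torch.nn as nn
import torch.optim as optim
from bert4torch.models import build_transformer_model

# Placeholder paths: point these at a checkpoint/config pair from section 5.
config_path = 'bert-base-chinese/bert4torch_config.json'
checkpoint_path = 'bert-base-chinese/pytorch_model.bin'

model = build_transformer_model(config_path, checkpoint_path)

# keras-style compile/fit: fit() drives the progress bar, metric printing
# and callbacks listed in the table above.
model.compile(loss=nn.CrossEntropyLoss(),
              optimizer=optim.Adam(model.parameters(), lr=2e-5))

# train_dataloader would be an ordinary torch DataLoader yielding
# (inputs, labels) batches:
# model.fit(train_dataloader, epochs=3, callbacks=[])
```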

3. Quick Start

3.1 Tutorials
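
The full tutorials live in the linked documentation; as a minimal, hedged sketch of loading a checkpoint and extracting features (paths are placeholders, and `Tokenizer`/`model.predict` follow the project's keras-style conventions rather than a verified signature):

```python
import torch
from bert4torch.models import build_transformer_model
from bert4torch.tokenizers import Tokenizer

# Placeholder directory containing a downloaded bert-base-chinese checkpoint.
root = 'bert-base-chinese'
tokenizer = Tokenizer(f'{root}/vocab.txt', do_lower_case=True)
model = build_transformer_model(f'{root}/bert4torch_config.json',
                                f'{root}/pytorch_model.bin')

# Encode a sentence and run a forward pass to get the hidden states.
token_ids, segment_ids = tokenizer.encode('今天天气不错')
hidden_states = model.predict([torch.tensor([token_ids]),
                               torch.tensor([segment_ids])])
```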

3.2 Quickly deploy an LLM service from the command line
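
The release notes in section 4.1 confirm a `bert4torch-llm-server` command-line entry point (its flags are not reproduced here). A hedged Python sketch of the same one-click serving via the Chat pipeline might look as follows; the model name and the `mode` argument are assumptions, not verified API:

```python
# Hedged sketch: serve a chat LLM in a couple of lines. The 'mode' values
# (e.g. 'cli', 'gradio') are assumptions based on the pipeline docs.
from bert4torch.pipelines import Chat

demo = Chat('Qwen/Qwen2-0.5B-Instruct', mode='cli')
demo.run()
```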

4. Versions and Update History

4.1 Version History

| Date | bert4torch | torch4keras | Release notes |
|------|------------|-------------|---------------|
| 20240814 | 0.5.3 | 0.2.6 | [New] Added llama3.1/Yi1.5; automatically chooses to download from hf-mirror; supports the bert4torch-llm-server command-line entry point |
| 20240801 | 0.5.2 | 0.2.5 | [New] function call support for the chatglm/qwen series; added the internlm2 series. [Tweaks] Simplified invoking the chat demos in pipelines; generate's end-token elements may now be lists; unified the rope_scaling parameter name; added RoPE-derived classes. [Bugfix] Fixed a flash_attn2 inference bug and a bart tie_word_embedding bug |
| 20240619 | 0.5.1 | 0.2.4 | Added Qwen1.5, Qwen2, glm4; added SWA/convert_lm_logits_dtype; reworked the trainers (DPOTrainer in particular); segment_ids in generation; repetition_penalty now requires the query; fixed a dtype-conversion bug in RMSNorm |

More versions

4.2 Update History

More history

5. Pretrained Weights

| Category | Model | Source | Weight link / checkpoint_path | config_path |
|----------|-------|--------|-------------------------------|-------------|
| bert | bert-base-chinese | google-bert | bert-base-chinese | bert-base-chinese |
| | chinese_L-12_H-768_A-12 | Google | TF weights<br>Tongjilibo/bert-chinese_L-12_H-768_A-12 | |
| | chinese-bert-wwm-ext | HFL | hfl/chinese-bert-wwm-ext | chinese-bert-wwm-ext |
| | bert-base-multilingual-cased | google-bert | bert-base-multilingual-cased | bert-base-multilingual-cased |
| | MacBERT | HFL | hfl/chinese-macbert-base<br>hfl/chinese-macbert-large | chinese-macbert-base<br>chinese-macbert-large |
| | WoBERT | Zhuiyi Technology | junnyu/wobert_chinese_base<br>junnyu/wobert_chinese_plus_base | wobert_chinese_base<br>wobert_chinese_plus_base |
| roberta | chinese-roberta-wwm-ext | HFL | hfl/chinese-roberta-wwm-ext<br>hfl/chinese-roberta-wwm-ext-large (the large MLM weights are randomly initialized) | chinese-roberta-wwm-ext<br>chinese-roberta-wwm-ext-large |
| | roberta-small/tiny | Zhuiyi Technology | Tongjilibo/chinese_roberta_L-4_H-312_A-12<br>Tongjilibo/chinese_roberta_L-6_H-384_A-12 | |
| | roberta-base | FacebookAI | roberta-base | roberta-base |
| | guwenbert | ethanyt | ethanyt/guwenbert-base | guwenbert-base |
| albert | albert_zh<br>albert_pytorch | brightmart | voidful/albert_chinese_tiny<br>voidful/albert_chinese_small<br>voidful/albert_chinese_base<br>voidful/albert_chinese_large<br>voidful/albert_chinese_xlarge<br>voidful/albert_chinese_xxlarge | albert_chinese_tiny<br>albert_chinese_small<br>albert_chinese_base<br>albert_chinese_large<br>albert_chinese_xlarge<br>albert_chinese_xxlarge |
| nezha | NEZHA<br>NeZha_Chinese_PyTorch | huawei_noah | sijunhe/nezha-cn-base<br>sijunhe/nezha-cn-large<br>sijunhe/nezha-base-wwm<br>sijunhe/nezha-large-wwm | nezha-cn-base<br>nezha-cn-large<br>nezha-base-wwm<br>nezha-large-wwm |
| | nezha_gpt_dialog | bojone | Tongjilibo/nezha_gpt_dialog | |
| xlnet | Chinese-XLNet | HFL | hfl/chinese-xlnet-base | chinese-xlnet-base |
| | transformer_xl | huggingface | transfo-xl/transfo-xl-wt103 | transfo-xl-wt103 |
| deberta | Erlangshen-DeBERTa-v2 | IDEA | IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese | Erlangshen-DeBERTa-v2-97M-Chinese<br>Erlangshen-DeBERTa-v2-320M-Chinese<br>Erlangshen-DeBERTa-v2-710M-Chinese |
| electra | Chinese-ELECTRA | HFL | hfl/chinese-electra-base-discriminator | chinese-electra-base-discriminator |
| ernie | ernie | Baidu Wenxin | nghuyong/ernie-1.0-base-zh<br>nghuyong/ernie-3.0-base-zh | ernie-1.0-base-zh<br>ernie-3.0-base-zh |
| roformer | roformer | Zhuiyi Technology | junnyu/roformer_chinese_base | roformer_chinese_base |
| | roformer_v2 | Zhuiyi Technology | junnyu/roformer_v2_chinese_char_base | roformer_v2_chinese_char_base |
| simbert | simbert | Zhuiyi Technology | Tongjilibo/simbert-chinese-base<br>Tongjilibo/simbert-chinese-small<br>Tongjilibo/simbert-chinese-tiny | |
| | simbert_v2/roformer-sim | Zhuiyi Technology | junnyu/roformer_chinese_sim_char_base<br>junnyu/roformer_chinese_sim_char_ft_base<br>junnyu/roformer_chinese_sim_char_small<br>junnyu/roformer_chinese_sim_char_ft_small | roformer_chinese_sim_char_base<br>roformer_chinese_sim_char_ft_base<br>roformer_chinese_sim_char_small<br>roformer_chinese_sim_char_ft_small |
| gau | GAU-alpha | Zhuiyi Technology | Tongjilibo/chinese_GAU-alpha-char_L-24_H-768 | |
| uie | uie<br>uie_pytorch | Baidu | Tongjilibo/uie-base | |
| gpt | CDial-GPT | thu-coai | thu-coai/CDial-GPT_LCCC-base<br>thu-coai/CDial-GPT_LCCC-large | CDial-GPT_LCCC-base<br>CDial-GPT_LCCC-large |
| | cmp_lm (2.6B) | Tsinghua | TsinghuaAI/CPM-Generate | CPM-Generate |
| | nezha_gen | huawei_noah | Tongjilibo/chinese_nezha_gpt_L-12_H-768_A-12 | |
| | gpt2-chinese-cluecorpussmall | UER | uer/gpt2-chinese-cluecorpussmall | gpt2-chinese-cluecorpussmall |
| | gpt2-ml | imcaspar | torch<br>BaiduYun(84dh) | gpt2-ml_15g_corpus<br>gpt2-ml_30g_corpus |
| bart | bart_base_chinese | Fudan fnlp | fnlp/bart-base-chinese<br>v1.0 | bart-base-chinese<br>bart-base-chinese-v1.0 |
| t5 | t5 | UER | uer/t5-small-chinese-cluecorpussmall<br>uer/t5-base-chinese-cluecorpussmall | t5-small-chinese-cluecorpussmall<br>t5-base-chinese-cluecorpussmall |
| | mt5 | Google | google/mt5-base | mt5-base |
| | t5_pegasus | Zhuiyi Technology | Tongjilibo/chinese_t5_pegasus_small<br>Tongjilibo/chinese_t5_pegasus_base | |
| | chatyuan | clue-ai | ClueAI/ChatYuan-large-v1<br>ClueAI/ChatYuan-large-v2 | ChatYuan-large-v1<br>ChatYuan-large-v2 |
| | PromptCLUE | clue-ai | ClueAI/PromptCLUE-base | PromptCLUE-base |
| chatglm | chatglm-6b | THUDM | THUDM/chatglm-6b<br>THUDM/chatglm-6b-int8<br>THUDM/chatglm-6b-int4<br>v0.1.0 | chatglm-6b<br>chatglm-6b-int8<br>chatglm-6b-int4<br>chatglm-6b-v0.1.0 |
| | chatglm2-6b | THUDM | THUDM/chatglm2-6b<br>THUDM/chatglm2-6b-int4<br>THUDM/chatglm2-6b-32k | chatglm2-6b<br>chatglm2-6b-int4<br>chatglm2-6b-32k |
| | chatglm3-6b | THUDM | THUDM/chatglm3-6b<br>THUDM/chatglm3-6b-32k | chatglm3-6b<br>chatglm3-6b-32k |
| | glm4-9b | THUDM | THUDM/glm-4-9b<br>THUDM/glm-4-9b-chat<br>THUDM/glm-4-9b-chat-1m | glm-4-9b<br>glm-4-9b-chat<br>glm-4-9b-chat-1m |
| llama | llama | meta | | llama-7b<br>llama-13b |
| | llama-2 | meta | meta-llama/Llama-2-7b-hf<br>meta-llama/Llama-2-7b-chat-hf<br>meta-llama/Llama-2-13b-hf<br>meta-llama/Llama-2-13b-chat-hf | Llama-2-7b-hf<br>Llama-2-7b-chat-hf<br>Llama-2-13b-hf<br>Llama-2-13b-chat-hf |
| | llama-3 | meta | meta-llama/Meta-Llama-3-8B<br>meta-llama/Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B<br>Meta-Llama-3-8B-Instruct |
| | llama-3.1 | meta | meta-llama/Meta-Llama-3.1-8B<br>meta-llama/Meta-Llama-3.1-8B-Instruct | Meta-Llama-3.1-8B<br>Meta-Llama-3.1-8B-Instruct |
| | Chinese-LLaMA-Alpaca | HFL | | chinese_alpaca_plus_7b<br>chinese_llama_plus_7b |
| | Chinese-LLaMA-Alpaca-2 | HFL | to be added | |
| | Chinese-LLaMA-Alpaca-3 | HFL | to be added | |
| | Belle_llama | LianjiaTech | BelleGroup/BELLE-LLaMA-7B-2M-enc<br>(merge instructions) | BELLE-LLaMA-7B-2M-enc |
| | Ziya | IDEA-CCNL | IDEA-CCNL/Ziya-LLaMA-13B-v1<br>IDEA-CCNL/Ziya-LLaMA-13B-v1.1<br>IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1 | Ziya-LLaMA-13B-v1<br>Ziya-LLaMA-13B-v1.1 |
| | Baichuan | baichuan-inc | baichuan-inc/Baichuan-7B<br>baichuan-inc/Baichuan-13B-Base<br>baichuan-inc/Baichuan-13B-Chat | Baichuan-7B<br>Baichuan-13B-Base<br>Baichuan-13B-Chat |
| | Baichuan2 | baichuan-inc | baichuan-inc/Baichuan2-7B-Base<br>baichuan-inc/Baichuan2-7B-Chat<br>baichuan-inc/Baichuan2-13B-Base<br>baichuan-inc/Baichuan2-13B-Chat | Baichuan2-7B-Base<br>Baichuan2-7B-Chat<br>Baichuan2-13B-Base<br>Baichuan2-13B-Chat |
| | vicuna | lmsys | lmsys/vicuna-7b-v1.5 | vicuna-7b-v1.5 |
| | Yi | 01-ai | 01-ai/Yi-6B<br>01-ai/Yi-6B-200K<br>01-ai/Yi-9B<br>01-ai/Yi-9B-200K | Yi-6B<br>Yi-6B-200K<br>Yi-9B<br>Yi-9B-200K |
| | Yi-1.5 | 01-ai | 01-ai/Yi-1.5-6B<br>01-ai/Yi-1.5-6B-Chat<br>01-ai/Yi-1.5-9B<br>01-ai/Yi-1.5-9B-32K<br>01-ai/Yi-1.5-9B-Chat<br>01-ai/Yi-1.5-9B-Chat-16K | Yi-1.5-6B<br>Yi-1.5-6B-Chat<br>Yi-1.5-9B<br>Yi-1.5-9B-32K<br>Yi-1.5-9B-Chat<br>Yi-1.5-9B-Chat-16K |
| bloom | bloom | bigscience | bigscience/bloom-560m<br>bigscience/bloomz-560m | bloom-560m<br>bloomz-560m |
| Qwen | Qwen | Alibaba Cloud | Qwen/Qwen-1_8B<br>Qwen/Qwen-1_8B-Chat<br>Qwen/Qwen-7B<br>Qwen/Qwen-7B-Chat<br>Qwen/Qwen-14B<br>Qwen/Qwen-14B-Chat | Qwen-1_8B<br>Qwen-1_8B-Chat<br>Qwen-7B<br>Qwen-7B-Chat<br>Qwen-14B<br>Qwen-14B-Chat |
| | Qwen1.5 | Alibaba Cloud | Qwen/Qwen1.5-0.5B<br>Qwen/Qwen1.5-0.5B-Chat<br>Qwen/Qwen1.5-1.8B<br>Qwen/Qwen1.5-1.8B-Chat<br>Qwen/Qwen1.5-7B<br>Qwen/Qwen1.5-7B-Chat<br>Qwen/Qwen1.5-14B<br>Qwen/Qwen1.5-14B-Chat | Qwen1.5-0.5B<br>Qwen1.5-0.5B-Chat<br>Qwen1.5-1.8B<br>Qwen1.5-1.8B-Chat<br>Qwen1.5-7B<br>Qwen1.5-7B-Chat<br>Qwen1.5-14B<br>Qwen1.5-14B-Chat |
| | Qwen2 | Alibaba Cloud | Qwen/Qwen2-0.5B<br>Qwen/Qwen2-0.5B-Instruct<br>Qwen/Qwen2-1.5B<br>Qwen/Qwen2-1.5B-Instruct<br>Qwen/Qwen2-7B<br>Qwen/Qwen2-7B-Instruct | Qwen2-0.5B<br>Qwen2-0.5B-Instruct<br>Qwen2-1.5B<br>Qwen2-1.5B-Instruct<br>Qwen2-7B<br>Qwen2-7B-Instruct |
| InternLM | InternLM | Shanghai AI Laboratory | internlm/internlm-chat-7b<br>internlm/internlm-7b | internlm-7b<br>internlm-chat-7b |
| | InternLM2 | Shanghai AI Laboratory | internlm/internlm2-1_8b<br>internlm/internlm2-chat-1_8b<br>internlm/internlm2-7b<br>internlm/internlm2-chat-7b<br>internlm/internlm2-20b<br>internlm/internlm2-chat-20b | internlm2-1_8b<br>internlm2-chat-1_8b<br>internlm2-7b<br>internlm2-chat-7b |
| | InternLM2.5 | Shanghai AI Laboratory | internlm/internlm2_5-7b<br>internlm/internlm2_5-7b-chat<br>internlm/internlm2_5-7b-chat-1m | internlm2_5-7b<br>internlm2_5-7b-chat<br>internlm2_5-7b-chat-1m |
| Falcon | Falcon | tiiuae | tiiuae/falcon-rw-1b<br>tiiuae/falcon-7b<br>tiiuae/falcon-7b-instruct | falcon-rw-1b<br>falcon-7b<br>falcon-7b-instruct |
| DeepSeek | DeepSeek-MoE | High-Flyer Quant | deepseek-ai/deepseek-moe-16b-base<br>deepseek-ai/deepseek-moe-16b-chat | deepseek-moe-16b-base<br>deepseek-moe-16b-chat |
| | DeepSeek-LLM | High-Flyer Quant | deepseek-ai/deepseek-llm-7b-base<br>deepseek-ai/deepseek-llm-7b-chat | deepseek-llm-7b-base<br>deepseek-llm-7b-chat |
| | DeepSeek-V2 | High-Flyer Quant | deepseek-ai/DeepSeek-V2-Lite<br>deepseek-ai/DeepSeek-V2-Lite-Chat | |
| | DeepSeek-Coder | High-Flyer Quant | to be added | |
| | DeepSeek-Coder-V2 | High-Flyer Quant | to be added | |
| MiniCPM | MiniCPM | OpenBMB | openbmb/MiniCPM-2B-sft-bf16<br>openbmb/MiniCPM-2B-dpo-bf16<br>openbmb/MiniCPM-2B-128k<br>openbmb/MiniCPM-1B-sft-bf16 | MiniCPM-2B-sft-bf16<br>MiniCPM-2B-dpo-bf16<br>MiniCPM-2B-128k<br>MiniCPM-1B-sft-bf16 |
| | MiniCPM-V | OpenBMB | to be added | |
| embedding | text2vec-base-chinese | shibing624 | shibing624/text2vec-base-chinese | text2vec-base-chinese |
| | m3e | moka-ai | moka-ai/m3e-base | m3e-base |
| | bge | BAAI | BAAI/bge-large-en-v1.5<br>BAAI/bge-large-zh-v1.5<br>BAAI/bge-base-en-v1.5<br>BAAI/bge-base-zh-v1.5<br>BAAI/bge-small-en-v1.5<br>BAAI/bge-small-zh-v1.5 | bge-large-en-v1.5<br>bge-large-zh-v1.5<br>bge-base-en-v1.5<br>bge-base-zh-v1.5<br>bge-small-en-v1.5<br>bge-small-zh-v1.5 |
| | gte | thenlper | thenlper/gte-large-zh<br>thenlper/gte-base-zh | gte-large-zh<br>gte-base-zh |

*Notes:

  1. Entries in highlighted format (e.g. bert-base-chinese) can be downloaded over the network directly by build_transformer_model(); see the sketch after these notes.
  2. Downloads can be accelerated through a mirror site in mainland China:
    • HF_ENDPOINT=https://hf-mirror.com python your_script.py
    • run export HF_ENDPOINT=https://hf-mirror.com first, then execute your Python code
    • or set it at the top of your Python script:
    import os
    os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"
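
As a hedged sketch of note 1: passing a name from the checkpoint_path column (rather than a local path) lets build_transformer_model() fetch the weights itself; combine it with the HF_ENDPOINT setting above when using the mirror.

```python
from bert4torch.models import build_transformer_model

# 'bert-base-chinese' is a checkpoint_path value from the table above;
# given a model name rather than a local file, the weights are downloaded
# automatically (via hf-mirror if HF_ENDPOINT is set).
model = build_transformer_model(checkpoint_path='bert-base-chinese')
```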
    

6. Acknowledgements

7. Citation

```bibtex
@misc{bert4torch,
  title={bert4torch},
  author={Bo Li},
  year={2022},
  howpublished={\url{https://github.com/Tongjilibo/bert4torch}},
}
```

8. Other

<table border="0">
  <tbody>
    <tr align="center">
      <td>
        <a href="https://github.com/Tongjilibo"><img width="200" height="250" src="./docs/pics/wechat.jpg" alt="pic"></a><br>
        <a href="https://github.com/Tongjilibo">WeChat ID</a>
      </td>
      <td>
        <a href="https://github.com/Tongjilibo"><img width="190" height="250" src="./docs/pics/wechat_group.jpg" alt="pic"></a><br>
        <a href="https://github.com/Tongjilibo">WeChat group</a>
      </td>
      <td>
        <a href="https://star-history.com/#Tongjilibo/bert4torch&Date"><img width="400" height="250" src="https://api.star-history.com/svg?repos=Tongjilibo/bert4torch&type=Date" alt="pic"></a><br>
        <a href="https://star-history.com/#Tongjilibo/bert4torch&Date">Star History Chart</a>
      </td>
    </tr>
  </tbody>
</table>