bert4torch


Documentation | Torch4keras | Examples | build_MiniLLM_from_scratch | bert4vector

Contents

1. Installation

Install the stable version

```shell
pip install bert4torch
```

Install the latest version

```shell
pip install git+https://github.com/Tongjilibo/bert4torch
```

2. Features

| Feature | bert4torch | transformers | Notes |
|----|----|----|----|
| Training progress bar | ✓ | ✓ | the progress bar prints the loss and any user-defined metrics |
| Distributed training (dp/ddp) | ✓ | ✓ | uses torch's built-in dp/ddp |
| Assorted callbacks | ✓ | ✗ | logging/tensorboard/early stopping/wandb, etc. |
| LLM inference with stream/batch output | ✓ | ✗ | the same code serves every model, no per-model scripts to maintain |
| LLM fine-tuning | ✓ | ✗ | LoRA relies on the peft library; P-Tuning v2 is built in |
| Rich bag of tricks | ✓ | ✗ | plug-and-play tricks such as adversarial training |
| Concise, readable code with room for customization | ✓ | ✗ | high code reuse; Keras-style training code |
| Repository maintenance/influence/usage/compatibility | ✗ | ✓ | this repository is currently maintained by a single person |
| One-command LLM deployment | ✓ | ✗ | |
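To make the "Keras-style training code" row concrete, here is a minimal sketch of the compile/fit workflow. The checkpoint paths, the 768-dim classification head, and the random toy data are placeholder assumptions, not code from this repository:

```python
# Rough sketch of bert4torch's Keras-style training loop; paths, dims
# and the random toy data below are placeholders.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from bert4torch.models import build_transformer_model, BaseModel

class Model(BaseModel):
    def __init__(self):
        super().__init__()
        # with_pool=True additionally returns the pooled [CLS] output
        self.bert = build_transformer_model(config_path='/path/to/config.json',
                                            checkpoint_path='/path/to/pytorch_model.bin',
                                            with_pool=True)
        self.fc = nn.Linear(768, 2)  # toy binary-classification head

    def forward(self, token_ids, segment_ids):
        _, pooled = self.bert([token_ids, segment_ids])
        return self.fc(pooled)

def collate_fn(batch):
    token_ids, segment_ids, labels = map(torch.stack, zip(*batch))
    return [token_ids, segment_ids], labels  # (inputs, label) pairs for fit()

# random toy data, only to make the sketch self-contained
data = TensorDataset(torch.randint(1, 1000, (32, 64)),
                     torch.zeros(32, 64, dtype=torch.long),
                     torch.randint(0, 2, (32,)))
train_dataloader = DataLoader(data, batch_size=8, collate_fn=collate_fn)

model = Model()
# compile/fit mirror the Keras API; callbacks (tensorboard, early
# stopping, wandb, ...) plug into fit() the same way
model.compile(loss=nn.CrossEntropyLoss(),
              optimizer=optim.Adam(model.parameters(), lr=2e-5))
model.fit(train_dataloader, epochs=3)
```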

3. Quick Start

3.1 Tutorials
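As a minimal, hedged sketch of the basic workflow (the local file paths are placeholders; see the weight table in section 5 for real checkpoints):

```python
# Minimal usage sketch; the local file paths below stand in for wherever
# you keep the downloaded bert-base-chinese weights.
import torch
from bert4torch.models import build_transformer_model
from bert4torch.tokenizers import Tokenizer

config_path = '/path/to/bert-base-chinese/config.json'
checkpoint_path = '/path/to/bert-base-chinese/pytorch_model.bin'
vocab_path = '/path/to/bert-base-chinese/vocab.txt'

tokenizer = Tokenizer(vocab_path, do_lower_case=True)
model = build_transformer_model(config_path, checkpoint_path)

# encode() returns token ids and segment ids, bert4keras-style
token_ids, segment_ids = tokenizer.encode('语言模型')
outputs = model([torch.tensor([token_ids]), torch.tensor([segment_ids])])
print(outputs)  # last-layer hidden states; exact output structure depends on config
```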

3.2 Quickly Deploy an LLM Service from the Command Line
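The version history in section 4.1 notes that v0.5.3 added a bert4torch-llm-server command-line entry point. A hedged sketch of an invocation follows; the flag name and model path are assumptions rather than documented options, so check the tool's --help for the real interface:

```shell
# hypothetical flags; run bert4torch-llm-server --help for the actual ones
bert4torch-llm-server --checkpoint_path /path/to/Qwen2-7B-Instruct
```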

4. Versions and Update History

4.1 Version History

| Date | bert4torch | torch4keras | Release notes |
|----|----|----|----|
| 20240928 | 0.5.4 | 0.2.7 | [New] add the deepseek series, MiniCPM, MiniCPMV, llama3.2 and Qwen2.5; support device_map=auto; [Fix] fix bugs in batch_generate and with n>1 |
| 20240814 | 0.5.3 | 0.2.6 | [New] add llama3.1/Yi1.5; automatically choose to download from hf-mirror; add the bert4torch-llm-server command-line entry point |
| 20240801 | 0.5.2 | 0.2.5 | [New] function calling for the chatglm/qwen series; add the internlm2 series; [Minor] simplify calling the chat demo in pipeline, allow list elements among generate's stop tokens, unify the rope_scaling parameter name, add rope-derived classes; [Fix] fix a flash_attn2 inference bug and a bart tie_word_embedding bug |

More versions

4.2 Update History

More history

5. Pretrained Weights

| Category | Model | Source | checkpoint_path | config_path |
|----|----|----|----|----|
| bert | bert-base-chinese | google-bert | bert-base-chinese | bert-base-chinese |
| | chinese_L-12_H-768_A-12 | Google | TF weights<br>Tongjilibo/bert-chinese_L-12_H-768_A-12 | |
| | chinese-bert-wwm-ext | HFL | hfl/chinese-bert-wwm-ext | hfl/chinese-bert-wwm-ext |
| | bert-base-multilingual-cased | google-bert | bert-base-multilingual-cased | bert-base-multilingual-cased |
| | MacBERT | HFL | hfl/chinese-macbert-base<br>hfl/chinese-macbert-large | hfl/chinese-macbert-base<br>hfl/chinese-macbert-large |
| | WoBERT | Zhuiyi Technology | junnyu/wobert_chinese_base<br>junnyu/wobert_chinese_plus_base | junnyu/wobert_chinese_base<br>junnyu/wobert_chinese_plus_base |
| roberta | chinese-roberta-wwm-ext | HFL | hfl/chinese-roberta-wwm-ext<br>hfl/chinese-roberta-wwm-ext-large<br>(the large MLM weights are randomly initialized) | hfl/chinese-roberta-wwm-ext<br>hfl/chinese-roberta-wwm-ext-large |
| | roberta-small/tiny | Zhuiyi Technology | Tongjilibo/chinese_roberta_L-4_H-312_A-12<br>Tongjilibo/chinese_roberta_L-6_H-384_A-12 | |
| | roberta-base | FacebookAI | roberta-base | roberta-base |
| | guwenbert | ethanyt | ethanyt/guwenbert-base | ethanyt/guwenbert-base |
| albert | albert_zh<br>albert_pytorch | brightmart | voidful/albert_chinese_tiny<br>voidful/albert_chinese_small<br>voidful/albert_chinese_base<br>voidful/albert_chinese_large<br>voidful/albert_chinese_xlarge<br>voidful/albert_chinese_xxlarge | voidful/albert_chinese_tiny<br>voidful/albert_chinese_small<br>voidful/albert_chinese_base<br>voidful/albert_chinese_large<br>voidful/albert_chinese_xlarge<br>voidful/albert_chinese_xxlarge |
| nezha | NEZHA<br>NeZha_Chinese_PyTorch | huawei_noah | sijunhe/nezha-cn-base<br>sijunhe/nezha-cn-large<br>sijunhe/nezha-base-wwm<br>sijunhe/nezha-large-wwm | sijunhe/nezha-cn-base<br>sijunhe/nezha-cn-large<br>sijunhe/nezha-base-wwm<br>sijunhe/nezha-large-wwm |
| | nezha_gpt_dialog | bojone | Tongjilibo/nezha_gpt_dialog | |
| xlnet | Chinese-XLNet | HFL | hfl/chinese-xlnet-base | hfl/chinese-xlnet-base |
| | transformer_xl | huggingface | transfo-xl/transfo-xl-wt103 | transfo-xl/transfo-xl-wt103 |
| deberta | Erlangshen-DeBERTa-v2 | IDEA | IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese | IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese |
| electra | Chinese-ELECTRA | HFL | hfl/chinese-electra-base-discriminator | hfl/chinese-electra-base-discriminator |
| ernie | ernie | Baidu Wenxin | nghuyong/ernie-1.0-base-zh<br>nghuyong/ernie-3.0-base-zh | nghuyong/ernie-1.0-base-zh<br>nghuyong/ernie-3.0-base-zh |
| roformer | roformer | Zhuiyi Technology | junnyu/roformer_chinese_base | junnyu/roformer_chinese_base |
| | roformer_v2 | Zhuiyi Technology | junnyu/roformer_v2_chinese_char_base | junnyu/roformer_v2_chinese_char_base |
| simbert | simbert | Zhuiyi Technology | Tongjilibo/simbert-chinese-base<br>Tongjilibo/simbert-chinese-small<br>Tongjilibo/simbert-chinese-tiny | |
| | simbert_v2/roformer-sim | Zhuiyi Technology | junnyu/roformer_chinese_sim_char_base<br>junnyu/roformer_chinese_sim_char_ft_base<br>junnyu/roformer_chinese_sim_char_small<br>junnyu/roformer_chinese_sim_char_ft_small | junnyu/roformer_chinese_sim_char_base<br>junnyu/roformer_chinese_sim_char_ft_base<br>junnyu/roformer_chinese_sim_char_small<br>junnyu/roformer_chinese_sim_char_ft_small |
| gau | GAU-alpha | Zhuiyi Technology | Tongjilibo/chinese_GAU-alpha-char_L-24_H-768 | |
| uie | uie<br>uie_pytorch | Baidu | Tongjilibo/uie-base | |
| gpt | CDial-GPT | thu-coai | thu-coai/CDial-GPT_LCCC-base<br>thu-coai/CDial-GPT_LCCC-large | thu-coai/CDial-GPT_LCCC-base<br>thu-coai/CDial-GPT_LCCC-large |
| | cmp_lm (2.6B) | Tsinghua | TsinghuaAI/CPM-Generate | TsinghuaAI/CPM-Generate |
| | nezha_gen | huawei_noah | Tongjilibo/chinese_nezha_gpt_L-12_H-768_A-12 | |
| | gpt2-chinese-cluecorpussmall | UER | uer/gpt2-chinese-cluecorpussmall | uer/gpt2-chinese-cluecorpussmall |
| | gpt2-ml | imcaspar | torch<br>BaiduYun(84dh) | gpt2-ml_15g_corpus<br>gpt2-ml_30g_corpus |
| bart | bart_base_chinese | Fudan fnlp | fnlp/bart-base-chinese<br>v1.0 | fnlp/bart-base-chinese<br>fnlp/bart-base-chinese-v1.0 |
| t5 | t5 | UER | uer/t5-small-chinese-cluecorpussmall<br>uer/t5-base-chinese-cluecorpussmall | uer/t5-base-chinese-cluecorpussmall<br>uer/t5-small-chinese-cluecorpussmall |
| | mt5 | Google | google/mt5-base | google/mt5-base |
| | t5_pegasus | Zhuiyi Technology | Tongjilibo/chinese_t5_pegasus_small<br>Tongjilibo/chinese_t5_pegasus_base | |
| | chatyuan | clue-ai | ClueAI/ChatYuan-large-v1<br>ClueAI/ChatYuan-large-v2 | ClueAI/ChatYuan-large-v1<br>ClueAI/ChatYuan-large-v2 |
| | PromptCLUE | clue-ai | ClueAI/PromptCLUE-base | ClueAI/PromptCLUE-base |
| chatglm | chatglm-6b | THUDM | THUDM/chatglm-6b<br>THUDM/chatglm-6b-int8<br>THUDM/chatglm-6b-int4<br>v0.1.0 | THUDM/chatglm-6b<br>THUDM/chatglm-6b-int8<br>THUDM/chatglm-6b-int4<br>THUDM/chatglm-6b-v0.1.0 |
| | chatglm2-6b | THUDM | THUDM/chatglm2-6b<br>THUDM/chatglm2-6b-int4<br>THUDM/chatglm2-6b-32k | THUDM/chatglm2-6b<br>THUDM/chatglm2-6b-int4<br>THUDM/chatglm2-6b-32k |
| | chatglm3-6b | THUDM | THUDM/chatglm3-6b<br>THUDM/chatglm3-6b-32k | THUDM/chatglm3-6b<br>THUDM/chatglm3-6b-32k |
| | glm4-9b | THUDM | THUDM/glm-4-9b<br>THUDM/glm-4-9b-chat<br>THUDM/glm-4-9b-chat-1m | THUDM/glm-4-9b<br>THUDM/glm-4-9b-chat<br>THUDM/glm-4-9b-chat-1m |
| llama | llama | meta | meta-llama/llama-7b<br>meta-llama/llama-13b | |
| | llama-2 | meta | meta-llama/Llama-2-7b-hf<br>meta-llama/Llama-2-7b-chat-hf<br>meta-llama/Llama-2-13b-hf<br>meta-llama/Llama-2-13b-chat-hf | meta-llama/Llama-2-7b-hf<br>meta-llama/Llama-2-7b-chat-hf<br>meta-llama/Llama-2-13b-hf<br>meta-llama/Llama-2-13b-chat-hf |
| | llama-3 | meta | meta-llama/Meta-Llama-3-8B<br>meta-llama/Meta-Llama-3-8B-Instruct | meta-llama/Meta-Llama-3-8B<br>meta-llama/Meta-Llama-3-8B-Instruct |
| | llama-3.1 | meta | meta-llama/Meta-Llama-3.1-8B<br>meta-llama/Meta-Llama-3.1-8B-Instruct | meta-llama/Meta-Llama-3.1-8B<br>meta-llama/Meta-Llama-3.1-8B-Instruct |
| | llama-3.2 | meta | meta-llama/Llama-3.2-1B<br>meta-llama/Llama-3.2-1B-Instruct<br>meta-llama/Llama-3.2-3B<br>meta-llama/Llama-3.2-3B-Instruct | meta-llama/Llama-3.2-1B<br>meta-llama/Llama-3.2-1B-Instruct<br>meta-llama/Llama-3.2-3B<br>meta-llama/Llama-3.2-3B-Instruct |
| | Chinese-LLaMA-Alpaca | HFL | hfl/chinese_alpaca_plus_7b<br>hfl/chinese_llama_plus_7b | |
| | Chinese-LLaMA-Alpaca-2 | HFL | to be added | |
| | Chinese-LLaMA-Alpaca-3 | HFL | to be added | |
| | Belle_llama | LianjiaTech | BelleGroup/BELLE-LLaMA-7B-2M-enc<br>(see the conversion guide) | BelleGroup/BELLE-LLaMA-7B-2M-enc |
| | Ziya | IDEA-CCNL | IDEA-CCNL/Ziya-LLaMA-13B-v1<br>IDEA-CCNL/Ziya-LLaMA-13B-v1.1<br>IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1 | IDEA-CCNL/Ziya-LLaMA-13B-v1<br>IDEA-CCNL/Ziya-LLaMA-13B-v1.1 |
| | vicuna | lmsys | lmsys/vicuna-7b-v1.5 | lmsys/vicuna-7b-v1.5 |
| Baichuan | Baichuan | baichuan-inc | baichuan-inc/Baichuan-7B<br>baichuan-inc/Baichuan-13B-Base<br>baichuan-inc/Baichuan-13B-Chat | baichuan-inc/Baichuan-7B<br>baichuan-inc/Baichuan-13B-Base<br>baichuan-inc/Baichuan-13B-Chat |
| | Baichuan2 | baichuan-inc | baichuan-inc/Baichuan2-7B-Base<br>baichuan-inc/Baichuan2-7B-Chat<br>baichuan-inc/Baichuan2-13B-Base<br>baichuan-inc/Baichuan2-13B-Chat | baichuan-inc/Baichuan2-7B-Base<br>baichuan-inc/Baichuan2-7B-Chat<br>baichuan-inc/Baichuan2-13B-Base<br>baichuan-inc/Baichuan2-13B-Chat |
| Yi | Yi | 01-ai | 01-ai/Yi-6B<br>01-ai/Yi-6B-200K<br>01-ai/Yi-9B<br>01-ai/Yi-9B-200K | 01-ai/Yi-6B<br>01-ai/Yi-6B-200K<br>01-ai/Yi-9B<br>01-ai/Yi-9B-200K |
| | Yi-1.5 | 01-ai | 01-ai/Yi-1.5-6B<br>01-ai/Yi-1.5-6B-Chat<br>01-ai/Yi-1.5-9B<br>01-ai/Yi-1.5-9B-32K<br>01-ai/Yi-1.5-9B-Chat<br>01-ai/Yi-1.5-9B-Chat-16K | 01-ai/Yi-1.5-6B<br>01-ai/Yi-1.5-6B-Chat<br>01-ai/Yi-1.5-9B<br>01-ai/Yi-1.5-9B-32K<br>01-ai/Yi-1.5-9B-Chat<br>01-ai/Yi-1.5-9B-Chat-16K |
| bloom | bloom | bigscience | bigscience/bloom-560m<br>bigscience/bloomz-560m | bigscience/bloom-560m<br>bigscience/bloomz-560m |
| Qwen | Qwen | Alibaba Cloud | Qwen/Qwen-1_8B<br>Qwen/Qwen-1_8B-Chat<br>Qwen/Qwen-7B<br>Qwen/Qwen-7B-Chat<br>Qwen/Qwen-14B<br>Qwen/Qwen-14B-Chat | Qwen/Qwen-1_8B<br>Qwen/Qwen-1_8B-Chat<br>Qwen/Qwen-7B<br>Qwen/Qwen-7B-Chat<br>Qwen/Qwen-14B<br>Qwen/Qwen-14B-Chat |
| | Qwen1.5 | Alibaba Cloud | Qwen/Qwen1.5-0.5B<br>Qwen/Qwen1.5-0.5B-Chat<br>Qwen/Qwen1.5-1.8B<br>Qwen/Qwen1.5-1.8B-Chat<br>Qwen/Qwen1.5-7B<br>Qwen/Qwen1.5-7B-Chat<br>Qwen/Qwen1.5-14B<br>Qwen/Qwen1.5-14B-Chat | Qwen/Qwen1.5-0.5B<br>Qwen/Qwen1.5-0.5B-Chat<br>Qwen/Qwen1.5-1.8B<br>Qwen/Qwen1.5-1.8B-Chat<br>Qwen/Qwen1.5-7B<br>Qwen/Qwen1.5-7B-Chat<br>Qwen/Qwen1.5-14B<br>Qwen/Qwen1.5-14B-Chat |
| | Qwen2 | Alibaba Cloud | Qwen/Qwen2-0.5B<br>Qwen/Qwen2-0.5B-Instruct<br>Qwen/Qwen2-1.5B<br>Qwen/Qwen2-1.5B-Instruct<br>Qwen/Qwen2-7B<br>Qwen/Qwen2-7B-Instruct | Qwen/Qwen2-0.5B<br>Qwen/Qwen2-0.5B-Instruct<br>Qwen/Qwen2-1.5B<br>Qwen/Qwen2-1.5B-Instruct<br>Qwen/Qwen2-7B<br>Qwen/Qwen2-7B-Instruct |
| | Qwen2-VL | Alibaba Cloud | Qwen/Qwen2-VL-2B-Instruct<br>Qwen/Qwen2-VL-7B-Instruct | Qwen/Qwen2-VL-2B-Instruct<br>Qwen/Qwen2-VL-7B-Instruct |
| | Qwen2.5 | Alibaba Cloud | Qwen/Qwen2.5-0.5B<br>Qwen/Qwen2.5-0.5B-Instruct<br>Qwen/Qwen2.5-1.5B<br>Qwen/Qwen2.5-1.5B-Instruct<br>Qwen/Qwen2.5-3B<br>Qwen/Qwen2.5-3B-Instruct<br>Qwen/Qwen2.5-7B<br>Qwen/Qwen2.5-7B-Instruct<br>Qwen/Qwen2.5-14B<br>Qwen/Qwen2.5-14B-Instruct | Qwen/Qwen2.5-0.5B<br>Qwen/Qwen2.5-0.5B-Instruct<br>Qwen/Qwen2.5-1.5B<br>Qwen/Qwen2.5-1.5B-Instruct<br>Qwen/Qwen2.5-3B<br>Qwen/Qwen2.5-3B-Instruct<br>Qwen/Qwen2.5-7B<br>Qwen/Qwen2.5-7B-Instruct<br>Qwen/Qwen2.5-14B<br>Qwen/Qwen2.5-14B-Instruct |
| InternLM | InternLM | Shanghai AI Laboratory | internlm/internlm-7b<br>internlm/internlm-chat-7b | internlm/internlm-7b<br>internlm/internlm-chat-7b |
| | InternLM2 | Shanghai AI Laboratory | internlm/internlm2-1_8b<br>internlm/internlm2-chat-1_8b<br>internlm/internlm2-7b<br>internlm/internlm2-chat-7b<br>internlm/internlm2-20b<br>internlm/internlm2-chat-20b | internlm/internlm2-1_8b<br>internlm/internlm2-chat-1_8b<br>internlm/internlm2-7b<br>internlm/internlm2-chat-7b |
| | InternLM2.5 | Shanghai AI Laboratory | internlm/internlm2_5-7b<br>internlm/internlm2_5-7b-chat<br>internlm/internlm2_5-7b-chat-1m | internlm/internlm2_5-7b<br>internlm/internlm2_5-7b-chat<br>internlm/internlm2_5-7b-chat-1m |
| Falcon | Falcon | tiiuae | tiiuae/falcon-rw-1b<br>tiiuae/falcon-7b<br>tiiuae/falcon-7b-instruct | tiiuae/falcon-rw-1b<br>tiiuae/falcon-7b<br>tiiuae/falcon-7b-instruct |
| DeepSeek | DeepSeek-MoE | DeepSeek | deepseek-ai/deepseek-moe-16b-base<br>deepseek-ai/deepseek-moe-16b-chat | deepseek-ai/deepseek-moe-16b-base<br>deepseek-ai/deepseek-moe-16b-chat |
| | DeepSeek-LLM | DeepSeek | deepseek-ai/deepseek-llm-7b-base<br>deepseek-ai/deepseek-llm-7b-chat | deepseek-ai/deepseek-llm-7b-base<br>deepseek-ai/deepseek-llm-7b-chat |
| | DeepSeek-V2 | DeepSeek | deepseek-ai/DeepSeek-V2-Lite<br>deepseek-ai/DeepSeek-V2-Lite-Chat | deepseek-ai/DeepSeek-V2-Lite<br>deepseek-ai/DeepSeek-V2-Lite-Chat |
| | DeepSeek-Coder | DeepSeek | deepseek-ai/deepseek-coder-1.3b-base<br>deepseek-ai/deepseek-coder-1.3b-instruct<br>deepseek-ai/deepseek-coder-6.7b-base<br>deepseek-ai/deepseek-coder-6.7b-instruct<br>deepseek-ai/deepseek-coder-7b-base-v1.5<br>deepseek-ai/deepseek-coder-7b-instruct-v1.5 | deepseek-ai/deepseek-coder-1.3b-base<br>deepseek-ai/deepseek-coder-1.3b-instruct<br>deepseek-ai/deepseek-coder-6.7b-base<br>deepseek-ai/deepseek-coder-6.7b-instruct<br>deepseek-ai/deepseek-coder-7b-base-v1.5<br>deepseek-ai/deepseek-coder-7b-instruct-v1.5 |
| | DeepSeek-Coder-V2 | DeepSeek | deepseek-ai/DeepSeek-Coder-V2-Lite-Base<br>deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | deepseek-ai/DeepSeek-Coder-V2-Lite-Base<br>deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct |
| | DeepSeek-Math | DeepSeek | deepseek-ai/deepseek-math-7b-base<br>deepseek-ai/deepseek-math-7b-instruct<br>deepseek-ai/deepseek-math-7b-rl | deepseek-ai/deepseek-math-7b-base<br>deepseek-ai/deepseek-math-7b-instruct<br>deepseek-ai/deepseek-math-7b-rl |
| MiniCPM | MiniCPM | OpenBMB | openbmb/MiniCPM-2B-sft-bf16<br>openbmb/MiniCPM-2B-dpo-bf16<br>openbmb/MiniCPM-2B-128k<br>openbmb/MiniCPM-1B-sft-bf16 | openbmb/MiniCPM-2B-sft-bf16<br>openbmb/MiniCPM-2B-dpo-bf16<br>openbmb/MiniCPM-2B-128k<br>openbmb/MiniCPM-1B-sft-bf16 |
| | MiniCPM-V | OpenBMB | openbmb/MiniCPM-V-2_6<br>openbmb/MiniCPM-Llama3-V-2_5 | openbmb/MiniCPM-V-2_6<br>openbmb/MiniCPM-Llama3-V-2_5 |
| embedding | text2vec-base-chinese | shibing624 | shibing624/text2vec-base-chinese | shibing624/text2vec-base-chinese |
| | m3e | moka-ai | moka-ai/m3e-base | moka-ai/m3e-base |
| | bge | BAAI | BAAI/bge-large-en-v1.5<br>BAAI/bge-large-zh-v1.5<br>BAAI/bge-base-en-v1.5<br>BAAI/bge-base-zh-v1.5<br>BAAI/bge-small-en-v1.5<br>BAAI/bge-small-zh-v1.5 | BAAI/bge-large-en-v1.5<br>BAAI/bge-large-zh-v1.5<br>BAAI/bge-base-en-v1.5<br>BAAI/bge-base-zh-v1.5<br>BAAI/bge-small-en-v1.5<br>BAAI/bge-small-zh-v1.5 |
| | gte | thenlper | thenlper/gte-large-zh<br>thenlper/gte-base-zh | thenlper/gte-base-zh<br>thenlper/gte-large-zh |

*Notes:

  1. Highlighted entries (e.g. bert-base-chinese) can be downloaded and loaded directly with build_transformer_model(); see the sketch after these notes.
  2. To speed up downloads from within mainland China, use a mirror site:
    • HF_ENDPOINT=https://hf-mirror.com python your_script.py
    • run export HF_ENDPOINT=https://hf-mirror.com first, then execute your Python code
    • or set it at the top of your Python code:
    import os
    os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"
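A minimal sketch combining the two notes above. Treat the model name as an example only; whether a given name resolves automatically depends on the table in this section:

```python
# Sketch: point downloads at the mirror, then load a highlighted
# checkpoint by name; 'bert-base-chinese' is just an example entry.
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'  # set before any download happens

from bert4torch.models import build_transformer_model

# for highlighted entries, passing the name alone lets bert4torch fetch
# the weights and config from the hub on first use
model = build_transformer_model(checkpoint_path='bert-base-chinese')
```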
    

6. Acknowledgements

7. Citation

```bibtex
@misc{bert4torch,
  title={bert4torch},
  author={Bo Li},
  year={2022},
  howpublished={\url{https://github.com/Tongjilibo/bert4torch}},
}
```

8. Other

<table border="0"> <tbody> <tr align="center" > <td> <a href="https://github.com/Tongjilibo"><img width="200" height="250" src="./docs/pics/wechat.jpg" alt="pic"></a><br> <a href="https://github.com/Tongjilibo">WeChat</a> </td> <td> <a href="https://github.com/Tongjilibo"><img width="190" height="250" src="./docs/pics/wechat_group.jpg" alt="pic"></a><br> <a href="https://github.com/Tongjilibo">WeChat group</a> </td> <td> <a href="https://star-history.com/#Tongjilibo/bert4torch&Date"><img width="400" height="250" src="https://api.star-history.com/svg?repos=Tongjilibo/bert4torch&type=Date" alt="pic"></a><br> <a href="https://star-history.com/#Tongjilibo/bert4torch&Date">Star History Chart</a> </td> </tr> </tbody> </table>