
<div align="center"> <img src="./assets/minicpm_logo.png" width="500em" ></img> </div> <h4 align="center"> <p> <b>Chinese</b> | <a href="https://github.com/OpenBMB/MiniCPM/blob/main/README-en.md">English</a> <p> </h4> <p align="center"> <a href="https://openbmb.vercel.app/?category=Chinese+Blog" target="_blank">MiniCPM Technical Blog</a> | <a href="https://modelbest.feishu.cn/wiki/D2tFw8Pcsi5CIzkaHNacLK64npg" target="_blank">MiniCPM Knowledge Base</a> | <a href="https://arxiv.org/abs/2404.06395" target="_blank">MiniCPM Paper</a> | <a href="https://github.com/OpenBMB/MiniCPM-V/" target="_blank">MiniCPM-V Repository</a> | Join our <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat group</a> </p>

Changelog 🔥

Table of Contents

Model Downloads

| HuggingFace | ModelScope |
|---|---|
| MiniCPM3-4B | MiniCPM3-4B |
| MiniCPM-2B-sft | MiniCPM-2B-sft |
| MiniCPM-2B-dpo | MiniCPM-2B-dpo |
| MiniCPM-2B-128k | MiniCPM-2B-128k |
| MiniCPM-MoE-8x2B | MiniCPM-MoE-8x2B |
| MiniCPM-1B | MiniCPM-1B |
| MiniCPM-S-1B | MiniCPM-S-1B |

Note: more model versions are available here.

MiniCPM 3.0

MiniCPM 3.0 is a 4B-parameter language model. Compared with MiniCPM 1.0/2.0, it is more fully featured and substantially stronger overall, matching or surpassing many 7B-9B models on most benchmarks.

Evaluation Results

Comprehensive Evaluation

<table>
  <tr> <td>Benchmark</td> <td>Qwen2-7B-Instruct</td> <td>GLM-4-9B-Chat</td> <td>Gemma2-9B-it</td> <td>Llama3.1-8B-Instruct</td> <td>GPT-3.5-Turbo-0125</td> <td>Phi-3.5-mini-Instruct(3.8B)</td> <td>MiniCPM3-4B</td> </tr>
  <tr> <td colspan="15" align="left"><strong>English</strong></td> </tr>
  <tr> <td>MMLU</td> <td>70.5</td> <td>72.4</td> <td>72.6</td> <td>69.4</td> <td>69.2</td> <td>68.4</td> <td>67.2</td> </tr>
  <tr> <td>BBH</td> <td>64.9</td> <td>76.3</td> <td>65.2</td> <td>67.8</td> <td>70.3</td> <td>68.6</td> <td>70.2</td> </tr>
  <tr> <td>MT-Bench</td> <td>8.41</td> <td>8.35</td> <td>7.88</td> <td>8.28</td> <td>8.17</td> <td>8.60</td> <td>8.41</td> </tr>
  <tr> <td>IFEVAL (Prompt Strict-Acc.)</td> <td>51.0</td> <td>64.5</td> <td>71.9</td> <td>71.5</td> <td>58.8</td> <td>49.4</td> <td>68.4</td> </tr>
  <tr> <td colspan="15" align="left"><strong>Chinese</strong></td> </tr>
  <tr> <td>CMMLU</td> <td>80.9</td> <td>71.5</td> <td>59.5</td> <td>55.8</td> <td>54.5</td> <td>46.9</td> <td>73.3</td> </tr>
  <tr> <td>CEVAL</td> <td>77.2</td> <td>75.6</td> <td>56.7</td> <td>55.2</td> <td>52.8</td> <td>46.1</td> <td>73.6</td> </tr>
  <tr> <td>AlignBench v1.1</td> <td>7.10</td> <td>6.61</td> <td>7.10</td> <td>5.68</td> <td>5.82</td> <td>5.73</td> <td>6.74</td> </tr>
  <tr> <td>FollowBench-zh (SSR)</td> <td>63.0</td> <td>56.4</td> <td>57.0</td> <td>50.6</td> <td>64.6</td> <td>58.1</td> <td>66.8</td> </tr>
  <tr> <td colspan="15" align="left"><strong>Math</strong></td> </tr>
  <tr> <td>MATH</td> <td>49.6</td> <td>50.6</td> <td>46.0</td> <td>51.9</td> <td>41.8</td> <td>46.4</td> <td>46.6</td> </tr>
  <tr> <td>GSM8K</td> <td>82.3</td> <td>79.6</td> <td>79.7</td> <td>84.5</td> <td>76.4</td> <td>82.7</td> <td>81.1</td> </tr>
  <tr> <td>MathBench</td> <td>63.4</td> <td>59.4</td> <td>45.8</td> <td>54.3</td> <td>48.9</td> <td>54.9</td> <td>65.6</td> </tr>
  <tr> <td colspan="15" align="left"><strong>Code</strong></td> </tr>
  <tr> <td>HumanEval+</td> <td>70.1</td> <td>67.1</td> <td>61.6</td> <td>62.8</td> <td>66.5</td> <td>68.9</td> <td>68.3</td> </tr>
  <tr> <td>MBPP+</td> <td>57.1</td> <td>62.2</td> <td>64.3</td> <td>55.3</td> <td>71.4</td> <td>55.8</td> <td>63.2</td> </tr>
  <tr> <td>LiveCodeBench v3</td> <td>22.2</td> <td>20.2</td> <td>19.2</td> <td>20.4</td> <td>24.0</td> <td>19.6</td> <td>22.6</td> </tr>
  <tr> <td colspan="15" align="left"><strong>Function Calling</strong></td> </tr>
  <tr> <td>BFCL v2</td> <td>71.6</td> <td>70.1</td> <td>19.2</td> <td>73.3</td> <td>75.4</td> <td>48.4</td> <td>76.0</td> </tr>
  <tr> <td colspan="15" align="left"><strong>Overall</strong></td> </tr>
  <tr> <td>Average</td> <td>65.3</td> <td>65.0</td> <td>57.9</td> <td>60.8</td> <td>61.0</td> <td>57.2</td> <td><strong>66.3</strong></td> </tr>
</table>

Function Calling

We evaluated function-calling ability on the Berkeley Function Calling Leaderboard (BFCL). MiniCPM3-4B surpasses several 7B-9B models on this leaderboard and outperforms GPT-3.5-Turbo-0125.

<table>
  <tr> <td>Model</td> <td>Overall Accuracy</td> <td>AST Summary</td> <td>Exec Summary</td> <td>Irrelevance Detection</td> <td>Relevance Detection</td> </tr>
  <tr> <td>MiniCPM3-4B</td> <td>76.03%</td> <td>68.55%</td> <td>85.54%</td> <td>53.71%</td> <td>90.24%</td> </tr>
  <tr> <td>Llama3.1-8B-Instruct</td> <td>73.28%</td> <td>64.61%</td> <td>86.48%</td> <td>43.12%</td> <td>85.37%</td> </tr>
  <tr> <td>Qwen2-7B-Instruct</td> <td>71.61%</td> <td>65.71%</td> <td>79.57%</td> <td>44.70%</td> <td>90.24%</td> </tr>
  <tr> <td>GLM-4-9B-Chat</td> <td>70.08%</td> <td>60.69%</td> <td>80.02%</td> <td>55.02%</td> <td>82.93%</td> </tr>
  <tr> <td>Phi-3.5-mini-instruct</td> <td>48.44%</td> <td>38.89%</td> <td>54.04%</td> <td>46.78%</td> <td>65.85%</td> </tr>
  <tr> <td>Gemma2-9B-it</td> <td>19.18%</td> <td>5.41%</td> <td>18.50%</td> <td>88.88%</td> <td>7.32%</td> </tr>
</table>

Long-Context Capability

Results of the needle-in-a-haystack test at a 32K context length are shown below:

[Figure: needle-in-a-haystack results at 32K context]

We also propose LLMxMapReduce, a divide-and-conquer strategy that can in principle handle arbitrarily long text. We evaluated long-text processing on InfiniteBench; with the LLMxMapReduce framework, MiniCPM3-4B's average score on this benchmark surpasses strong baselines such as GPT-4 and KimiChat. (A minimal sketch of the divide-and-conquer idea follows the table below.)

|  | Context length | Qwen2-70b | Kimi-Chat (2024.06) | GPT-4 (From InfiniteBench) | MiniCPM 3.0 x MR | Qwen2-70b x MR | Llama3-70b x MR |
|---|---|---|---|---|---|---|---|
| Math.Find | 87.9k | 59.71% | 18.57% | 60.00% | 83.43% | 54.29% | 91.43% |
| Retrieve.KV | 89.9k | 29.00% | 69.20% | 89.00% | 93.80% | 98.80% | 98.89% |
| En.Dia | 103.6k | 23.00% | 23.00% | 7.50% | 12.50% | 46.50% | 17.50% |
| Code.Debug | 114.7k | 45.43% | 38.32% | 54.31% | 25.63% | 54.82% | 62.94% |
| Retrieve.Number | 122.4k | 100.00% | 97.45% | 100.00% | 99.32% | 100.00% | 99.79% |
| Retrieve.PassKey | 122.4k | 100.00% | 99.32% | 100.00% | 98.81% | 100.00% | 100.00% |
| En.Sum | 171.5k | 31.85% | 29.94% | 14.73% | 25.89% | 32.39% | 30.63% |
| En.MC | 184.4k | 81.66% | 79.91% | 68.12% | 66.38% | 83.84% | 82.10% |
| En.QA | 192.6k | 21.97% | 18.80% | 22.44% | 28.39% | 23.13% | 34.70% |
| Zh.QA | 2068.6k | 21.40% | 19.84% | 25.96% | 23.66% | 19.10% | N/A |
| avg w/o Zh.QA | / | 51.92% | 52.96% | 55.33% | 59.29% | 64.98% | 68.64% |
| avg | / | 48.86% | 49.65% | 52.39% | 55.55% | 60.39% | N/A |
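
To make the divide-and-conquer idea concrete, here is a minimal, hypothetical sketch of a map-reduce style pipeline over an OpenAI-compatible MiniCPM3-4B endpoint (for example, the SGLang or vLLM servers shown below). The endpoint URL, chunk size, and prompts are illustrative assumptions and not the official LLMxMapReduce implementation.

# Hypothetical map-reduce sketch of the divide-and-conquer idea; not the official LLMxMapReduce code.
# Assumes an OpenAI-compatible MiniCPM3-4B server is already running locally.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="token-abc123")  # assumed endpoint
MODEL = "MiniCPM3-4B"

def ask(prompt: str) -> str:
    # single model call with greedy decoding
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return resp.choices[0].message.content

def map_reduce_answer(document: str, question: str, chunk_size: int = 8000) -> str:
    # Map: extract question-relevant evidence from each chunk independently.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    partial = [
        ask(f"Passage:\n{chunk}\n\nExtract any information relevant to: {question}\n"
            "Reply with 'NONE' if nothing is relevant.")
        for chunk in chunks
    ]
    evidence = "\n".join(p for p in partial if p.strip() != "NONE")
    # Reduce: answer the question from the aggregated evidence.
    return ask(f"Evidence collected from a long document:\n{evidence}\n\n"
               f"Question: {question}\nAnswer:")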

Model Inference

HuggingFace

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
torch.manual_seed(0)

path = 'openbmb/MiniCPM3-4B'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

responds, history = model.chat(tokenizer, "请写一篇关于人工智能的文章,详细介绍人工智能的未来发展和隐患。", temperature=0.7, top_p=0.7)
print(responds)

SGLang (Recommended)

Refer to the official SGLang repository and install the latest version from source.

python -m sglang.launch_server --model openbmb/MiniCPM3-4B --trust-remote-code --port 30000 --chat-template chatml
from sglang import function, system, user, assistant, gen, set_default_backend, RuntimeEndpoint

@function
def multi_turn_question(s, question_1, question_2):
    s += user(question_1)
    s += assistant(gen("answer_1", max_tokens=1024))
    s += user(question_2)
    s += assistant(gen("answer_2", max_tokens=1024))

set_default_backend(RuntimeEndpoint("http://localhost:30000"))

state = multi_turn_question.run(
    question_1="介绍一下人工智能",
    question_2="写一篇关于它的文章",
)

for m in state.messages():
    print(m["role"], ":", m["content"])

vLLM
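
The example below is a minimal offline-inference sketch with vLLM, assuming a recent vLLM release that supports MiniCPM3 with trust_remote_code; the prompt and sampling settings are illustrative, and the repository's official example may differ.

# Minimal assumed sketch of offline vLLM inference for MiniCPM3-4B.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "openbmb/MiniCPM3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Build a chat prompt via the model's chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Recommend five sightseeing spots in Beijing."}],
    tokenize=False,
    add_generation_prompt=True,
)

llm = LLM(model=model_path, trust_remote_code=True, dtype="bfloat16")
params = SamplingParams(temperature=0.7, top_p=0.7, max_tokens=1024)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)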

llama.cpp

We provide a GGUF version of MiniCPM3 that can be used directly with llama.cpp for inference.

Model Fine-tuning

LLaMA-Factory

Fine-tuning is currently supported via LLaMA-Factory; see the LLaMA-Factory fine-tuning guide for usage.

Advanced Features

For the advanced features below, our example code uses vLLM for inference.

Function Calling

We provide example code for function calling with MiniCPM3:

cd demo/minicpm3/function_call
python function_call.py

If you want to launch an inference server that supports function calling, run the following commands (a sketch of querying it with the OpenAI client follows):

cd demo/minicpm3/function_call
pip install -r requirements.txt
python openai_api_server.py \
    --model openbmb/MiniCPM3-4B \
    --served-model-name MiniCPM3-4B \
    --chat-template chatml.jinja \
    --dtype auto \
    --api-key token-abc123 \
    --tensor-parallel-size 1 \
    --trust-remote-code
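
Once the server above is running, it can be queried with the official OpenAI Python client. The sketch below is a hedged illustration: the port assumes vLLM's default (8000), and the web_search function schema is a hypothetical example rather than a tool shipped with the demo.

# Hedged sketch of a tool-calling request against the OpenAI-compatible server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")  # assumed port

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool, for illustration only
        "description": "Search the web for up-to-date information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Search keywords."}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="MiniCPM3-4B",
    messages=[{"role": "user", "content": "What is the weather in Beijing today?"}],
    tools=tools,
)
# The model is expected to return a tool call instead of a direct answer.
print(resp.choices[0].message.tool_calls)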

Below is a demo of answering a question by calling a search tool:

[Figure: function-calling demo]

Code Interpreter

We provide example code for using MiniCPM3 with a code interpreter:

cd demo/minicpm3/code_interpreter
pip install -r requirements.txt
python code_interpreter.py openbmb/MiniCPM3-4B

Below is a demo of generating a QR code with the code interpreter:

[Figure: code-interpreter demo]

MiniCPM 2.0

<details> <summary>View details of MiniCPM 2.0</summary>

The MiniCPM 2.0 series upgrades MiniCPM along several dimensions and includes the following model versions:

Evaluation Results

MiniCPM-2B-128k Model Evaluation

| Model | avg | avg w/o code&math | passkey | number_string | kv_retrieval | longbook_choice_eng | longbook_qa_chn | longbook_qa_eng | longbook_sum_eng | longdialogue_qa_eng | math_calc | math_find | code_debug | code_run |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LWM-Text-128k | 24.45 | 33.62 | 100 | 97.8 | 0.6 | 28.82 | 15.93 | 14.31 | 9.99 | 1.5 | 0 | 3.43 | 20.05 | 1 |
| Yarn-Mistral-7b-128k | 19.84 | 27.36 | 92.71 | 56.61 | 0 | 27.95 | 15.49 | 9.55 | 9.06 | 7.5 | 0 | 17.14 | 0.76 | 1.25 |
| Mistral-7B-Instruct-v0.2(ABF 1000w) | 27.75 | 36.9 | 100 | 78.98 | 3.6 | 37.12 | 11.74 | 17.37 | 21.12 | 9.5 | 0 | 29.43 | 17.51 | 0 |
| Yi-6B-200k | 22.15 | 32.54 | 100 | 94.92 | 0 | 36.68 | 15.07 | 9.2 | 0.92 | 3.5 | 0 | 4.29 | 0.51 | 0.75 |
| chatglm3-6b-128k | 25.58 | 36.57 | 89.93 | 99.66 | 5.2 | 46.29 | 10.7 | 8.38 | 25.91 | 6.5 | 0 | 8 | 5.33 | 1 |
| MiniCPM-2.4B-128k | 27.32 | 37.68 | 98.31 | 99.83 | 9 | 29.69 | 23.06 | 16.33 | 15.73 | 9.5 | 0 | 4.29 | 22.08 | 0 |

MiniCPM-MoE-8x2B Model Evaluation

<div align="left"> <table style="margin: 0px auto;">
  <thead>
    <tr> <th align="left">Model</th> <th nowrap="nowrap" >BBH</th> <th nowrap="nowrap" >MMLU</th> <th nowrap="nowrap" >CEval</th> <th nowrap="nowrap" >CMMLU</th> <th nowrap="nowrap" >HumanEval</th> <th nowrap="nowrap" >MBPP&dagger;</th> <th nowrap="nowrap" >GSM8K</th> <th nowrap="nowrap" >MATH</th> </tr>
  </thead>
  <tbody align="center">
    <tr> <td nowrap="nowrap" align="left">Llama2-34B*</td> <td>44.1</td> <td>62.6</td> <td>-</td> <td>-</td> <td>22.6</td> <td>33.0</td> <td>42.2</td> <td>6.24</td> </tr>
    <tr> <td nowrap="nowrap" align="left">Mistral-7B-Instruct-v0.2</td> <td>39.81</td> <td>60.51</td> <td>42.55</td> <td>41.92</td> <td>36.59</td> <td>39.63</td> <td>40.49</td> <td>4.95</td> </tr>
    <tr> <td nowrap="nowrap" align="left" >Gemma-7B*</td> <td>55.1</td> <td>64.3</td> <td>-</td> <td>-</td> <td>32.3</td> <td>44.4</td> <td>46.4</td> <td>24.3</td> </tr>
    <tr> <td nowrap="nowrap" align="left" >Qwen1.5-7B*</td> <td>40.2</td> <td>61</td> <td>74.1</td> <td>73.1</td> <td>36</td> <td>37.4</td> <td>62.5</td> <td>20.3</td> </tr>
    <tr> <td nowrap="nowrap" align="left" >Deepseek-MoE(16B)*</td> <td>-</td> <td>45.0</td> <td>40.6</td> <td>42.5</td> <td>26.8</td> <td>39.2</td> <td>18.8</td> <td>4.3</td> </tr>
    <tr> <td nowrap="nowrap" align="left" ><b>MiniCPM-2.4B</b></td> <td>36.87</td> <td>53.46</td> <td>51.13</td> <td>51.07</td> <td>50.00</td> <td>35.93</td> <td>53.83</td> <td>10.24</td> </tr>
    <tr> <td nowrap="nowrap" align="left" ><b>MiniCPM-MoE-8x2B</b></td> <td>39.22</td> <td>58.90</td> <td>58.11</td> <td>58.80</td> <td>55.49</td> <td>41.68</td> <td>61.56</td> <td>10.52</td> </tr>
  </tbody>
</table> </div>

Note: * indicates results taken from the corresponding technical report. † indicates evaluation on the full MBPP set.

MiniCPM-S-1B Evaluation Results

Other benchmarks: we report average accuracy on GSM8K (8-shot), MMLU (5-shot), BBH (3-shot), and AGI-Eval (0-shot).

| Setting | Average Sparsity | Average Performance | Code Generation | Commonsense Reasoning | Reading Comprehension | GSM8K | MMLU | BBH | AGI Eval |
|---|---|---|---|---|---|---|---|---|---|
| LLaMA2-7B | - | 37.96 | 16.37 | 69.59 | 61.87 | 12.96 | 44.45 | 32.96 | 27.53 |
| ReluLLaMA-7B | 66.98 | 37.62 | 15.85 | 69.64 | 70.54 | 5.84 | 38.64 | 35.07 | 27.73 |
| ProSparse-7B* | 88.11 | 38.31 | 19.47 | 66.29 | 63.33 | 12.74 | 45.21 | 33.59 | 27.55 |
| ProSparse-7B | 89.32 | 38.46 | 19.42 | 66.27 | 63.50 | 12.13 | 45.48 | 34.99 | 27.46 |
| LLaMA2-13B | - | 44.06 | 20.19 | 72.58 | 71.55 | 22.21 | 54.69 | 37.89 | 29.33 |
| ReluLLaMA-13B | 71.56 | 42.74 | 20.19 | 70.44 | 73.29 | 18.50 | 50.58 | 37.97 | 28.22 |
| ProSparse-13B* | 87.97 | 45.07 | 29.03 | 69.75 | 67.54 | 25.40 | 54.78 | 40.20 | 28.76 |
| ProSparse-13B | 88.80 | 44.90 | 28.42 | 69.76 | 66.91 | 26.31 | 54.35 | 39.90 | 28.67 |
| MiniCPM-1B | - | 44.44 | 36.85 | 63.67 | 60.90 | 35.48 | 50.44 | 35.03 | 28.71 |
| MiniCPM-S-1B* | 86.25 | 44.72 | 41.38 | 64.55 | 60.69 | 34.72 | 49.36 | 34.04 | 28.27 |
| MiniCPM-S-1B | 87.89 | 44.72 | 42.04 | 64.37 | 60.73 | 34.57 | 49.51 | 34.08 | 27.77 |

Notes:

  1. The download links for ReluLLaMA-7B and ReluLLaMA-13B are 7B and 13B, respectively. "ProSparse-7B*", "ProSparse-13B*", and "MiniCPM-S-1B*" denote ProSparse versions without activation threshold shifting.
  2. For PIQA, SIQA, HellaSwag, WinoGrande, COPA, BoolQ, LAMBADA, TyDi QA, and AGI-Eval, answers are selected by the PPL of each option; for GSM8K, MMLU, and BBH, answers are generated directly. A minimal sketch of the PPL-based selection follows this list.
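
As a concrete illustration of the PPL-based selection in note 2, here is a minimal sketch: each candidate option is appended to the prompt, only the option tokens are scored, and the option whose continuation has the lowest perplexity is chosen. The checkpoint name and the toy question are illustrative assumptions, and the boundary between prompt and option tokens is only approximated.

# Minimal assumed sketch of PPL-based multiple-choice answer selection.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "openbmb/MiniCPM-S-1B-sft"  # assumed checkpoint name, for illustration
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map="cuda", trust_remote_code=True
)

def option_ppl(prompt: str, option: str) -> float:
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids.to(model.device)
    labels = full_ids.clone()
    labels[:, :prompt_len] = -100  # score only the option tokens
    with torch.no_grad():
        loss = model(full_ids, labels=labels).loss  # mean NLL over the option tokens
    return torch.exp(loss).item()

question = "Question: In which direction does the sun rise?\nAnswer:"
options = [" East.", " West."]
print(min(options, key=lambda o: option_ppl(question, o)))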

Model Inference

HuggingFace and vLLM Inference

Refer to the model inference section of MiniCPM 1.0.

PowerInfer Inference

For the MiniCPM-S-1B model, PowerInfer can be used to accelerate inference as follows:

  1. Make sure cmake 3.17 or newer is installed; skip this step if it already is.
  # download the source package
  sudo wget https://cmake.org/files/v3.23/cmake-3.23.0.tar.gz
  # extract it
  sudo tar -zxvf cmake-3.23.0.tar.gz
  # enter the extracted directory and configure the build
  cd cmake-3.23.0
  sudo ./configure
  sudo make -j8
  # compile and install
  sudo make install
  # check the installed version
  cmake --version
  # the installation succeeded if a version number is printed, e.g.
  # cmake version 3.23.0
  2. Install PowerInfer:
  git clone https://github.com/SJTU-IPADS/PowerInfer
  cd PowerInfer
  pip install -r requirements.txt # install Python helpers' dependencies
  3. Build the CPU version of PowerInfer. If your machine has no GPU, or you only want to run inference on CPU, run:
  cmake -S . -B build
  cmake --build build --config Release
  4. Build the GPU version of PowerInfer. If your machine has a GPU, run:
  cmake -S . -B build -DLLAMA_CUBLAS=ON
  cmake --build build --config Release
  5. Download the sparse model:
  git clone https://huggingface.co/openbmb/MiniCPM-S-1B-sft-gguf
  # or
  git clone https://modelscope.cn/models/OpenBMB/MiniCPM-S-1B-sft-gguf
  6. Run inference:
  cd PowerInfer
  # Command template: output_token_count is the maximum number of output tokens, thread_num is the number of threads, and prompt is the input prompt string
  # ./build/bin/main -m /PATH/TO/MODEL -n $output_token_count -t $thread_num -p $prompt
  # Example:
  ./build/bin/main -m /root/ld/ld_model_pretrain/1b-s-minicpm/MiniCPM-S-1B-sft.gguf -n 2048 -t 8 -p '<用户>hello,tell me a story please.<AI>'
</details>

MiniCPM 1.0

<details> <summary>View details of MiniCPM 1.0</summary>

The MiniCPM-2B language model has 2.4B non-embedding parameters and 2.7B parameters in total.

Note: to keep MiniCPM-2B general for academic research, we did not perform any identity-related training on it. Because part of the training data comes from the open-source ShareGPT corpus, the model may output identity statements resembling those of GPT-series models.

Evaluation Results

Evaluation Settings

Deployment Mode

Evaluation Metrics

Text Model Evaluation

Cross-tier comparison (vs. larger models):

| Model | Avg. | Avg. (English) | Avg. (Chinese) | C-Eval | CMMLU | MMLU | HumanEval | MBPP | GSM8K | MATH | BBH | ARC-E | ARC-C | HellaSwag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama2-7B | 35.40 | 36.21 | 31.765 | 32.42 | 31.11 | 44.32 | 12.2 | 27.17 | 13.57 | 1.8 | 33.23 | 75.25 | 42.75 | 75.62* |
| Qwen-7B | 49.46 | 47.19 | 59.655 | 58.96 | 60.35 | 57.65 | 17.07 | 42.15 | 41.24 | 5.34 | 37.75 | 83.42 | 64.76 | 75.32* |
| Deepseek-7B | 39.96 | 39.15 | 43.64 | 42.82 | 44.45 | 47.82 | 20.12 | 41.45 | 15.85 | 1.53 | 33.38 | 74.58* | 42.15* | 75.45* |
| Mistral-7B | 48.97 | 49.96 | 44.54 | 46.12 | 42.96 | 62.69 | 27.44 | 45.2 | 33.13 | 5.0 | 41.06 | 83.92 | 70.73 | 80.43* |
| Llama2-13B | 41.48 | 42.44 | 37.19 | 37.32 | 37.06 | 54.71 | 17.07 | 32.55 | 21.15 | 2.25 | 37.92 | 78.87* | 58.19 | 79.23* |
| MPT-30B | 38.17 | 39.82 | 30.72 | 29.34 | 32.09 | 46.56 | 21.95 | 35.36 | 10.31 | 1.56 | 38.22 | 78.66* | 46.08* | 79.72* |
| Falcon-40B | 43.62 | 44.21 | 40.93 | 40.29 | 41.57 | 53.53 | 24.39 | 36.53 | 22.44 | 1.92 | 36.24 | 81.94* | 57.68 | 83.26* |
| MiniCPM-2B | 52.33 | 52.6 | 51.1 | 51.13 | 51.07 | 53.46 | 50.00 | 47.31 | 53.83 | 10.24 | 36.87 | 85.44 | 68.00 | 68.25 |

Same-tier comparison:

| Model | Avg. | Avg. (English) | Avg. (Chinese) | C-Eval | CMMLU | MMLU | HumanEval | MBPP | GSM8K | MATH | BBH | ARC-E | ARC-C | HellaSwag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TinyLlama-1.1B | 25.36 | 25.55 | 24.525 | 25.02 | 24.03 | 24.3 | 6.71 | 19.91 | 2.27 | 0.74 | 28.78 | 60.77* | 28.15* | 58.33* |
| Qwen-1.8B | 34.72 | 31.87 | 47.57 | 49.81 | 45.32 | 43.37 | 7.93 | 17.80 | 19.26 | 2.42 | 29.07 | 63.97* | 43.69 | 59.28* |
| Gemini Nano-3B | - | - | - | - | - | - | - | 27.2(report) | 22.8(report) | - | 42.4(report) | - | - | - |
| StableLM-Zephyr-3B | 43.46 | 46.31 | 30.62 | 30.34 | 30.89 | 45.9 | 35.37 | 31.85 | 52.54 | 12.49 | 37.68 | 73.78 | 55.38 | 71.87* |
| Phi-2-2B | 48.84 | 54.41 | 23.78 | 23.37 | 24.18 | 52.66 | 47.56 | 55.04 | 57.16 | 3.5 | 43.39 | 86.11 | 71.25 | 73.07* |
| MiniCPM-2B | 52.33 | 52.6 | 51.10 | 51.13 | 51.07 | 53.46 | 50.00 | 47.31 | 53.83 | 10.24 | 36.87 | 85.44 | 68.00 | 68.25 |

Chat model comparison:

| Model | Avg. | Avg. (English) | Avg. (Chinese) | C-Eval | CMMLU | MMLU | HumanEval | MBPP | GSM8K | MATH | BBH | ARC-E | ARC-C | HellaSwag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ChatGLM2-6B | 37.98 | 35.17 | 50.63 | 52.05 | 49.21 | 45.77 | 10.37 | 9.38 | 22.74 | 5.96 | 32.6 | 74.45 | 56.82 | 58.48* |
| Mistral-7B-Instruct-v0.1 | 44.36 | 45.89 | 37.51 | 38.06 | 36.96 | 53.56 | 29.27 | 39.34 | 28.73 | 3.48 | 39.52 | 81.61 | 63.99 | 73.47* |
| Mistral-7B-Instruct-v0.2 | 50.91 | 52.83 | 42.235 | 42.55 | 41.92 | 60.51 | 36.59 | 48.95 | 40.49 | 4.95 | 39.81 | 86.28 | 73.38 | 84.55* |
| Qwen-7B-Chat | 44.93 | 42.05 | 57.9 | 58.57 | 57.23 | 56.03 | 15.85 | 40.52 | 42.23 | 8.3 | 37.34 | 64.44* | 39.25* | 74.52* |
| Yi-6B-Chat | 50.46 | 45.89 | 70.995 | 70.88 | 71.11 | 62.95 | 14.02 | 28.34 | 36.54 | 3.88 | 37.43 | 84.89 | 70.39 | 74.6* |
| Baichuan2-7B-Chat | 44.68 | 42.74 | 53.39 | 53.28 | 53.5 | 53 | 21.34 | 32.32 | 25.25 | 6.32 | 37.46 | 79.63 | 60.15 | 69.23* |
| Deepseek-7B-chat | 49.34 | 49.56 | 48.335 | 46.95 | 49.72 | 51.67 | 40.85 | 48.48 | 48.52 | 4.26 | 35.7 | 76.85 | 63.05 | 76.68* |
| Llama2-7B-Chat | 38.16 | 39.17 | 33.59 | 34.54 | 32.64 | 47.64 | 14.02 | 27.4 | 21.15 | 2.08 | 35.54 | 74.28 | 54.78 | 75.65* |
| MiniCPM-2B | 52.33 | 52.6 | 51.10 | 51.13 | 51.07 | 53.46 | 50.00 | 47.31 | 53.83 | 10.24 | 36.87 | 85.44 | 68.00 | 68.25 |

Comparison after DPO:

| Model | MT-bench |
|---|---|
| GPT-4-turbo | 9.32 |
| GPT-3.5-turbo | 8.39 |
| Mistral-8*7b-Instruct-v0.1 | 8.30 |
| Claude-2.1 | 8.18 |
| Zephyr-7B-beta | 7.34 |
| MiniCPM-2B | 7.25 |
| Vicuna-33B | 7.12 |
| Zephyr-7B-alpha | 6.88 |
| LLaMA-2-70B-chat | 6.86 |
| Mistral-7B-Instruct-v0.1 | 6.84 |
| MPT-34B-instruct | 6.39 |

Quick Start

Online Demo

Gradio-based Web Demo

# generation powered by vllm
python demo/minicpm/vllm_based_demo.py --model_path <vllmcpm_repo_path>
# generation powered by huggingface
python demo/minicpm/hf_based_demo.py --model_path <hf_repo_path>

HuggingFace Inference

MiniCPM-2B

Install transformers>=4.36.0 and accelerate, then run the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
torch.manual_seed(0)

path = 'openbmb/MiniCPM-2B-dpo-bf16'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

responds, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?", temperature=0.5, top_p=0.8, repetition_penalty=1.02)
print(responds)
MiniCPM-2B (Llama Format)

We converted the MiniCPM weights into a format that Llama code can load directly, for convenience:

import torch
from transformers import LlamaTokenizerFast, LlamaForCausalLM
model_path = "openbmb/MiniCPM-2B-dpo-bf16-llama-format"
tokenizer = LlamaTokenizerFast.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

prompt="Now you act like a terminal situated within a beginner's C++ practice repository folder, please provide the output for the command: `ls -l`"
input_ids = tokenizer.encode("<用户>{}<AI>".format(prompt), return_tensors='pt', add_special_tokens=True).cuda()
responds = model.generate(input_ids, temperature=0.3, top_p=0.8, repetition_penalty=1.02, max_length=1024)
responds = tokenizer.decode(responds[0], skip_special_tokens=True)
print(responds)

vLLM Inference

Install vLLM

pip install "vllm>=0.4.1"

See here for the full inference code; a minimal assumed sketch is also given below.
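
Since the linked example is not reproduced here, the following is a minimal assumed sketch of offline vLLM inference for MiniCPM-2B, using the "<用户>...<AI>" prompt format shown elsewhere in this README; the sampling settings are illustrative.

# Minimal assumed sketch of offline vLLM inference for MiniCPM-2B.
from vllm import LLM, SamplingParams

llm = LLM(model="openbmb/MiniCPM-2B-dpo-bf16", trust_remote_code=True, dtype="bfloat16")
params = SamplingParams(temperature=0.5, top_p=0.8, max_tokens=1024)

prompt = "<用户>Which is the highest mountain in Shandong Province?<AI>"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)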

SGLang Inference

Install SGLang

python -m sglang.launch_server --model-path openbmb/MiniCPM-2B-dpo-fp16 --trust-remote-code --port 30000
from sglang import function, gen, set_default_backend, RuntimeEndpoint

@function
def text_qa(s, question):
    s += "<用户>" + question + "<AI>"
    s += gen("answer", max_tokens=1024, temperature=0.7, top_p=0.7)

set_default_backend(RuntimeEndpoint("http://localhost:30000"))

state = text_qa.run(
    question="What is the capital of China?",
)

print(state["answer"])

Inference with llama.cpp, Ollama, fastllm, and mlx_lm

MiniCPM supports inference with llama.cpp, ollama, fastllm, and mlx_lm. Thanks to @runfuture for adapting llama.cpp and ollama.

Please refer to the edge deployment tutorial in the MiniCPM knowledge base.

Model Quantization

Please refer to the quantization guide in the MiniCPM knowledge base.
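
As one possible route (not necessarily the one described in the knowledge base), the sketch below loads MiniCPM-2B in 4-bit precision through transformers' bitsandbytes integration; the configuration values are illustrative assumptions.

# Hedged sketch: 4-bit loading of MiniCPM-2B via bitsandbytes (one of several quantization options).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

path = "openbmb/MiniCPM-2B-dpo-bf16"
quant_cfg = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path,
    quantization_config=quant_cfg,
    device_map="auto",
    trust_remote_code=True,
)

responds, history = model.chat(tokenizer, "Hello", temperature=0.5, top_p=0.8)
print(responds)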

Model Fine-tuning

</details>

License

Model License

Statement

Institutions

This project is jointly developed by the following institutions:

Citation

@article{hu2024minicpm,
  title={MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies},
  author={Hu, Shengding and Tu, Yuge and Han, Xu and He, Chaoqun and Cui, Ganqu and Long, Xiang and Zheng, Zhi and Fang, Yewei and Huang, Yuxiang and Zhao, Weilin and others},
  journal={arXiv preprint arXiv:2404.06395},
  year={2024}
}