
<div align="center"> <h1 align="center">The default model repository for <a href="https://github.com/bentoml/openllm">OpenLLM</a></h1> </div>

This repository (the `main` branch) is included in OpenLLM by default.

If you want newer (but untested) models, add our `nightly` branch:

```bash
openllm repo add nightly https://github.com/bentoml/openllm-models@nightly
```
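
Once added, refresh your local index so the nightly models show up. This is a quick check using the same `openllm repo update` and `openllm model list` commands shown in the listing below:

```bash
openllm repo update   # pull the latest model metadata from all registered repos
openllm model list    # nightly models should now appear alongside the defaults
```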

## Supported Models

<table style="width: 100%; border-collapse: collapse;"> <tr> <td style="background-color: #D1D5DA; padding: 10px; border-radius: 8px 8px 0 0; width: 100%;"> <span style="color: red;">●</span> <span style="color: yellow;">●</span> <span style="color: green;">●</span> </td> </tr> <tr> <td>

```bash
$ openllm repo update
$ openllm model list
model          version                                     repo     required GPU RAM    platforms
-------------  ------------------------------------------  -------  ------------------  -----------
codestral      codestral:22b-v0.1-fp16-2638                default  80G                 linux
gemma          gemma:2b-instruct-fp16-d4c6                 default  12G                 linux
               gemma:7b-instruct-fp16-b15c                 default  24G                 linux
               gemma:7b-instruct-awq-4bit-464a             default  12G                 linux
gemma2         gemma2:9b-instruct-fp16-b9d3                default  24G                 linux
               gemma2:27b-instruct-fp16-9f48               default  80G                 linux
jamba1.5       jamba1.5:mini-fp16-9c32                     default  80Gx4               linux
llama2         llama2:7b-chat-fp16-3523                    default  16G                 linux
               llama2:7b-chat-awq-4bit-a536                default  12G                 linux
               llama2:13b-chat-fp16-b2a0                   default  40G                 linux
               llama2:70b-chat-fp16-7fe8                   default  80Gx2               linux
llama3         llama3:8b-instruct-fp16-07fd                default  24G                 linux
               llama3:8b-instruct-awq-4bit-da0a            default  12G                 linux
               llama3:70b-instruct-fp16-4863               default  80Gx2               linux
               llama3:70b-instruct-awq-4bit-c17d           default  80G                 linux
llama3.1       llama3.1:8b-instruct-fp16-d75d              default  24G                 linux
               llama3.1:8b-instruct-awq-4bit-96eb          default  12G                 linux
               llama3.1:70b-instruct-fp16-b86d             default  80Gx2               linux
               llama3.1:70b-instruct-awq-4bit-dbcc         default  80G                 linux
               llama3.1:405b-instruct-awq-4bit-2358        default  80Gx4               linux
llama3.2       llama3.2:1b-instruct-fp16-62c6              default  12G                 linux
               llama3.2:1b-instruct-ggml-fp16-linux-60fa   default                      linux
               llama3.2:1b-instruct-ggml-fp16-darwin-8d35  default                      macos
               llama3.2:3b-instruct-fp16-53eb              default  12G                 linux
               llama3.2:11b-vision-instruct-8926           default  80G                 linux
mistral        mistral:7b-instruct-fp16-6aea               default  24G                 linux
               mistral:7b-instruct-awq-4bit-1bf8           default  12G                 linux
               mistral:24b-instruct-nemo-9adb              default  80G                 linux
mistral-large  mistral-large:123b-instruct-fp16-9008       default  80Gx4               linux
               mistral-large:123b-instruct-awq-4bit-377e   default  80G                 linux
mixtral        mixtral:8x7b-instruct-v0.1-fp16-b8de        default  80Gx2               linux
               mixtral:8x7b-instruct-v0.1-awq-4bit-1392    default  40G                 linux
phi3           phi3:3.8b-instruct-fp16-ac3c                default  12G                 linux
               phi3:3.8b-instruct-ggml-q4-463e             default                      macos
pixtral        pixtral:12b-240910-82ad                     default  80G                 linux
qwen2          qwen2:0.5b-instruct-fp16-fe8a               default  12G                 linux
               qwen2:1.5b-instruct-fp16-74cd               default  12G                 linux
               qwen2:7b-instruct-fp16-e8fa                 default  24G                 linux
               qwen2:7b-instruct-awq-4bit-dc8b             default  12G                 linux
               qwen2:57b-a14b-instruct-fp16-55df           default  80Gx2               linux
               qwen2:72b-instruct-fp16-5ba6                default  80Gx2               linux
               qwen2:72b-instruct-awq-4bit-2907            default  80G                 linux
qwen2.5        qwen2.5:0.5b-instruct-fp16-d59e             default  12G                 linux
               qwen2.5:1.5b-instruct-fp16-d1e9             default  12G                 linux
               qwen2.5:3b-instruct-fp16-b569               default  12G                 linux
               qwen2.5:7b-instruct-fp16-60be               default  24G                 linux
               qwen2.5:14b-instruct-fp16-9539              default  80G                 linux
               qwen2.5:14b-instruct-ggml-q4-darwin-1cf2    default                      macos
               qwen2.5:14b-instruct-ggml-q8-darwin-f06a    default                      macos
               qwen2.5:32b-instruct-fp16-7b67              default  80G                 linux
               qwen2.5:32b-instruct-awq-4bit-6958          default  40G                 linux
               qwen2.5:32b-instruct-ggml-fp16-darwin-809c  default                      macos
               qwen2.5:72b-instruct-fp16-aa9c              default  80Gx2               linux
               qwen2.5:72b-instruct-ggml-q4-darwin-a138    default                      macos
qwen2vl        qwen2vl:7b-instruct-fp16-8a12               default  24G                 linux

```

</td> </tr> </table>
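
To try one of the models above, you can serve it locally. A minimal sketch, assuming `openllm serve` accepts the model tags from this listing (the exact form, with or without the trailing content hash, may vary by version):

```bash
# Serve the 1B Llama 3.2 instruct model (12G GPU RAM per the table above);
# the tag is taken from the listing, minus the content hash
openllm serve llama3.2:1b-instruct-fp16
```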

## Development Guide

Open PRs against the `nightly` branch to add new models or update existing ones.

You can also fork this repo and add your own models.

Use `openllm repo add` to register your own model repository, as in the sketch below.
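
For example, a minimal sketch registering a fork, where `my-models`, `<your-username>`, and the `main` branch are placeholders for your own setup:

```bash
# Register a fork as an extra model repository (name and URL are placeholders)
openllm repo add my-models https://github.com/<your-username>/openllm-models@main
```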