🏰 LLM Zoo

As new animal species are being discovered in the world of natural language processing (NLP) 🌍 every day, it becomes necessary to establish a zoo 🦁 to accommodate them.

This project collects information on various open- and closed-source LLMs released after ChatGPT.

📰 News

📖 Open-Sourced LLMs

| Release Time | Model | Version | Size | Backbone | Langs | Domain | Training Data | GitHub | HF | Paper | Demo | Official Blog |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2023.02.27 | LLaMA | llama-7b/13b/33b/65b | 7B/13B/33B/65B | - | en | General | <details><summary><b>detail</b></summary>1T tokens (English CommonCrawl, C4, GitHub, Wikipedia, Gutenberg and Books3, ArXiv, Stack Exchange)</details> | [link] | [link] | [link] | - | [link] |
| 2023.03.13 | Alpaca | alpaca-7b/13b | 7B/13B | LLaMA | en | General | <details><summary><b>detail</b></summary>52k instruction-following data generated by InstructGPT [link]</details> | [link] | [link] | - | [link] | [link] |
| 2023.03.13 | Vicuna | vicuna-7b/13b-delta-v1.1 | 7B/13B | LLaMA | en | General | <details><summary><b>detail</b></summary>70K samples from ShareGPT</details> | [link] | [link] | - | [link] | [link] |
| 2023.03.14 | ChatGLM | chatglm-6b | 6B | GLM | zh, en | General | <details><summary><b>detail</b></summary>supervised fine-tuning, feedback bootstrap, and reinforcement learning with human feedback</details> | [link] | [link] | - | - | [link] |
| 2023.03.14 | ChatGLM | chatglm-130b | 130B | GLM | zh, en | General | <details><summary><b>detail</b></summary>supervised fine-tuning, feedback bootstrap, and reinforcement learning with human feedback</details> | [link] | - | [link] | [link] | [link] |
| 2023.03.16 | Guanaco | - | 7B | LLaMA | ja, zh, en, de | General | <details><summary><b>detail</b></summary>multilingual datasets [link]</details> | [link] | [link] | - | - | - |
| 2023.03.24 | Dolly | dolly-v1-6b | 6B | GPT-J-6B | en | General | <details><summary><b>detail</b></summary>52k Stanford Alpaca instruction-following data [link]</details> | - | [link] | - | - | [link] |
| 2023.03.24 | ChatDoctor | - | 7B | LLaMA | en | Medicine | <details><summary><b>detail</b></summary>52K Stanford Alpaca [link], 100K HealthCareMagic [link], 10K icliniq [link], 5K GenMedGPT-5k [link]</details> | [link] | - | [link] | [link] | - |
| 2023.03.25 | LuoTuo | Chinese-alpaca-lora | 7B | LLaMA | zh, en | General | <details><summary><b>detail</b></summary>translated 52k Stanford Alpaca instruction-following data [link], guanaco [link]</details> | [link] | [link] | - | - | - |
| 2023.03.26 | BELLE | BELLE-7B-0.2M/0.6M/1M/2M | 7B | BLOOMZ-7B1-mt | zh, en | General | <details><summary><b>detail</b></summary>0.2M/0.6M/1M/2M Chinese data [link], 52k Stanford Alpaca instruction-following data [link]</details> | [link] | [link] | [link] | - | - |
| 2023.03.28 | Linly (伶荔) | Linly-Chinese-LLaMA 7b/13b/33b | 7B/13B/33B | LLaMA | zh | General | <details><summary><b>detail</b></summary>Chinese-English parallel corpora [link], Chinese Wikipedia, community interaction, news data [link], scientific literature [link]</details> | [link] | [link] | - | - | - |
| 2023.03.28 | Linly (伶荔) | Linly-ChatFlow 7b/13b | 7B/13B | LLaMA | zh | General | <details><summary><b>detail</b></summary>BELLE [link], pCLUE [link], CSL [link], GuanacoDataset [link], Chain-of-Thought [link], news_commentary [link], firefly [link]</details> | [link] | [link] | - | - | [link] |
| 2023.04.01 | BAIZE | baize-7B/13B/30B | 7B/13B/30B | LLaMA | en | General | <details><summary><b>detail</b></summary>52K Stanford Alpaca [link], 54K Quora [link], 57K Stack Overflow [link]</details> | [link] | [link] | [link] | [link] | - |
| 2023.04.03 | Koala | - | 13B | LLaMA | en | General | <details><summary><b>detail</b></summary>ShareGPT, HC3 [link], OIG [link], Stanford Alpaca [link], Anthropic HH [link], OpenAI WebGPT [link], OpenAI Summarization [link]</details> | - | [link] | - | [link] | [link] |
| 2023.04.03 | BAIZE | baize-healthcare-7b | 7B | LLaMA | en | Medicine | <details><summary><b>detail</b></summary>54K Quora [link], 47K medical dialogs [link]</details> | [link] | [link] | - | - | - |
| 2023.04.06 | Firefly (流萤) | firefly-1b4/2b6 | 1.4B/2.6B | BLOOM-ZH | zh | General | <details><summary><b>detail</b></summary>Chinese question-answering pairs [link], [link]</details> | [link] | [link] | - | - | - |
| 2023.04.08 | Phoenix | Phoenix-chat-7b | 7B | BLOOMZ | multi | General | <details><summary><b>detail</b></summary>conversation data [link]</details> | [link] | [link] | - | - | - |
| 2023.04.09 | Phoenix | Phoenix-inst-chat-7b | 7B | BLOOMZ | multi | General | <details><summary><b>detail</b></summary>conversation data [link], instruction data</details> | [link] | [link] | - | - | - |
| 2023.04.10 | Chimera | chimera-chat-7b/13b | 7B/13B | LLaMA | latin | General | <details><summary><b>detail</b></summary>conversation data [link]</details> | [link] | [link] | - | - | - |
| 2023.04.11 | Chimera | chimera-inst-chat-7b/13b | 7B/13B | LLaMA | latin | General | <details><summary><b>detail</b></summary>conversation data [link], instruction data</details> | [link] | [link] | - | - | - |
| 2023.04.12 | Dolly | dolly-v2-12b | 12B | pythia-12b | en | General | <details><summary><b>detail</b></summary>15k human-generated prompt/response pairs [link]</details> | [link] | [link] | - | - | [link] |
| 2023.04.14 | MedAlpaca | medalpaca 7b/13b | 7B/13B | LLaMA | en | Medicine | <details><summary><b>detail</b></summary>question-answering pairs from flash cards, wikidoc, stackexchange and ChatDoctor</details> | [link] | [link] | [link] | - | - |
| 2023.04.19 | BELLE | BELLE-LLaMA-7B/13B-2M | 7B/13B | LLaMA | zh, en | General | <details><summary><b>detail</b></summary>2M Chinese data [link], 52k Stanford Alpaca instruction-following data [link]</details> | [link] | [link] | [link] | - | - |
| 2023.04.21 | MOSS | moss-moon-003-base | 16B | CodeGen | zh, en | General | <details><summary><b>detail</b></summary>100B Chinese tokens and 20B English tokens</details> | [link] | [link] | - | [link] | [link] |
| 2023.04.21 | MOSS | moss-moon-003-sft | 16B | moss-moon-003-base | zh, en | General | <details><summary><b>detail</b></summary>1.1M multi-turn conversational data (generated from ChatGPT) [link]</details> | [link] | [link] | - | [link] | [link] |
| 2023.04.21 | MOSS | moss-moon-003-sft-plugin | 16B | moss-moon-003-base | zh, en | General | <details><summary><b>detail</b></summary>1.1M multi-turn conversational data [link], 300K plugin-augmented data (generated by InstructGPT) [link]</details> | [link] | [link] | - | [link] | [link] |
| 2023.04.22 | HuggingChat | oasst-sft-6-llama-30b | 30B | LLaMA | multi | General | <details><summary><b>detail</b></summary>human-generated, human-annotated assistant-style conversation corpus consisting of 161k messages in 35 languages [link]</details> | [link] | [link] | - | [link] | - |
| 2023.06.19 | KnowLM | zhixi-13b | 13B | LLaMA | zh, en | General | <details><summary><b>detail</b></summary>human-generated, machine-generated and Knowledge Graph-generated data in Chinese and English [link]</details> | [link] | [link] | - | - | - |
| 2023.06.21 | BayLing (百聆) | BayLing-7b/13b | 7B/13B | LLaMA | zh, en | General | <details><summary><b>detail</b></summary>160K human-generated, machine-generated multi-turn interactive translation corpus, Alpaca instructions and ShareGPT conversations [link]</details> | [link] | [link] | [link] | [link] | [link] |
| 2023.07.18 | LLaMA 2 | llama-2-7b/13b/70b-(chat) | 7B/13B/70B | - | en | General | <details><summary><b>detail</b></summary>2T tokens (mostly English, a new mix of data from publicly available sources)</details> | [link] | [link] | [link] | - | [link] |

📕 Closed-Sourced LLMs

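| Release Time | Model | Version | Size | Langs | Domain | Demo | Official Blog | Paper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2022.11.30 | ChatGPT | gpt-3.5-turbo | - | multi | general | [link] | [link] | - |
| 2023.03.14 | Claude | Claude Instant<br>Claude-v1 | - | multi | general | [link] | [link] | - |
| 2023.03.14 | GPT | gpt-4 | - | multi | general | [link] | [link] | [link] |
| 2023.03.16 | Ernie Bot (文心一言) | - | - | zh, en | general | [link] | [link] | - |
| 2023.03.21 | Bard | - | - | multi | general | [link] | [link] | - |
| 2023.03.30 | BloombergGPT | - | 50B | en | finance | - | [link] | [link] |
| 2023.04.11 | Tongyi Qianwen (通义千问) | - | - | multi | general | [link] | [link] | - |
| 2023.07.07 | OmModel (欧姆大模型) | - | - | multi | general | [link] | [link] | - |
| 2023.07.11 | Claude 2 | Claude-v2 | - | multi | general | - | [link] | [link] |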

🏗 TODO List

📝 Citation

If you find this repository useful, please consider citing it.

@software{li2023llmzoo,
  title = {LLM Zoo},
  author = {Li, Xingxuan and Zhang, Wenxuan and Bing, Lidong},
  url = {https://github.com/DAMO-NLP-SG/LLM-Zoo},
  year = {2023}
}