Model | Release | Checkpoints | Paper/Blog | Params (B) | Context length (tokens) | License | Try it |
--- | --- | --- | --- | --- | --- | --- | --- |
T5 | 2019/10 | T5 & Flan-T5, Flan-T5-xxl (HF) | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 0.06 - 11 | 512 | Apache 2.0 | T5-Large |
UL2 | 2022/10 | UL2 & Flan-UL2, Flan-UL2 (HF) | UL2 20B: An Open Source Unified Language Learner | 20 | 512, 2048 | Apache 2.0 | |
Cohere | 2022/06 | N/A (API access only; weights not released) | Cohere | 54 | 4096 | Proprietary | N/A |
Cerebras-GPT | 2023/03 | Cerebras-GPT | Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models (Paper) | 0.111 - 13 | 2048 | Apache 2.0 | Cerebras-GPT-1.3B |
Open Assistant (Pythia family) | 2023/03 | OA-Pythia-12B-SFT-8, OA-Pythia-12B-SFT-4, OA-Pythia-12B-SFT-1 | OpenAssistant Conversations: Democratizing Large Language Model Alignment | 12 | 2048 | Apache 2.0 | Pythia-2.8B |
Pythia | 2023/04 | pythia 70M - 12B | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 0.07 - 12 | 2048 | Apache 2.0 | |
Dolly | 2023/04 | dolly-v2-3b, dolly-v2-7b, dolly-v2-12b | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM | 3, 7, 12 | 2048 | MIT | |
DLite | 2023/05 | dlite-v2-1_5b | Announcing DLite V2: Lightweight, Open LLMs That Can Run Anywhere | 0.124 - 1.5 | 1024 | Apache 2.0 | DLite-v2-1.5B |
RWKV | 2021/08 | RWKV, ChatRWKV | The RWKV Language Model (and my LM tricks) | 0.1 - 14 | infinity (RNN) | Apache 2.0 | |
GPT-J-6B | 2021/06 | GPT-J-6B, GPT4All-J | GPT-J-6B: 6B JAX-Based Transformer | 6 | 2048 | Apache 2.0 | |
GPT-NeoX-20B | 2022/04 | GPT-NEOX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | 20 | 2048 | Apache 2.0 | |
BLOOM | 2022/11 | BLOOM | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 176 | 2048 | OpenRAIL-M v1 | |
StableLM-Alpha | 2023/04 | StableLM-Alpha | Stability AI Launches the First of its StableLM Suite of Language Models | 3, 7 (15 - 65 planned) | 4096 | CC BY-SA-4.0 | |
FastChat-T5 | 2023/04 | fastchat-t5-3b-v1.0 | We are excited to release FastChat-T5: our compact and commercial-friendly chatbot! | 3 | 512 | Apache 2.0 | |
h2oGPT | 2023/05 | h2oGPT | Building the World’s Best Open-Source Large Language Model: H2O.ai’s Journey | 12 - 20 | 256 - 2048 | Apache 2.0 | |
MPT-7B | 2023/05 | MPT-7B, MPT-7B-Instruct | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs | 7 | 84k (ALiBi; see the extrapolation sketch below the table) | Apache 2.0, CC BY-SA-3.0 | |
PanGu-Σ | 2023/03 | N/A (weights not released) | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing | 1085 | N/A | Proprietary | N/A |
RedPajama-INCITE | 2023/05 | RedPajama-INCITE | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models | 3 - 7 | 2048 | Apache 2.0 | RedPajama-INCITE-Instruct-3B-v1 |
OpenLLaMA | 2023/05 | open_llama_3b, open_llama_7b, open_llama_13b | OpenLLaMA: An Open Reproduction of LLaMA | 3, 7, 13 | 2048 | Apache 2.0 | OpenLLaMA-7B-Preview_200bt |
Falcon | 2023/05 | Falcon-40B, Falcon-7B | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only | 7, 40 | 2048 | Apache 2.0 | |
MPT-30B | 2023/06 | MPT-30B, MPT-30B-instruct | MPT-30B: Raising the bar for open-source foundation models | 30 | 8192 | Apache 2.0, CC BY-SA-3.0 | MPT 30B inference code using CPU |
LLaMA 2 | 2023/07 | LLaMA 2 Weights | Llama 2: Open Foundation and Fine-Tuned Chat Models | 7 - 70 | 4096 | Llama 2 Community License (free if you have under 700M monthly active users; outputs may not be used to train LLMs other than Llama and its derivatives) | HuggingChat |
OpenLM | 2023/09 | OpenLM 1B, OpenLM 7B | Open LM: a minimal but performative language modeling (LM) repository | 1, 7 | 2048 | MIT | |
Mistral 7B | 2023/09 | Mistral-7B-v0.1, Mistral-7B-Instruct-v0.1 | Mistral 7B | 7 | 8192 (4096-token sliding-window attention; see the sketch below the table) | Apache 2.0 | Mistral Transformer |
OpenHermes | 2023/09 | OpenHermes-7B, OpenHermes-13B | Nous Research | 7, 13 | 4096 | MIT | OpenHermes-V2 Finetuned on Mistral 7B |
SOLAR | 2023/12 | Solar-10.7B | SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (Upstage) | 10.7 | 4096 | Apache 2.0 | |
phi-2 | 2023/12 | phi-2 2.7B | Phi-2: The surprising power of small language models (Microsoft) | 2.7 | 2048 | MIT | |
SantaCoder | 2023/01 | santacoder | SantaCoder: don't reach for the stars! | 1.1 | 2048 | OpenRAIL-M v1 | SantaCoder |
StarCoder | 2023/05 | starcoder | StarCoder: A State-of-the-Art LLM for Code, StarCoder: May the source be with you! | 1.1 - 15 | 8192 | OpenRAIL-M v1 | |
StarChat Alpha | 2023/05 | starchat-alpha | Creating a Coding Assistant with StarCoder | 16 | 8192 | OpenRAIL-M v1 | |
Replit Code | 2023/05 | replit-code-v1-3b | Training a SOTA Code LLM in 1 week and Quantifying the Vibes — with Reza Shabani of Replit | 2.7 | no fixed limit (ALiBi; see the extrapolation sketch below the table) | CC BY-SA-4.0 | Replit-Code-v1-3B |
CodeGen2 | 2023/04 | codegen2 1B-16B | CodeGen2: Lessons for Training LLMs on Programming and Natural Languages | 1 - 16 | 2048 | Apache 2.0 | |
CodeT5+ | 2023/05 | CodeT5+ | CodeT5+: Open Code Large Language Models for Code Understanding and Generation | 0.22 - 16 | 512 | BSD-3-Clause | Codet5+-6B |
XGen-7B | 2023/06 | XGen-7B-8K-Base | Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length | 7 | 8192 | Apache 2.0 | |
CodeGen2.5 | 2023/07 | CodeGen2.5-7B-multi | CodeGen2.5: Small, but mighty | 7 | 2048 | Apache 2.0 | |
DeciCoder-1B | 2023/08 | DeciCoder-1B | Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation | 1.1 | 2048 | Apache 2.0 | DeciCoder Demo |
Code Llama | 2023/08 | Inference code for Code Llama models | Code Llama: Open Foundation Models for Code | 7 - 34 | 16384 (trained; stable up to ~100K at inference) | Llama 2 Community License | HuggingChat |
Sparrow | 2022/09 | N/A (weights not released) | Improving Alignment of Dialogue Agents via Targeted Human Judgements (DeepMind) | 70 | N/A | Proprietary | N/A |
Koala | 2023/04 | Koala 13B weight diffs (applied to LLaMA) | Koala: A Dialogue Model for Academic Research (BAIR) | 13 | 2048 | Non-commercial (LLaMA-based, research use only) | N/A |
PaLM 2 | 2023/05 | N/A (API access only) | PaLM 2 Technical Report (Google AI) | N/A (not disclosed; 540 refers to the original PaLM) | N/A | Proprietary | N/A |
Tongyi Qianwen (Qwen) | 2023/08 | Qwen-7B, Qwen-14B | Alibaba Cloud | 7, 14 | 8192 | Tongyi Qianwen License | N/A |
Cohere Command | 2023 | N/A (API access only) | Cohere | 6 - 52 | N/A | Proprietary | N/A |
Vicuna 33B | 2023/06 | vicuna-33b-v1.3 | LMSYS | 33 | 2048 | Non-commercial (LLaMA-based) | N/A |
Guanaco-65B | 2023/05 | guanaco-65b adapters | QLoRA: Efficient Finetuning of Quantized LLMs (University of Washington) | 65 | 2048 | Non-commercial (LLaMA-based) | N/A |
Amazon Q | 2023/11 | N/A (managed service) | AWS | N/A | N/A | Proprietary | N/A |
Falcon-180B | 2023/09 | Falcon-180B | Technology Innovation Institute | 180 | 2048 | Falcon-180B TII License (not Apache 2.0) | N/A |
Yi-34B | 2023/11 | Yi-34B, Yi-34B-200K | 01.AI | 34 | 4096 (extendable to 32K; 200K variant available) | Yi Series Models License | N/A |
Mixtral 8x7B | 2023/12 | Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1 | Mixtral of Experts (Mistral AI) | 46.7 total, ~12.9 active per token (see the MoE routing sketch below the table) | 32K | Apache 2.0 | N/A |
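
Most of the openly licensed checkpoints above are hosted on the Hugging Face Hub and load the same way. A minimal sketch, assuming the `transformers` library is installed and using `EleutherAI/pythia-2.8b` as the example repo id; substitute any hosted checkpoint from the table:

```python
# Load an open checkpoint from the table and generate a few tokens.
# "EleutherAI/pythia-2.8b" is one example Hub repo id; larger models
# may need quantization or a GPU, which this sketch omits.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-2.8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open LLMs are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```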
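Several context-length cells above say "ALiBi" (MPT-7B, Replit Code). ALiBi (Attention with Linear Biases) replaces learned positional embeddings with a penalty on attention logits that grows linearly with query-key distance, which is why these models can be evaluated on sequences longer than their training length. A minimal sketch of the bias construction, illustrative rather than any listed model's actual implementation:

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Additive ALiBi attention bias: each head penalizes attention to
    distant keys linearly, with a head-specific slope. Because the bias
    depends only on relative distance, not absolute position, it extends
    naturally to sequences longer than those seen in training."""
    # Geometric slopes as in the ALiBi paper (assumes num_heads is a power of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = how far key j lies behind query i (0 on the diagonal);
    # future positions are handled by the separate causal mask.
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)
    # Shape (num_heads, seq_len, seq_len); added to the attention logits.
    return -slopes[:, None, None] * distance

bias = alibi_bias(num_heads=8, seq_len=16)
print(bias.shape)  # torch.Size([8, 16, 16])
```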
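The Mistral 7B row refers to sliding-window attention: each layer attends only within a fixed window (4096 tokens for Mistral 7B), and stacking layers lets information propagate further back than any single window, so the usable context exceeds the window size. A toy mask construction, illustrative only and not Mistral's actual code:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal attention mask where each token attends only to itself and
    the previous `window - 1` tokens. With L layers, information can flow
    roughly L * window tokens back, which is how a 4096-token window can
    serve a much longer effective context."""
    pos = torch.arange(seq_len)
    dist = pos[:, None] - pos[None, :]      # query position minus key position
    return (dist >= 0) & (dist < window)    # True = attention allowed

mask = sliding_window_mask(seq_len=10, window=4)
print(mask.int())
```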
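The Mixtral row lists 46.7B total parameters but only ~12.9B active per token because it is a sparse mixture-of-experts (MoE) model: a router scores 8 expert feed-forward networks per token but only the top 2 actually run. A toy top-2 routing sketch; the names, shapes, and linear experts are illustrative assumptions, not Mixtral's implementation:

```python
import torch
import torch.nn.functional as F

def top2_moe(hidden: torch.Tensor, router: torch.nn.Linear, experts: list) -> torch.Tensor:
    """Sparse MoE forward pass: route each token to its 2 highest-scoring
    experts and mix their outputs by the softmax of the two router scores.
    Only the selected experts run, so only a fraction of the total expert
    parameters is active for any given token."""
    logits = router(hidden)                 # (tokens, num_experts) router scores
    weights, idx = logits.topk(2, dim=-1)   # top-2 experts per token
    weights = F.softmax(weights, dim=-1)    # normalize the two gate values
    out = torch.zeros_like(hidden)
    for k in range(2):
        for e, expert in enumerate(experts):
            sel = idx[:, k] == e            # tokens whose k-th choice is expert e
            if sel.any():
                out[sel] += weights[sel, k].unsqueeze(-1) * expert(hidden[sel])
    return out

dim, num_experts = 32, 8
experts = [torch.nn.Linear(dim, dim) for _ in range(num_experts)]
router = torch.nn.Linear(dim, num_experts)
tokens = torch.randn(5, dim)
print(top2_moe(tokens, router, experts).shape)  # torch.Size([5, 32])
```

With 2 of 8 experts active per token (plus the always-active attention and embedding weights), the active parameter count lands well below the total, consistent with the 12.9B-of-46.7B figure in the row above.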