## Open LLMs

These LLMs (Large Language Models) are all licensed for commercial use (e.g., Apache 2.0, MIT, OpenRAIL-M). Contributions welcome!

| Language Model | Release Date | Checkpoints | Paper/Blog | Params (B) | Context Length | Licence | Try it |
| --- | --- | --- | --- | --- | --- | --- | --- |
| T5 | 2019/10 | T5 & Flan-T5, Flan-T5-xxl (HF) | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 0.06 - 11 | 512 | Apache 2.0 | T5-Large |
| RWKV 4 | 2021/08 | RWKV, ChatRWKV | The RWKV Language Model (and my LM tricks) | 0.1 - 14 | unlimited (RNN) | Apache 2.0 | |
| GPT-NeoX-20B | 2022/04 | GPT-NEOX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | 20 | 2048 | Apache 2.0 | |
| YaLM-100B | 2022/06 | yalm-100b | Yandex publishes YaLM 100B, the largest GPT-like neural network in open source | 100 | 1024 | Apache 2.0 | |
| UL2 | 2022/10 | UL2 & Flan-UL2, Flan-UL2 (HF) | UL2 20B: An Open Source Unified Language Learner | 20 | 512, 2048 | Apache 2.0 | |
| Bloom | 2022/11 | Bloom | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 176 | 2048 | OpenRAIL-M v1 | |
| ChatGLM | 2023/03 | chatglm-6b | ChatGLM, Github | 6 | 2048 | Custom: free with some usage restrictions (may require registration) | |
| Cerebras-GPT | 2023/03 | Cerebras-GPT | Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models (Paper) | 0.111 - 13 | 2048 | Apache 2.0 | Cerebras-GPT-1.3B |
| Open Assistant (Pythia family) | 2023/03 | OA-Pythia-12B-SFT-8, OA-Pythia-12B-SFT-4, OA-Pythia-12B-SFT-1 | Democratizing Large Language Model Alignment | 12 | 2048 | Apache 2.0 | Pythia-2.8B |
| Pythia | 2023/04 | pythia 70M - 12B | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 0.07 - 12 | 2048 | Apache 2.0 | |
| Dolly | 2023/04 | dolly-v2-12b | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM | 3, 7, 12 | 2048 | MIT | |
| StableLM-Alpha | 2023/04 | StableLM-Alpha | Stability AI Launches the First of its StableLM Suite of Language Models | 3 - 65 | 4096 | CC BY-SA-4.0 | |
| FastChat-T5 | 2023/04 | fastchat-t5-3b-v1.0 | We are excited to release FastChat-T5: our compact and commercial-friendly chatbot! | 3 | 512 | Apache 2.0 | |
| DLite | 2023/05 | dlite-v2-1_5b | Announcing DLite V2: Lightweight, Open LLMs That Can Run Anywhere | 0.124 - 1.5 | 1024 | Apache 2.0 | DLite-v2-1.5B |
| h2oGPT | 2023/05 | h2oGPT | Building the World's Best Open-Source Large Language Model: H2O.ai's Journey | 12 - 20 | 256 - 2048 | Apache 2.0 | |
| MPT-7B | 2023/05 | MPT-7B, MPT-7B-Instruct | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs | 7 | 84k (ALiBi) | Apache 2.0, CC BY-SA-3.0 | |
| RedPajama-INCITE | 2023/05 | RedPajama-INCITE | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models | 3 - 7 | 2048 | Apache 2.0 | RedPajama-INCITE-Instruct-3B-v1 |
| OpenLLaMA | 2023/05 | open_llama_3b, open_llama_7b, open_llama_13b | OpenLLaMA: An Open Reproduction of LLaMA | 3, 7 | 2048 | Apache 2.0 | OpenLLaMA-7B-Preview_200bt |
| Falcon | 2023/05 | Falcon-180B, Falcon-40B, Falcon-7B | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only | 180, 40, 7 | 2048 | Apache 2.0 | |
| GPT-J-6B | 2023/06 | GPT-J-6B, GPT4All-J | GPT-J-6B: 6B JAX-Based Transformer | 6 | 2048 | Apache 2.0 | |
| MPT-30B | 2023/06 | MPT-30B, MPT-30B-instruct | MPT-30B: Raising the bar for open-source foundation models | 30 | 8192 | Apache 2.0, CC BY-SA-3.0 | MPT 30B inference code using CPU |
| LLaMA 2 | 2023/07 | LLaMA 2 Weights | Llama 2: Open Foundation and Fine-Tuned Chat Models | 7 - 70 | 4096 | Custom: free if you have under 700M users; LLaMA outputs cannot be used to train other LLMs besides LLaMA and its derivatives | HuggingChat |
| ChatGLM2 | 2023/06 | chatglm2-6b | ChatGLM2-6B, Github | 6 | 32k | Custom: free with some usage restrictions (may require registration) | |
| XGen-7B | 2023/06 | xgen-7b-4k-base, xgen-7b-8k-base | Long Sequence Modeling with XGen | 7 | 4096, 8192 | Apache 2.0 | |
| Jais-13b | 2023/08 | jais-13b, jais-13b-chat | Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models | 13 | 2048 | Apache 2.0 | |
| OpenHermes | 2023/09 | OpenHermes-7B, OpenHermes-13B | Nous Research | 7, 13 | 4096 | MIT | OpenHermes-V2 Finetuned on Mistral 7B |
| OpenLM | 2023/09 | OpenLM 1B, OpenLM 7B | Open LM: a minimal but performative language modeling (LM) repository | 1, 7 | 2048 | MIT | |
| Mistral 7B | 2023/09 | Mistral-7B-v0.1, Mistral-7B-Instruct-v0.1 | Mistral 7B | 7 | 4096 - 16k with sliding window | Apache 2.0 | Mistral Transformer |
| ChatGLM3 | 2023/10 | chatglm3-6b, chatglm3-6b-base, chatglm3-6b-32k, chatglm3-6b-128k | ChatGLM3 | 6 | 8192, 32k, 128k | Custom: free with some usage restrictions (may require registration) | |
| Skywork | 2023/10 | Skywork-13B-Base, Skywork-13B-Math | Skywork | 13 | 4096 | Custom: free with usage restrictions; models trained on Skywork outputs become Skywork derivatives, subject to this license | |
| Jais-30b | 2023/11 | jais-30b-v1, jais-30b-chat-v1 | Jais-30B: Expanding the Horizon in Open-Source Arabic NLP | 30 | 2048 | Apache 2.0 | |
| Zephyr | 2023/11 | Zephyr 7B | Website | 7 | 8192 | Apache 2.0 | |
| DeepSeek | 2023/11 | deepseek-llm-7b-base, deepseek-llm-7b-chat, deepseek-llm-67b-base, deepseek-llm-67b-chat | Introducing DeepSeek LLM | 7, 67 | 4096 | Custom: free with usage restrictions; models trained on DeepSeek outputs become DeepSeek derivatives, subject to this license | |
| Mistral 7B v0.2 | 2023/12 | Mistral-7B-v0.2, Mistral-7B-Instruct-v0.2 | La Plateforme | 7 | 32k | Apache 2.0 | |
| Mixtral 8x7B v0.1 | 2023/12 | Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1 | Mixtral of experts | 46.7 | 32k | Apache 2.0 | |
| LLM360 Amber | 2023/12 | Amber, AmberChat, AmberSafe | Introducing LLM360: Fully Transparent Open-Source LLMs | 6.7 | 2048 | Apache 2.0 | |
| SOLAR | 2023/12 | Solar-10.7B | Upstage | 10.7 | 4096 | Apache 2.0 | |
| phi-2 | 2023/12 | phi-2 2.7B | Microsoft | 2.7 | 2048 | MIT | |
| FLOR | 2023/12 | FLOR-760M, FLOR-1.3B, FLOR-1.3B-Instructed, FLOR-6.3B, FLOR-6.3B-Instructed | FLOR-6.3B: a chinchilla-compliant model for Catalan, Spanish and English | 0.76, 1.3, 6.3 | 2048 | Apache 2.0 with usage restrictions inherited from BLOOM | |
| RWKV 5 v2 | 2024/01 | rwkv-5-world-0.4b-2, rwkv-5-world-1.5b-2, rwkv-5-world-3b-2, rwkv-5-world-3b-2 (16k), rwkv-5-world-7b-2 | RWKV 5 | 0.4, 1.5, 3, 7 | unlimited (RNN), trained on 4096 (and 16k for 3b) | Apache 2.0 | |
| OLMo | 2024/02 | OLMo 1B, OLMo 7B, OLMo 7B Twin 2T | AI2 | 1, 7 | 2048 | Apache 2.0 | |
| Qwen1.5 | 2024/02 | Qwen1.5-7B, Qwen1.5-7B-Chat, Qwen1.5-14B, Qwen1.5-14B-Chat, Qwen1.5-72B, Qwen1.5-72B-Chat | Introducing Qwen1.5 | 7, 14, 72 | 32k | Custom: free if you have under 100M users; Qwen outputs cannot be used to train other LLMs besides Qwen and its derivatives | |
| LWM | 2024/02 | LWM-Text-Chat-128K, LWM-Text-Chat-256K, LWM-Text-Chat-512K, LWM-Text-Chat-1M, LWM-Text-128K, LWM-Text-256K, LWM-Text-512K, LWM-Text-1M | Large World Model (LWM) | 7 | 128k, 256k, 512k, 1M | LLaMA 2 license | |
| Jais-30b v3 | 2024/03 | jais-30b-v3, jais-30b-chat-v3 | Jais 30b v3 | 30 | 8192 | Apache 2.0 | |
| Gemma | 2024/02 | Gemma 7B, Gemma 7B it, Gemma 2B, Gemma 2B it | Technical report | 2 - 7 | 8192 | Gemma Terms of Use: free with usage restrictions; models trained on Gemma outputs become Gemma derivatives, subject to this license | |
| Grok-1 | 2024/03 | Grok-1 | Open Release of Grok-1 | 314 | 8192 | Apache 2.0 | |
| Qwen1.5 MoE | 2024/03 | Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat | Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters | 14.3 | 8192 | Custom: free if you have under 100M users; Qwen outputs cannot be used to train other LLMs besides Qwen and its derivatives | |
| Jamba 0.1 | 2024/03 | Jamba-v0.1 | Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model | 52 | 256k | Apache 2.0 | |
| Qwen1.5 32B | 2024/04 | Qwen1.5-32B, Qwen1.5-32B-Chat | Qwen1.5-32B: Fitting the Capstone of the Qwen1.5 Language Model Series | 32 | 32k | Custom: free if you have under 100M users; Qwen outputs cannot be used to train other LLMs besides Qwen and its derivatives | |
| Mamba-7B | 2024/04 | mamba-7b-rw | Toyota Research Institute | 7 | unlimited (RNN), trained on 2048 | Apache 2.0 | |
| Mixtral 8x22B v0.1 | 2024/04 | Mixtral-8x22B-v0.1, Mixtral-8x22B-Instruct-v0.1 | Cheaper, Better, Faster, Stronger | 141 | 64k | Apache 2.0 | |
| Llama 3 | 2024/04 | Llama-3-8B, Llama-3-8B-Instruct, Llama-3-70B, Llama-3-70B-Instruct, Llama-Guard-2-8B | Introducing Meta Llama 3, Meta Llama 3 | 8, 70 | 8192 | Meta Llama 3 Community License Agreement: free if you have under 700M users; Llama 3 outputs cannot be used to train other LLMs besides Llama 3 and its derivatives | |
| Phi-3 Mini | 2024/04 | Phi-3-mini-4k-instruct, Phi-3-mini-128k-instruct | Introducing Phi-3, Technical Report | 3.8 | 4096, 128k | MIT | |
| OpenELM | 2024/04 | OpenELM-270M, OpenELM-270M-Instruct, OpenELM-450M, OpenELM-450M-Instruct, OpenELM-1_1B, OpenELM-1_1B-Instruct, OpenELM-3B, OpenELM-3B-Instruct | OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | 0.27, 0.45, 1.1, 3 | 2048 | Custom open license; no usage or training restrictions | |
| Snowflake Arctic | 2024/04 | snowflake-arctic-base, snowflake-arctic-instruct | Snowflake Arctic: The Best LLM for Enterprise AI — Efficiently Intelligent, Truly Open | 480 | 4096 | Apache 2.0 | |
| Qwen1.5 110B | 2024/04 | Qwen1.5-110B, Qwen1.5-110B-Chat | Qwen1.5-110B: The First 100B+ Model of the Qwen1.5 Series | 110 | 32k | Custom: free if you have under 100M users; Qwen outputs cannot be used to train other LLMs besides Qwen and its derivatives | |
| RWKV 6 v2.1 | 2024/05 | rwkv-6-world-1.6b-2.1, rwkv-6-world-3b-2.1, rwkv-6-world-7b-2.1 | RWKV 6 | 1.6, 3, 7 | unlimited (RNN), trained on 4096 | Apache 2.0 | |
| DeepSeek-V2 | 2024/05 | DeepSeek-V2, DeepSeek-V2-Chat | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | 236 | 128k | Custom: free with usage restrictions; models trained on DeepSeek outputs become DeepSeek derivatives, subject to this license | |
| Fugaku-LLM | 2024/05 | Fugaku-LLM-13B, Fugaku-LLM-13B-instruct | Release of "Fugaku-LLM" – a large language model trained on the supercomputer "Fugaku" | 13 | 2048 | Custom: free with usage restrictions | |
| Falcon 2 | 2024/05 | falcon2-11B | Meet Falcon 2: TII Releases New AI Model Series, Outperforming Meta’s New Llama 3 | 11 | 8192 | Custom: Apache 2.0 with a mild acceptable use policy | |
| Yi-1.5 | 2024/05 | Yi-1.5-6B, Yi-1.5-6B-Chat, Yi-1.5-9B, Yi-1.5-9B-Chat, Yi-1.5-34B, Yi-1.5-34B-Chat | Yi-1.5 | 6, 9, 34 | 4096 | Apache 2.0 | |
| DeepSeek-V2-Lite | 2024/05 | DeepSeek-V2-Lite, DeepSeek-V2-Lite-Chat | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | 16 | 32k | Custom: free with usage restrictions; models trained on DeepSeek outputs become DeepSeek derivatives, subject to this license | |
| Phi-3 small/medium | 2024/05 | Phi-3-small-8k-instruct, Phi-3-small-128k-instruct, Phi-3-medium-4k-instruct, Phi-3-medium-128k-instruct | New models added to the Phi-3 family, available on Microsoft Azure, Technical Report | 7, 14 | 4096, 128k | MIT | |
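A quick way to read the Params (B) column is as a memory budget: holding the weights takes roughly the parameter count times the bytes per parameter, before KV cache and activation overhead. The sketch below is not from any of the model releases above; it is just the standard rule of thumb, with the dtype sizes as stated assumptions.

```python
# Rough inference-memory estimate from the "Params (B)" column above.
# Rule of thumb: weights alone need params * bytes-per-parameter;
# KV cache and activations add further overhead (ignored here).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(params_billion: float, dtype: str = "fp16") -> float:
    """Approximate GiB needed just to hold the weights."""
    bytes_total = params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return bytes_total / 2**30

for name, b in [("Mistral 7B", 7), ("MPT-30B", 30), ("Llama-3-70B", 70)]:
    print(f"{name}: ~{weight_memory_gb(b):.0f} GiB in fp16, "
          f"~{weight_memory_gb(b, 'int4'):.0f} GiB in int4")
```

So a 7B model needs on the order of 13 GiB in fp16, which is why quantized variants are popular for consumer GPUs; actual usage is higher once the KV cache grows with context length.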

## Open LLMs for code

| Language Model | Release Date | Checkpoints | Paper/Blog | Params (B) | Context Length | Licence | Try it |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SantaCoder | 2023/01 | santacoder | SantaCoder: don't reach for the stars! | 1.1 | 2048 | OpenRAIL-M v1 | SantaCoder |
| CodeGen2 | 2023/04 | codegen2 1B-16B | CodeGen2: Lessons for Training LLMs on Programming and Natural Languages | 1 - 16 | 2048 | Apache 2.0 | |
| StarCoder | 2023/05 | starcoder | StarCoder: A State-of-the-Art LLM for Code, StarCoder: May the source be with you! | 1.1 - 15 | 8192 | OpenRAIL-M v1 | |
| StarChat Alpha | 2023/05 | starchat-alpha | Creating a Coding Assistant with StarCoder | 16 | 8192 | OpenRAIL-M v1 | |
| Replit Code | 2023/05 | replit-code-v1-3b | Training a SOTA Code LLM in 1 week and Quantifying the Vibes — with Reza Shabani of Replit | 2.7 | unlimited? (ALiBi) | CC BY-SA-4.0 | Replit-Code-v1-3B |
| CodeT5+ | 2023/05 | CodeT5+ | CodeT5+: Open Code Large Language Models for Code Understanding and Generation | 0.22 - 16 | 512 | BSD-3-Clause | Codet5+-6B |
| XGen-7B | 2023/06 | XGen-7B-8K-Base | Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length | 7 | 8192 | Apache 2.0 | |
| CodeGen2.5 | 2023/07 | CodeGen2.5-7B-multi | CodeGen2.5: Small, but mighty | 7 | 2048 | Apache 2.0 | |
| DeciCoder-1B | 2023/08 | DeciCoder-1B | Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation | 1.1 | 2048 | Apache 2.0 | DeciCoder Demo |
| Code Llama | 2023/08 | Inference Code for CodeLlama models | Code Llama: Open Foundation Models for Code | 7 - 34 | 4096 | Custom: free if you have under 700M users; LLaMA outputs cannot be used to train other LLMs besides LLaMA and its derivatives | HuggingChat |
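Several of the code models above (the StarCoder family in particular) were trained with a fill-in-the-middle (FIM) objective, so they can complete code between a prefix and a suffix rather than only left-to-right. A minimal sketch of the prompt layout, assuming the `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` special tokens used by StarCoder; other checkpoints may spell these markers differently, so check each model's card:

```python
# Sketch of a fill-in-the-middle (FIM) prompt for StarCoder-style models.
# The model is asked to generate the code that belongs between the prefix
# and the suffix; its completion after <fim_middle> is the infilled span.

def fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange code before/after the cursor so the model fills the gap."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = fim_prompt("def add(a, b):\n    return ", "\n")
print(prompt)
```

Feeding such a prompt to a FIM-trained checkpoint is how editor "complete at cursor" features are typically built on these models.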

## Open LLM datasets for pre-training

| Name | Release Date | Paper/Blog | Dataset | Tokens (T) | License |
| --- | --- | --- | --- | --- | --- |
| RedPajama | 2023/04 | RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens | RedPajama-Data | 1.2 | Apache 2.0 |
| starcoderdata | 2023/05 | StarCoder: A State-of-the-Art LLM for Code | starcoderdata | 0.25 | Apache 2.0 |
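The Tokens (T) column feeds directly into the standard back-of-envelope estimate of training cost: total compute is roughly 6 × N × D FLOPs, where N is the parameter count and D the number of training tokens. The model size below is an illustrative assumption, not something the table states; only the token counts come from the datasets above.

```python
# Back-of-envelope training compute using the common ~6 * N * D FLOPs
# approximation (N = parameters, D = training tokens).

def train_flops(params_billion: float, tokens_trillion: float) -> float:
    """Approximate total training FLOPs via the 6*N*D rule of thumb."""
    return 6 * (params_billion * 1e9) * (tokens_trillion * 1e12)

# e.g. a hypothetical 7B model over RedPajama's 1.2T tokens:
flops = train_flops(7, 1.2)
print(f"~{flops:.2e} FLOPs")  # on the order of 5e22
```

Dividing that figure by a cluster's sustained FLOP/s gives a rough training-time estimate; real runs add overhead for data loading, checkpointing, and imperfect utilization.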

## Open LLM datasets for instruction-tuning

| Name | Release Date | Paper/Blog | Dataset | Samples (K) | License |
| --- | --- | --- | --- | --- | --- |
| OIG (Open Instruction Generalist) | 2023/03 | THE OIG DATASET | OIG | 44,000 | Apache 2.0 |
| databricks-dolly-15k | 2023/04 | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM | databricks-dolly-15k | 15 | CC BY-SA-3.0 |
| MPT-7B-Instruct | 2023/05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs | dolly_hhrlhf | 59 | CC BY-SA-3.0 |

## Open LLM datasets for alignment-tuning

| Name | Release Date | Paper/Blog | Dataset | Samples (K) | License |
| --- | --- | --- | --- | --- | --- |
| OpenAssistant Conversations Dataset | 2023/04 | OpenAssistant Conversations - Democratizing Large Language Model Alignment | oasst1 | 161 | Apache 2.0 |

## Evals on open LLMs


## What do the licences mean?

Disclaimer: The information provided in this repo does not, and is not intended to, constitute legal advice. Maintainers of this repo are not responsible for the actions of third parties who use the models. Please consult an attorney before using models for commercial purposes.


## Improvements