InstructionZoo

A collection of open-source instruction-tuning datasets for training chat-based LLMs (ChatGPT, LLaMA, Alpaca).

This is an ongoing project. We will soon add tags to classify the datasets listed below, and we will keep updating the collection.

Table of Contents

The template

## [owner/project-name](https://github.com/link/to/project)

* Size:
* Language:
* Summary:
* Generation Method:
* Paper:
* HuggingFace: (if applicable)
* Demo: (if applicable)
* License:
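
As a rough aid for contributors, the template above can be filled in programmatically. The following is a minimal sketch, not part of the project itself: the `DatasetEntry` class and `render_entry` function are hypothetical names introduced here for illustration, and the sketch assumes that the two fields marked "(if applicable)" are simply left out when empty.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """Hypothetical container; field names mirror the template bullets."""
    name: str                    # "owner/project-name"
    url: str                     # link to the project
    size: str = ""
    language: str = ""
    summary: str = ""
    generation_method: str = ""
    paper: str = ""
    huggingface: str = ""        # optional ("if applicable")
    demo: str = ""               # optional ("if applicable")
    license: str = ""


def render_entry(entry: DatasetEntry) -> str:
    """Render one entry as markdown in the template's layout."""
    lines = [f"## [{entry.name}]({entry.url})", ""]
    # (label, value, always_shown): optional fields are skipped when empty.
    fields = [
        ("Size", entry.size, True),
        ("Language", entry.language, True),
        ("Summary", entry.summary, True),
        ("Generation Method", entry.generation_method, True),
        ("Paper", entry.paper, True),
        ("HuggingFace", entry.huggingface, False),
        ("Demo", entry.demo, False),
        ("License", entry.license, True),
    ]
    for label, value, always_shown in fields:
        if always_shown or value:
            # rstrip() avoids a trailing space when the value is empty.
            lines.append(f"* {label}: {value}".rstrip())
    return "\n".join(lines)
```

Calling `render_entry(DatasetEntry(name="owner/project-name", url="https://github.com/link/to/project"))` yields the blank template without the two optional bullets; fields that are still unknown stay empty so they are easy to spot in review.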

The English Instruction Datasets

tatsu-lab/Alpaca

gururise/Cleaned Alpaca

PhoebusSi/Alpaca-CoT

QingyiSi/Alpaca-CoT

orhonovich/unnatural-instructions

bigscience/PromptSource

bigscience/P3

allenai/natural-instructions

allenai/super-natural-instructions

google-research/FLAN 2021

google-research/FLAN 2022 Collection

tasksource-instruct

LianjiaTech/BELLE 1.5M

LianjiaTech/BELLE 10M

XueFuzhao/InstructionWild

ExMix

UnifiedSKG

MetaICL

openai/InstructGPT

facebookresearch/metaseq/OPT-IML

THUDM/GLM-130B

laion/OIG

baize/baize-chatbot

lightaime/camel

thunlp/UltraChat

databrickslabs/dolly

Instruction-Tuning-with-GPT-4/GPT-4-LLM

ShareGPT

stanfordnlp/SHP

Anthropic/hh-rlhf

HuggingFaceH4/stack-exchange-preferences

Hello-SimpleAI/HC3

f/awesome-chatgpt-prompts

The Chinese Instruction Datasets

FlagOpen/FlagInstruct

CLUEbenchmark/pCLUE

ydli-ai/CSL

YeungNLP/Firefly

TsinghuaAI/CUGE

ydli-ai/Chinese-ChatLLaMA

ZeroPrompt

PlexPt/awesome-chatgpt-prompts-zh

Chinese Alpaca

carbonz0/alpaca-chinese-dataset

hikariming/alpaca_chinese_dataset

ymcui/Chinese-LLaMA-Alpaca

LC1332/Chinese-alpaca-lora

A-baoYang/alpaca-7b-chinese

ntunlplab/traditional-chinese-alpaca

The Multilingual Instruction Datasets

bigscience/xP3

JosephusCheung/GuanacoDataset

JosephusCheung/GuanacoDataset QA

The Code Instruction Datasets

sahil280114/codealpaca