ThoughtSource⚡

A framework for the science of machine thinking

Datasets · Tutorial notebook · Installation guide · Dataset Annotator

ThoughtSource is a central, open resource and community centered on data and tools for chain-of-thought reasoning in large language models (Wei 2022). Our long-term goal is to enable trustworthy and robust reasoning in advanced AI systems for driving scientific research and medical practice.

<p align="center"> <img alt="ThoughtSource overview 3" src="./resources/images/thoughtsource-overview-3.svg"> </p>

📄 Pre-print: Ott et al. "ThoughtSource: A central hub for large language model reasoning data", arXiv, 2023

📄 Pre-print: Hebenstreit et al. "An automatically discovered chain-of-thought prompt generalizes to novel models and datasets", arXiv, 2023

Workflow

<p align="center"> <img alt="ThoughtSource overview 1" src="./resources/images/thoughtsource-overview-1.svg"> <img alt="ThoughtSource overview 2" src="./resources/images/thoughtsource-overview-2.svg"> </p>

Available datasets

Our dataloaders allow you to access the following datasets in a standardized chain-of-thought format. The dataloaders create objects in the Hugging Face 🤗 Datasets format. We (sometimes extensively) post-processed the source datasets in different ways to create more coherent reasoning chains.
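To illustrate, each entry in the standardized format is essentially a flat record containing the question, the reasoning chain, and the answer. The sketch below is a hypothetical instance: the field names and contents are illustrative assumptions, not the authoritative schema.

```python
# Hypothetical example record in the standardized chain-of-thought format.
# Field names and values are illustrative assumptions, not the actual schema.
example = {
    "id": "worldtree-train-0001",
    "question": "Which property of a mineral can be determined just by looking at it?",
    "choices": ["luster", "mass", "weight", "hardness"],
    "cot": [
        "Luster describes how a mineral reflects light.",
        "It can be observed visually, without any measurement.",
    ],
    "answer": ["luster"],
}

# A reasoning chain is stored as a list of steps; joining them
# yields the full chain as a single string.
full_chain = " ".join(example["cot"])
print(full_chain)
```

Because the dataloaders emit Hugging Face 🤗 Datasets objects, such records can be filtered, mapped, and split with the standard Datasets API.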


<p align="center"> Datasets can be <a href="http://thought.samwald.info/"><b>browsed online through the Dataset Viewer 🔎</b></a> </p>

General question answering

Scientific / medical question answering

Math word problems

Collections of datasets

For quick and economical formative evaluation of CoT reasoning, we combined random examples from the datasets above into collections:

```python
from cot import Collection

collection = Collection.load_thoughtsource_33()
```
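Conceptually, such a collection is built by drawing a fixed number of random examples from each source dataset and pooling them. The sketch below illustrates the idea in plain Python; it is not the library's actual implementation, and the dataset names and sizes are made up for illustration.

```python
import random

# Toy stand-ins for three source datasets (the real collections draw from
# the datasets listed above; these contents are illustrative only).
datasets = {
    "worldtree": [f"worldtree-{i}" for i in range(100)],
    "gsm8k": [f"gsm8k-{i}" for i in range(100)],
    "med_qa": [f"med_qa-{i}" for i in range(100)],
}

def build_collection(datasets, n_per_dataset, seed=42):
    """Sample n_per_dataset random examples from each dataset and pool them."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return {name: rng.sample(examples, n_per_dataset)
            for name, examples in datasets.items()}

collection = build_collection(datasets, n_per_dataset=10)
print(sum(len(v) for v in collection.values()))  # 30 examples in total
```

Sampling a small, fixed number of examples per dataset keeps evaluation cheap while still covering every dataset in the collection.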

We are working on collecting and generating additional datasets, and on further improving the quality of existing datasets (see dataset issues). We welcome suggestions for the inclusion of other datasets.

We welcome dataset contributions! 👉 Have a look at our contribution guide!

Annotator

<p align="center"> <img alt="Demonstration of the annotator tool" src="./resources/images/annotator-demo.webp" width="80%">

The annotator highlights similarities between different generated reasoning chains, making it easier to spot strengths and weaknesses and to select the best results.

</p>
<p align="center"> <a href="http://thought.samwald.info:3000/"><b>Use the web-based annotator 📝</b></a><br/> To try out the annotator, simply type in your name and load this <a href="https://github.com/OpenBioLink/ThoughtSource/blob/main/notebooks/worldtree_10.json" target="_blank">example file</a> </p>
<br/>

Installation and code structure

Installation

Execute in a terminal, line by line:

```bash
git clone git@github.com:OpenBioLink/ThoughtSource.git
cd ThoughtSource
# install pip and venv
sudo apt install python3-pip
sudo apt install python3-venv
# create and activate a virtual environment
python3 -m venv venv
source ./venv/bin/activate
# install the cot library with API support
pip install -e ./libs/cot[api]
```

Applications

Libraries

```python
from cot import Collection

# 1) Dataset loading and selecting a random sample
collection = Collection(["worldtree"], verbose=False)
collection = collection.select(split="train", number_samples=10)

# 2) Language model generates chains of thought and then extracts answers
config = {
    "instruction_keys": ['qa-01'],             # "Answer the following question through step-by-step reasoning."
    "cot_trigger_keys": ['kojima-01'],         # "Answer: Let's think step by step."
    "answer_extraction_keys": ['kojima-A-D'],  # "Therefore, among A through D, the answer is"
    "api_service": "huggingface_hub",
    "engine": "google/flan-t5-xl",
    "warn": False,
    "verbose": False,
}
collection.generate(config=config)

# 3) Performance evaluation
collection.evaluate()
# {'accuracy': {'qa-01_kojima-01_kojima-A-D': 0.6}}
```
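The keys of the evaluation result encode which prompt configuration produced each score: the instruction key, the chain-of-thought trigger key, and the answer-extraction key are joined with underscores. A small sketch, assuming this naming convention (which matches the output shown above):

```python
config = {
    "instruction_keys": ["qa-01"],
    "cot_trigger_keys": ["kojima-01"],
    "answer_extraction_keys": ["kojima-A-D"],
}

# Each accuracy entry is keyed as instruction_cot-trigger_answer-extraction.
result_key = "_".join([
    config["instruction_keys"][0],
    config["cot_trigger_keys"][0],
    config["answer_extraction_keys"][0],
])
print(result_key)  # qa-01_kojima-01_kojima-A-D
```

When several keys are supplied per list, one result entry is produced per combination, so the key tells you exactly which prompt variant achieved which accuracy.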

<p align="center"> 👉 See the <a href="https://github.com/OpenBioLink/ThoughtSource/blob/main/notebooks/tutorial.ipynb/"><b>tutorial notebook</b></a> for more code examples. </p>

Citation

@misc{https://doi.org/10.48550/arxiv.2301.11596,
  doi = {10.48550/ARXIV.2301.11596},
  url = {https://arxiv.org/abs/2301.11596},
  author = {Ott, Simon and Hebenstreit, Konstantin and Liévin, Valentin and Hother, Christoffer Egeberg and Moradi, Milad and Mayrhauser, Maximilian and Praas, Robert and Winther, Ole and Samwald, Matthias},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {ThoughtSource: A central hub for large language model reasoning data},
  publisher = {arXiv},
  year = {2023}, 
  copyright = {Creative Commons Attribution 4.0 International}
}

Versioning

All updates/changes to datasets are explicitly mentioned in bold.

<details> <summary>1.0.0 (2023-07-11)</summary> </details> <details> <summary>0.0.5 (2023-03-10)</summary> </details> <details> <summary>0.0.4 (2023-03-08)</summary> </details> <details> <summary>0.0.3 (2023-02-24)</summary> </details> <details> <summary>0.0.2 (2023-02-15)</summary> </details> <details> <summary>0.0.1 (2023-02-01)</summary> </details>