<div align= "center"> <h1> 🛠️ToolQA</h1> </div>

🛠️ The official repository for the code and data of the ToolQA dataset. ToolQA is an open-source dataset designed specifically for evaluating tool-augmented large language models (LLMs). This repo provides the dataset, the corresponding data generation code, and implementations of the baselines on our dataset.

Features

<p align="center"> <img width="800" src="./figure/overview.png" > </p>

Dataset Statistics

ToolQA consists of data from 8 distinct domains. Each instance is a tuple of (question, answer, reference corpora, tools). The reference corpora are external knowledge sources that can be queried; each one is a text corpus, a tabular database, or a graph.

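| Context | Topic | Knowledge Format | Knowledge Size | # Easy Templates | # Easy Questions | # Hard Templates | # Hard Questions |
|---|---|---|---|---|---|---|---|
| Temporal | Flight | Tabular Database | 4078318 | 10 | 100 | 10 | 100 |
| Temporal | Coffee | Tabular Database | 5746 | 8 | 100 | 13 | 130 |
| Spatial | Yelp | Tabular Database | 150346 | 11 | 100 | 10 | 100 |
| Spatial | Airbnb | Tabular Database | 102599 | 10 | 100 | 10 | 100 |
| Mathematical | GSM8K | Professional Ability | - | - | 100 | - | - |
| Social | DBLP | Graph | 553320 | 10 | 100 | 10 | 100 |
| Scientific | SciREX | Pure-Text Corpus | 438 | 1 | 100 | 4 | 100 |
| Personal | Agenda | Pure-Text Corpus | 10000 | 5 | 100 | 5 | 100 |
| SUM | - | - | - | 55 | 800 | 62 | 730 |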
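As a quick sanity check on the statistics above, the per-topic counts can be summed and compared with the SUM row. This is a minimal sketch: the rows are copied from the table, and treating the "-" entries for GSM8K as 0 is an assumption made only so the columns can be added.

```python
# Rows copied from the statistics table:
# (topic, easy_templates, easy_questions, hard_templates, hard_questions).
# GSM8K lists "-" for templates and hard questions; 0 is assumed here
# purely so that the columns can be summed.
ROWS = [
    ("Flight", 10, 100, 10, 100),
    ("Coffee", 8, 100, 13, 130),
    ("Yelp", 11, 100, 10, 100),
    ("Airbnb", 10, 100, 10, 100),
    ("GSM8K", 0, 100, 0, 0),
    ("DBLP", 10, 100, 10, 100),
    ("SciREX", 1, 100, 4, 100),
    ("Agenda", 5, 100, 5, 100),
]


def column_totals(rows):
    """Sum each of the four numeric columns across all topics."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 5))


totals = column_totals(ROWS)
print(totals)  # (55, 800, 62, 730)
```

The totals match the SUM row: 55 easy templates, 800 easy questions, 62 hard templates, and 730 hard questions.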

Data Download

We provide download links for all the data involved in ToolQA, in two categories. The first is the external corpus: data that we have already pre-processed and that the external tools interact with (e.g., retrieval and database operations). The second is the raw data, which the tools cannot interact with as external knowledge in ToolQA. We provide it only for users who want to generate more questions and answers for model tuning or more thorough evaluation.

External Corpus Download

The external corpus can be downloaded through this link. After downloading and unzipping, users need to place it under the directory /<YOUR_OWN_PATH>/ToolQA/data/external_corpus/.

Raw Data Download

All the data sources and download guidance are listed below:

Generate New Questions

You can also use ToolQA to generate new questions from our templates, for tuning or for building new evaluation sets. The data generation code is in the /dataset_generation/ directory. The only change required is to modify the paths in the notebooks.
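The idea of generating questions from templates can be illustrated with a short sketch. The template text, field names, and records below are all hypothetical and are not taken from the notebooks in /dataset_generation/; they only show how one template plus a set of records could yield question-answer pairs.

```python
# Hypothetical template and records, for illustration only. The real
# templates and field names live in the notebooks under /dataset_generation/.
TEMPLATE = "What was the departure time of the {flight} flight from {origin} to {dest} on {date}?"

RECORDS = [
    {"flight": "DL1234", "origin": "ATL", "dest": "SEA",
     "date": "2022-01-12", "dep_time": "10:05"},
    {"flight": "UA0042", "origin": "ORD", "dest": "DEN",
     "date": "2022-03-07", "dep_time": "18:40"},
]


def generate_questions(template, records, answer_field):
    """Fill the template from each record and pair it with that record's answer."""
    return [
        {"question": template.format(**record), "answer": record[answer_field]}
        for record in records
    ]


for qa in generate_questions(TEMPLATE, RECORDS, answer_field="dep_time"):
    print(qa["question"], "->", qa["answer"])
```

Because the answer is read from the same record that fills the template, every generated question comes with a ground-truth answer by construction.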

Tool Implementation

We provide implemented tools for each baseline in the benchmark, for example in ./benchmark/ReAct/code/tools. Note that the questions are intentionally open-ended: we believe they are challenging enough on their own, and we do not wish to restrict solutions to the tools suggested in our paper. We welcome experiments with more advanced tools (such as a stronger retriever) or a more effective planning module that composes our defined tools better, and we are excited to see diverse implementations in response to our questions.

Retriever

We implement the retriever with the LangChain package and the Chroma vector database. We have uploaded the pre-processed Chroma vector database in the Download Link. Please download it and place the files under the directory /<YOUR_OWN_PATH>/ToolQA/data/chroma_db/.
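Before running a baseline, it can help to confirm that the downloaded data sits where this README says it should. The helper below is a hypothetical convenience and not part of the repository. It assumes only the two directories named in this README, data/external_corpus and data/chroma_db, and treats an empty directory as missing.

```python
from pathlib import Path

# Sub-directories named in this README, relative to the ToolQA checkout.
EXPECTED_DIRS = ("data/external_corpus", "data/chroma_db")


def missing_data_dirs(repo_root):
    """Return the expected data directories that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_DIRS:
        path = root / rel
        # A directory that exists but has no entries counts as missing,
        # since that usually means the archive was never unzipped into it.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing
```

Calling `missing_data_dirs("/<YOUR_OWN_PATH>/ToolQA")` returns an empty list when both directories are populated, and otherwise lists the ones that still need to be downloaded.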

SQL Interpreter

To interpret SQL commands, you first need to load the data into a MySQL database. Run the following command to create the database (the entire process may take hours):

python ./benchmark/ReAct/code/tools/table/mysql_db_create.py
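To illustrate what a SQL interpreter tool does once the database is loaded, here is a minimal sketch. It deliberately swaps in Python's built-in sqlite3 with an in-memory database instead of MySQL so that it runs without any setup. The table name, columns, and rows are hypothetical, and the function is not the repository's implementation.

```python
import sqlite3


def run_sql(connection, query):
    """Execute one SQL query and return all rows, or an error string.

    Returning the error message instead of raising lets an LLM agent
    read the failure and retry with a corrected query.
    """
    try:
        return connection.execute(query).fetchall()
    except sqlite3.Error as exc:
        return f"SQL error: {exc}"


# In-memory SQLite stand-in for the MySQL database. The table name,
# columns, and values are hypothetical and only illustrate the flow.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coffee (day TEXT, open_price REAL, close_price REAL)")
conn.executemany(
    "INSERT INTO coffee VALUES (?, ?, ?)",
    [("2020-01-02", 129.0, 127.1), ("2020-01-03", 127.0, 126.4)],
)

rows = run_sql(conn, "SELECT day, close_price FROM coffee WHERE close_price < 127")
print(rows)  # [('2020-01-03', 126.4)]
```

With a real MySQL server, the same pattern would apply through a MySQL client library in place of sqlite3.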

Math Calculator

To use the calculator in the implementation, you first need to sign up for an account through the official WolframAlpha developer portal.
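Once an account exists, a calculator query could look roughly like the sketch below. It rests on several assumptions that do not come from this repository: that the account provides an App ID, that the App ID is stored in a hypothetical WOLFRAM_ALPHA_APPID environment variable, and that requests go to WolframAlpha's Short Answers API endpoint. The repository's own calculator tool may use a different client. The sketch only builds the request URL and sends nothing.

```python
import os
from urllib.parse import urlencode

# Assumption: WolframAlpha's Short Answers API endpoint. The repository's
# calculator tool may use a different client library or endpoint.
SHORT_ANSWERS_URL = "https://api.wolframalpha.com/v1/result"


def build_query_url(expression, app_id=None):
    """Build the request URL for one calculator query.

    The App ID is taken from the argument or, failing that, from the
    hypothetical WOLFRAM_ALPHA_APPID environment variable.
    """
    app_id = app_id or os.environ.get("WOLFRAM_ALPHA_APPID")
    if not app_id:
        raise ValueError("Set WOLFRAM_ALPHA_APPID or pass app_id explicitly.")
    # urlencode percent-escapes operators such as '+' and '*'.
    return f"{SHORT_ANSWERS_URL}?{urlencode({'appid': app_id, 'i': expression})}"


print(build_query_url("2+2*3", app_id="DEMO"))
# https://api.wolframalpha.com/v1/result?appid=DEMO&i=2%2B2%2A3
```

Keeping the App ID in an environment variable rather than in source code avoids committing credentials to a fork of the repository.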

Current Progress

The data and code are in the final stage of cleaning and will be released gradually over a short period. Detailed progress of the final checks is tracked in the TODO list.

Open-source Progress

Questions?

If you have any questions, feel free to reach out to yczhuang at gatech.edu. Please describe the problem in detail so we can help you better and more quickly!

Citation

If you find this repository valuable for your research, we kindly request that you cite the following paper. We appreciate your consideration.

@misc{zhuang2023toolqa,
      title={ToolQA: A Dataset for LLM Question Answering with External Tools}, 
      author={Yuchen Zhuang and Yue Yu and Kuan Wang and Haotian Sun and Chao Zhang},
      year={2023},
      eprint={2306.13304},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}