<div align="center"> <img src="figs/TableQAKit.png" border="0" width="512"/> <br /> <br /> 🌐Website | 🎥Video | 📦PyPI | 🤗Huggingface Datasets </div>

# TableQAKit: A Toolkit for Table Question Answering
## 🔥 Updates
- [2023-8-7]: We released our code, datasets and PyPI Package. Check it out!
## ✨ Features
TableQAKit is a unified platform for TableQA (especially in the LLM era). Its main features include:
- Extensible design: You can use the interfaces defined by the toolkit, extend methods and models, and implement your own new models on your own data.
- Equipped with LLMs: TableQAKit supports LLM-based methods, including both LLM-prompting and LLM-finetuning methods.
- Comprehensive datasets: We design a unified data interface to process data and store it in Huggingface datasets (see the loading sketch after this list).
- Powerful methods: Using our toolkit, you can reproduce most of the SOTA methods for TableQA tasks.
- Efficient LLM benchmark: TableQAEval, a benchmark to evaluate the performance of LLMs on TableQA. It evaluates an LLM's ability to model long tables (context) and its comprehension capabilities (numerical reasoning, multi-hop reasoning).
- Comprehensive survey: We will soon release a systematic TableQA survey; this project is the preliminary work for it. See the Paper List.
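As a quick illustration of the unified data interface, the datasets can be loaded through the Hugging Face `datasets` library. A minimal sketch, assuming the data is published under a `TableQAKit` namespace on the Hub (the repository id below is illustrative only; see the Huggingface Datasets link above for the datasets actually released):

```python
from datasets import load_dataset

# NOTE: "TableQAKit/MultiHiertt" is an assumed, illustrative repository id;
# check the Huggingface Datasets link at the top of this README for the
# datasets that are actually published.
dataset = load_dataset("TableQAKit/MultiHiertt")
print(dataset["train"][0])
```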
## ⚙️ Install
Install from PyPI:
```bash
pip install tableqakit
```
or install from source:
```bash
git clone git@github.com:lfy79001/TableQAKit.git
cd TableQAKit
pip install -r requirements.txt
pip install ttqakit
```
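To verify the installation, you can try importing the trainer class used in the examples below. A minimal sanity check, assuming the installed package exposes the same `TableQAKit` module that the source tree provides:

```python
# Sanity check: this import assumes the installed package exposes the same
# TableQAKit module used in the QuickStart example further down.
from TableQAKit.retriever import MultiHierttTrainer
print("TableQAKit retriever module imported successfully")
```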
## 📁 Folder
The TableQAKit repository is structured as follows:
```bash
├── icl/                     # LLM-prompting toolkit
├── llama/                   # LLM-finetuning toolkit
├── mmqa_utils/              # EncyclopediaQA toolkit
│   ├── classifier_module/   # The package for the classifier
│   └── retriever_module/    # The package for encyclopedia retrieval
├── structuredqa/            # Reader models (TaLMs)
│   ├── builder/
│   └── utils/
├── retriever/               # TableQA's general retriever (SpreadSheet examples)
├── multihop/                # Readers for encyclopedia QA
│   ├── Retrieval/
│   └── Read/
├── numerical/               # Readers for some TableQA datasets
├── TableQAEval/             # The proposed new LLM-Long-Table benchmark
│   ├── Baselines/           # Add your LLMs
│   │   ├── turbo16k-table.py
│   │   ├── llama2-chat-table.py
│   │   └── ...
│   ├── Evaluation/          # Metrics
│   └── TableQAEval.json
├── outputs/                 # The results of some models
├── loaders/
│   ├── WikiSQL.py
│   └── ...
├── structs/
│   └── data.py
├── static/
├── LICENSE
└── README.md
```
## 🗃️ Dataset
According to our taxonomy, we classify the TableQA task into three categories of tasks, as shown in the following figure:
<p align="center"> <img src="figs/dataset_examples.png" width="512"> </p>
<p align="center"> <img src="figs/table.png" width="512"> </p>

## 🔧 Get Started
### Retrieval Modules

#### QuickStart
Using the MultiHiertt dataset as a demonstration:
```python
from TableQAKit.retriever import MultiHierttTrainer

trainer = MultiHierttTrainer()
# train stage:
trainer.train()
# infer stage:
trainer.infer()
```
#### Train
```bash
python main.py \
--train_mode row \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 1 \
--dataloader_pin_memory False \
--output_dir ./ckpt \
--train_path ./data/train.json \
--val_path ./data/val.json \
--save_steps 1000 \
--logging_steps 20 \
--learning_rate 0.00001 \
--top_n_for_eval 10 \
--encoder_path ./PLM/bert-base-uncased/
```
#### Inference
```bash
python infer.py \
--output_dir ./ckpt \
--encoder_path ./ckpt/encoder/deberta-large \
--dataloader_pin_memory False \
--ckpt_for_test ./ckpt/retriever/deberta/epoch1_step30000.pt \
--test_path ./data/MultiHiertt/test.json \
--test_out_path ./prediction.json \
--top_n_for_test 10
```
#### Create a Trainer for a New Dataset
```python
import json
from typing import Dict, List

from TableQAKit.retriever import RetrieverTrainer as RT


class NewTrainer(RT):
    def read_data(self, data_path: str) -> List[Dict]:
        """
        :param data_path: The path of the data file
        :return: List of raw data
            [
                data_1,
                data_2,
                ……
            ]
        """
        data = json.load(
            open(data_path, 'r', encoding='utf-8')
        )
        return data

    def data_proc(self, instance) -> Dict:
        """
        :return:
            {
                "id": str,
                "question": str,
                "rows": list[str],
                "labels": list[int]
            }
        """
        rows = instance["paragraphs"]
        labels = [0] * len(instance["paragraphs"])
        if len(instance["qa"]["text_evidence"]):
            for text_evidence in instance["qa"]["text_evidence"]:
                labels[text_evidence] = 1
        for k, v in instance["table_description"].items():
            rows.append(v)
            labels.append(1 if k in instance["qa"]["table_evidence"] else 0)
        return {
            "id": instance["uid"],
            "question": instance["qa"]["question"],
            "rows": rows,
            "labels": labels
        }
```
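With `read_data` and `data_proc` implemented, the new trainer is driven the same way as the MultiHiertt example above. A minimal sketch, assuming the new trainer accepts the same command-line arguments (`--train_path`, `--val_path`, `--output_dir`, ...) as the built-in trainers:

```python
# Entry point for the new dataset; the CLI flags are assumed to match the
# built-in trainers shown in the Train/Inference commands above.
if __name__ == "__main__":
    trainer = NewTrainer()
    trainer.train()    # fine-tune the retriever on the new dataset
    # trainer.infer()  # run retrieval once a checkpoint is available
```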
### LLM-Prompting Methods
<p align="center"> <img src="figs/llm_prompting.jpg" width="512"> </p>

Check here for more details.
### LLM-Finetuning Methods
<p align="center"> <img src="figs/llm_finetuning.jpg" width="512"> </p>

Check here for more details.
### Reading Modules

#### TaLM Reasoner
Check here for more details.

#### Multimodal Reasoner
Check here for more details.
## TableQAEval
<p align="center"> <img src="figs/TableQAEval.png" width="400"> </p>

TableQAEval is a benchmark to evaluate the performance of LLMs on TableQA. It evaluates an LLM's ability to model long tables (context) and its comprehension capabilities (numerical reasoning, multi-hop reasoning).
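The scripts under `TableQAEval/Baselines/` add a model by looping over `TableQAEval.json`, prompting the model with the serialized table and question, and scoring the prediction. A hedged sketch of such a loop (the field names `table`, `question`, and `answer`, and the helper `my_model_generate`, are illustrative assumptions, not the actual schema; see the baseline scripts and `Evaluation/` for the real format and metrics):

```python
import json

def my_model_generate(prompt: str) -> str:
    """Placeholder for your LLM call (API request or local model)."""
    raise NotImplementedError

# Field names are assumptions for illustration; consult TableQAEval.json and
# the scripts in TableQAEval/Baselines/ for the real schema and metrics.
examples = json.load(open("TableQAEval/TableQAEval.json", encoding="utf-8"))
correct = 0
for ex in examples:
    prompt = f"Table:\n{ex['table']}\n\nQuestion: {ex['question']}\nAnswer:"
    prediction = my_model_generate(prompt).strip()
    correct += int(prediction.lower() == str(ex["answer"]).lower())
print(f"Exact-match accuracy: {correct / len(examples):.3f}")
```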
### Leaderboard
Model | Parameters | Numerical Reasoning | Multi-hop Reasoning | Structured Reasoning | Total |
---|---|---|---|---|---|
Turbo-16k-0613 | - | 20.3 | 52.8 | 54.3 | 43.5 |
LLaMA2-7b-chat | 7B | 2.0 | 14.2 | 13.4 | 12.6 |
ChatGLM2-6b-8k | 6B | 1.4 | 10.1 | 11.5 | 10.2 |
LLaMA2-7b-4k | 7B | 0.8 | 9.2 | 5.4 | 6.6 |
longchat-7b-16k | 7B | 0.3 | 7.1 | 5.1 | 5.2 |
LLaMA-7b-2k | 7B | 0.5 | 7.3 | 4.1 | 4.5 |
MPT-7b-65k | 7B | 0.3 | 3.2 | 2.0 | 2.3 |
LongLLaMA-3b | 3B | 0.0 | 4.3 | 1.7 | 2.0 |
More details are shown in TableQAEval.
## ✅ TODO
We will continue to optimize the toolkit.
## Acknowledgements
Primary contributors: Fangyu Lei, Tongxu Luo, Pengqi Yang, Weihao Liu, Hanwen Liu, Jiahe Lei, Yifan Wei, Shizhu He and Kang Liu.
Thank you very much to Yilun Zhao (Yale University) and Yongwei Zhou (HIT) for their assistance.