<div align="center"> <img src="figs/TableQAKit.png" border="0" width="512"/> <br /> <br />

🌐Website | πŸŽ₯Video | πŸ“¦PyPI | πŸ€—Huggingface Datasets

</div>

TableQAKit: A Toolkit for Table Question Answering

πŸ”₯ Updates

✨ Features

TableQAKit is a unified platform for TableQA (especially in the LLM era). Its main features include retrieval modules, LLM-prompting and LLM-finetuning methods, reading modules (TaLM and multimodal reasoners), and the TableQAEval benchmark.

βš™οΈ Install

```bash
pip install tableqakit
```

or

```bash
git clone git@github.com:lfy79001/TableQAKit.git
cd TableQAKit
pip install -r requirements.txt
```

or

```bash
pip install ttqakit
```

πŸ“ Folder

The TableQAKit repository is structured as follows:

```bash
β”œβ”€β”€ icl/                  # LLM-prompting toolkit
β”œβ”€β”€ llama/                # LLM-finetuning toolkit
β”œβ”€β”€ mmqa_utils/           # EncyclopediaQA toolkit
β”‚   β”œβ”€β”€ classifier_module/   # The package for the classifier
β”‚   β”œβ”€β”€ retriever_module/    # The package for encyclopedia retrieval
β”œβ”€β”€ structuredqa/         # Reader models (TaLMs)
β”‚   β”œβ”€β”€ builder/
β”‚   β”œβ”€β”€ utils/
β”œβ”€β”€ retriever/            # TableQA's general retriever (spreadsheet examples)
β”œβ”€β”€ multihop/             # Readers for encyclopedia QA
β”‚   β”œβ”€β”€ Retrieval/
β”‚   └── Read/
β”œβ”€β”€ numerical/            # Readers for numerical TableQA datasets
β”œβ”€β”€ TableQAEval/          # The proposed LLM long-table benchmark
β”‚   β”œβ”€β”€ Baselines/        # Add your LLMs
β”‚   β”‚   β”œβ”€β”€ turbo16k-table.py
β”‚   β”‚   β”œβ”€β”€ llama2-chat-table.py
β”‚   β”‚   └── ...
β”‚   β”œβ”€β”€ Evaluation/       # Metrics
β”‚   └── TableQAEval.json
β”œβ”€β”€ outputs/              # Results of some models
β”œβ”€β”€ loaders/
β”‚   β”œβ”€β”€ WikiSQL.py
β”‚   └── ...
β”œβ”€β”€ structs/
β”‚   β”œβ”€β”€ data.py
β”œβ”€β”€ static/
β”œβ”€β”€ LICENSE
└── README.md
```

πŸ—ƒοΈ Dataset

According to our taxonomy, we classify TableQA tasks into three categories, as shown in the following figures:

<p align="center"> <img src="figs/dataset_examples.png" width="512"> </p> <p align="center"> <img src="figs/table.png" width="512"> </p>

πŸ”§ Get started

Retrieval Modules

QuickStart

Using the MultiHiertt dataset as a demonstration:

```python
from TableQAKit.retriever import MultiHierttTrainer

trainer = MultiHierttTrainer()
# train stage:
trainer.train()
# infer stage:
trainer.infer()
```

Train

```bash
python main.py \
    --train_mode row \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --dataloader_pin_memory False \
    --output_dir ./ckpt \
    --train_path ./data/train.json \
    --val_path ./data/val.json \
    --save_steps 1000 \
    --logging_steps 20 \
    --learning_rate 0.00001 \
    --top_n_for_eval 10 \
    --encoder_path ./PLM/bert-base-uncased/
```

Inference

```bash
python infer.py \
    --output_dir ./ckpt \
    --encoder_path ./ckpt/encoder/deberta-large \
    --dataloader_pin_memory False \
    --ckpt_for_test ./ckpt/retriever/deberta/epoch1_step30000.pt \
    --test_path ./data/MultiHiertt/test.json \
    --test_out_path ./prediction.json \
    --top_n_for_test 10
```

Create Trainer for New Dataset

```python
import json
from typing import Dict, List

from TableQAKit.retriever import RetrieverTrainer as RT


class NewTrainer(RT):
    def read_data(self, data_path: str) -> List[Dict]:
        """
        :param data_path: the path of the data file
        :return: a list of raw instances
        [
            data_1,
            data_2,
            ……
        ]
        """
        data = json.load(
            open(data_path, 'r', encoding='utf-8')
        )
        return data

    def data_proc(self, instance) -> Dict:
        """
        :return:
        {
            "id": str,
            "question": str,
            "rows": list[str],
            "labels": list[int]
        }
        """
        # Text paragraphs are candidate rows; gold text evidence gets label 1.
        rows = instance["paragraphs"]
        labels = [0] * len(instance["paragraphs"])
        if len(instance["qa"]["text_evidence"]):
            for text_evidence in instance["qa"]["text_evidence"]:
                labels[text_evidence] = 1
        # Table cell descriptions are appended as additional candidate rows.
        for k, v in instance["table_description"].items():
            rows.append(v)
            labels.append(1 if k in instance["qa"]["table_evidence"] else 0)
        return {
            "id": instance["uid"],
            "question": instance["qa"]["question"],
            "rows": rows,
            "labels": labels
        }
```
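
Once both hooks are implemented, the new trainer is used in the same way as the built-in ones. A minimal usage sketch, assuming the same training and inference entry points as the MultiHiertt example above:

```python
# Illustrative sketch: the custom trainer follows the same train/infer interface
# as the built-in trainers (e.g. MultiHierttTrainer).
trainer = NewTrainer()
# train stage:
trainer.train()
# infer stage:
trainer.infer()
```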

LLM-Prompting Methods

<p align="center"> <img src="figs/llm_prompting.jpg" width="512"> </p>

Check here for more details.
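
As a rough illustration of the prompting setup, a table is typically linearized into text and packed into the prompt together with the question before querying the LLM. The helper below is a hypothetical sketch of such a linearization step, not the actual interface of the `icl/` toolkit:

```python
from typing import List


def build_table_prompt(question: str, header: List[str],
                       rows: List[List[str]], max_rows: int = 20) -> str:
    """Linearize a table and a question into a single LLM prompt (illustrative sketch)."""
    # Serialize the table as pipe-separated lines, truncating very long tables.
    lines = [" | ".join(header)]
    lines += [" | ".join(map(str, row)) for row in rows[:max_rows]]
    table_text = "\n".join(lines)
    return (
        "Read the table and answer the question.\n\n"
        f"Table:\n{table_text}\n\n"
        f"Question: {question}\nAnswer:"
    )
```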

LLM-Finetuning Methods

<p align="center"> <img src="figs/llm_finetuning.jpg" width="512"> </p>

Check here for more details.

Reading Modules

TaLM Reasoner

Check here for more details.

Multimodal Reasoner

Check here for more details.

TableQAEval

<p align="center"> <img src="figs/TableQAEval.png" width="400"> </p>

TableQAEval is a benchmark for evaluating the TableQA performance of LLMs. It measures an LLM's ability to model long tables (long contexts) and its comprehension capabilities (numerical reasoning, multi-hop reasoning).
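
A minimal evaluation sketch is shown below. The field names (`question`, `table`, `answer`) and the exact-match metric are assumptions for illustration only; the actual data schema and metrics live in `TableQAEval/TableQAEval.json` and `TableQAEval/Evaluation/`.

```python
import json


def evaluate(predict_fn, path: str = "TableQAEval/TableQAEval.json") -> float:
    """Score a model on TableQAEval with exact match (illustrative sketch).

    predict_fn(question, table) should return the model's answer string.
    The field names used here are assumed; check TableQAEval.json for the real schema.
    """
    examples = json.load(open(path, "r", encoding="utf-8"))
    correct = 0
    for ex in examples:
        pred = predict_fn(ex["question"], ex["table"])
        correct += int(str(pred).strip().lower() == str(ex["answer"]).strip().lower())
    return correct / len(examples)
```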

Leaderboard

| Model | Parameters | Numerical Reasoning | Multi-hop Reasoning | Structured Reasoning | Total |
| --- | --- | --- | --- | --- | --- |
| Turbo-16k-0613 | - | 20.3 | 52.8 | 54.3 | 43.5 |
| LLaMA2-7b-chat | 7B | 2.0 | 14.2 | 13.4 | 12.6 |
| ChatGLM2-6b-8k | 6B | 1.4 | 10.1 | 11.5 | 10.2 |
| LLaMA2-7b-4k | 7B | 0.8 | 9.2 | 5.4 | 6.6 |
| longchat-7b-16k | 7B | 0.3 | 7.1 | 5.1 | 5.2 |
| LLaMA-7b-2k | 7B | 0.5 | 7.3 | 4.1 | 4.5 |
| MPT-7b-65k | 7B | 0.3 | 3.2 | 2.0 | 2.3 |
| LongLLaMA-3b | 3B | 0.0 | 4.3 | 1.7 | 2.0 |

More details are shown in TableQAEval.

βœ… TODO

We will continue to optimize the toolkit.

Acknowledgements

Primary contributors: Fangyu Lei, Tongxu Luo, Pengqi Yang, Weihao Liu, Hanwen Liu, Jiahe Lei, Yifan Wei, Shizhu He and Kang Liu.

Many thanks to Yilun Zhao (Yale University) and Yongwei Zhou (HIT) for their assistance.