
A-Paper-List-of-Awesome-Tabular-LLMs

Different types of tables are widely used to store and present information. To automatically process numerous tables and gain valuable insights, researchers have proposed a series of deep-learning models for various table-based tasks, e.g., table question answering (TQA), table-to-text (T2T), text-to-SQL (NL2SQL) and table fact verification (TFV). Recently, the emerging Large Language Models (LLMs) and more powerful Multimodal Large Language Models (MLLMs) have opened up new possibilities for processing tabular data: a single general model can process diverse tables and fulfill different tabular tasks based on users' natural language instructions. We refer to these LLMs specialized for tabular tasks as Tabular LLMs. In this repository, we collect a paper list about recent Tabular (M)LLMs and divide them into the following categories based on their key ideas.

Table of Contents:

Survey of Tabular LLMs and table understanding
Prompting LLMs for different tabular tasks, e.g., in-context learning, prompt engineering and integrating external tools.
Training LLMs for better table understanding ability, e.g., training existing LLMs by instruction fine-tuning or post-pretraining.
Developing Agents for tabular data, e.g., developing copilots for processing Excel tables.
RAG with tabular data, e.g., developing RAG systems for understanding long tables.
Empirical study or benchmarks for evaluating LLMs' table understanding ability, e.g., exploring the influence of various table types or table formats.
Multimodal table understanding, e.g., training MLLMs to understand diverse table images and textual user requests.
Table Understanding datasets, e.g., valuable datasets for model training and evaluation.
Evaluation Metrics for Table Understanding, e.g., devising better evaluation methods for table understanding.

Task Names and Abbreviations:

Task Names	Abbreviations	Task Descriptions
Table Question Answering	TQA	Answering questions based on the table(s), e.g., answer look-up or computation questions about table(s).
Table-to-Text	Table2Text or T2T	Generating text based on the table(s), e.g., generating an analysis report given a financial statement.
Text-to-Table	Text2Table	Generating structured tables based on input text, e.g., generating a statistical table based on a game summary.
Table Fact Verification	TFV	Judging whether a statement is true, false, or unverifiable (not enough evidence) based on the table(s).
Text-to-SQL	NL2SQL	Generating a SQL statement that answers the user's question based on the database schema.
Tabular Mathematical Reasoning	TMR	Solving mathematical reasoning problems based on the table(s), e.g., math word problems related to a table.
Table-and-Text Question Answering	TAT-QA	Answering questions based on both table(s) and their related texts, e.g., answering questions given Wikipedia tables and their surrounding texts.
Table Interpretation	TI	Interpreting basic table content and structure information, e.g., column type annotation, entity linking, relation extraction, cell type classification, etc.
Table Augmentation	TA	Augmenting existing tables with new data, e.g., schema augmentation, row population, etc.
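Most of the tasks above can be posed to one general model in the same way: the table is serialized into text and combined with a natural-language request. The snippet is a minimal sketch of that pattern for TQA, TFV and T2T, assuming Markdown serialization and instruction wording that are illustrative choices only; `table_to_markdown`, `build_prompt` and `TEMPLATES` are hypothetical helpers, not the method of any paper in this list.

```python
from typing import Sequence


def table_to_markdown(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Serialize a flat table as a Markdown grid (one of several possible formats)."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        # Cells are stringified as-is; no escaping of "|" is attempted here.
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Hypothetical instruction templates, keyed by the task abbreviations above.
# Real Tabular LLMs each use their own (often fine-tuned) prompt formats.
TEMPLATES = {
    "TQA": "Answer the question based on the table.\n\n{table}\n\nQuestion: {text}\nAnswer:",
    "TFV": (
        "Based on the table, is the statement true or false? "
        "Reply 'not enough evidence' if the table cannot decide.\n\n"
        "{table}\n\nStatement: {text}\nVerdict:"
    ),
    "T2T": "Describe the table in a few sentences.\n\n{table}\n\nFocus: {text}\nDescription:",
}


def build_prompt(task: str, header: Sequence[str], rows: Sequence[Sequence[object]], text: str) -> str:
    """Combine a serialized table with a task-specific natural-language request."""
    if task not in TEMPLATES:
        raise ValueError(f"unsupported task: {task}")
    # str.format does not re-parse substituted values, so braces inside cells are safe.
    return TEMPLATES[task].format(table=table_to_markdown(header, rows), text=text)


if __name__ == "__main__":
    header = ["Team", "Wins", "Losses"]
    rows = [["Lions", 10, 2], ["Tigers", 7, 5]]
    print(build_prompt("TQA", header, rows, "Which team has more wins?"))
```

Running the script prints a TQA prompt with the two-row table rendered as a Markdown grid. Several papers listed here (e.g., FLEXTAF) study how the table format affects results, so keeping serialization in one swappable function (CSV, JSON or HTML instead of Markdown) is a deliberate design choice.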

1. Survey of Tabular LLMs and Table Understanding

Title	Conference	Date	Pages
Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution	arxiv	2024-08-20	49
Large Language Model for Table Processing: A Survey	arxiv	2024-02-04	9
A Survey of Table Reasoning with Large Language Models	arxiv	2024-02-13	9
Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey	arxiv	2024-03-01	41
Transformers for Tabular Data Representation: A Survey of Models and Applications	TACL 2023		23
Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks	IJCAI 2022	2022-01-24	15

2. Prompting LLMs for Different Tabular Tasks

Title	Conference	Date	Task	Code
Retrieval & Fine-Tuning for In-Context Tabular Models	NIPS 2024	2024-06-07	Machine learning tasks with tabular data
GraphOTTER: Evolving LLM-based Graph Reasoning for Complex Table Question Answering	COLING 2025	2024-12-02	TQA
PoTable: Programming Standardly on Table-based Reasoning Like a Human Analyst	arxiv	2024-12-05	TQA, TFV
Unveiling Implicit Table Knowledge with Question-Then-Pinpoint Reasoner for Insightful Table Summarization	EMNLP 2024 Findings	2024-06-18	Table Summarization
TKGT: Redefinition and A New Way of Text-to-Table Tasks Based on Real World Demands and Knowledge Graphs Augmented LLMs	EMNLP 2024		Text2Table
Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction	EMNLP 2024	2024-04-22	Text2Table	Github
TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning	arxiv	2024-09-18	TQA	Github
SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA	EMNLP 2024	2024-09-25	TQA
FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats	arxiv	2024-08-16	TQA, TFV	Github
Learning Relational Decomposition of Queries for Question Answering from Tables	ACL 2024		TQA
TaPERA: Enhancing Faithfulness and Interpretability in Long-Form Table QA by Content Planning and Execution-based Reasoning	ACL 2024		TQA
Enhancing Temporal Understanding in LLMs for Semi-structured Tables	arxiv	2024-07-22	Temporal TQA
ALTER: Augmentation for Large-Table-Based Reasoning	arxiv	2024-07-03	TQA	Github
TrustUQA: A Trustful Framework for Unified Structured Data Question Answering	arxiv	2024-06-27	TQA
Adapting Knowledge for Few-shot Table-to-Text Generation	arxiv	2024-03-27	T2T
Graph Reasoning Enhanced Language Models for Text-to-SQL	SIGIR 2024		NL2SQL
NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization	arxiv	2024-06-25	TQA,TFV
Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo	NAACL 2024	2024-04-05	T2T
TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition	NAACL 2024		TQA,TFV
E5: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, Exhibit and Extrapolate	NAACL 2024		TQA on hierarchical tables	Github
OpenTE: Open-Structure Table Extraction From Text	ICASSP 2024		Text-to-Table Extraction
On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL	NAACL 2024	2024-04-03	NL2SQL
MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering	arxiv	2024-03-28	TQA
OpenTab: Advancing Large Language Models as Open-domain Table Reasoners	ICLR 2024	2024-02-22	TQA,TFV	Github
CABINET: Content Relevance based Noise Reduction for Table Question Answering	ICLR 2024	2024-02-02	TQA
Augment before You Try: Knowledge-Enhanced Table Question Answering via Table Expansion	arxiv	2024-01-24	TQA	Github
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding	ICLR 2024	2024-01-09	TQA,TFV
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning	EMNLP 2024 Findings	2023-12-14	TQA,TAT-QA,TFV,T2T	Github
Large Language Models are Complex Table Parsers	EMNLP 2023	2023-12-13	TQA
API-Assisted Code Generation for Question Answering on Varied Table Structures	EMNLP 2023	2023-10-23	TQA
TableQAKit: A Comprehensive and Practical Toolkit for Table-based Question Answering	arxiv	2023-10-23	TQA,NL2SQL	Github
Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies	arxiv	2023-05-21	NL2SQL
StructGPT: A General Framework for Large Language Model to Reason over Structured Data	EMNLP 2023	2023-05-16	TQA, TFV	Github
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models	NIPS 2023	2023-04-19	TMR	Github
Generate, Transform, Answer: Question Specific Tool Synthesis for Tabular Data	EMNLP 2023	2023-03-17	TQA,NL2SQL
DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models	SIGMOD 2024	2023-03-12	Table Transformation
Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning	SIGIR 2023	2023-01-13	TQA, TFV	Github
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks	TMLR 2023	2022-11-22	TMR, TAT-QA	Github
Large Language Models are few(1)-shot Table Reasoners	EACL 2023 Findings	2022-10-13	TQA, TFV	Github
Binding Language Models in Symbolic Languages	ICLR 2023	2022-10-06	TQA, TFV	Github
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning	ICLR 2023	2022-09-29	TMR (Tabular Mathematical Reasoning)	Github

3. Training LLMs for Better Table Understanding Ability

Title	Conference	Date	Task	LLM Backbone	Code
TableGPT2: A Large Multimodal Model with Tabular Data Integration	arxiv	2024-11-04	TQA, TFV, et al.	Qwen2.5 model family with a special pre-trained table encoder.	Github
Large Scale Transfer Learning for Tabular Data via Language Modeling	NIPS 2024	2024-06-17	tabular data prediction (classification and binned regression)	Llama 3-8B
ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context	EMNLP 2024 Findings	2024-03-04	TQA, TFV	Llama-2	Github
UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition	EMNLP 2024 Findings	2024-09-20	Table Recognition
Table Question Answering for Low-resourced Indic Languages	EMNLP 2024	2024-10-04	TQA in Indic languages	mBART	Github
TabMoE: A General Framework for Diverse Table-Based Reasoning with Mixture-of-Experts	Mathematics	2024-08-16	TQA, TFV, T2T	BART
rLLM: Relational Table Learning with LLMs	arxiv	2024-07-29	multi-table joint learning tasks	a PyTorch library designed for Relational Table Learning (RTL) with Large Language Models (LLMs).	Github
Mambular: A Sequential Model for Tabular Deep Learning	arxiv	2024-08-12	ML Classification and Regression tasks like California Housing	Mamba	Github
MambaTab: A Plug-and-Play Model for Learning Tabular Data	MIPR 2024	2024-01-16	ML Classification tasks	Mamba
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models	arxiv	2024-07-12	Excel Manipulation
Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science	arxiv	2024-03-29	Predictive Tabular Tasks	Llama2 7B	HuggingFace
HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding	arxiv	2024-03-28	TI,TQA	Vicuna-1.5 7B
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios	arxiv	2024-03-28	Table Manipulation	CodeLlama 7B, 13B	Github
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding	CoLM 2024	2024-02-26	TQA,TFV,T2T,NL2SQL	CodeLlama 7B-34B	Github
TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data	arxiv	2024-01-24	TQA	Llama2 7B, 13B, 70B	Github
TableLlama: Towards Open Large Generalist Models for Tables	NAACL 2024	2023-11-15	TQA,TFV,T2T,TA,TI	Llama2 7B	Github
HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence	arxiv	2023-11-15	T2T	Llama2 7B-13B
Table-GPT: Table-tuned GPT for Diverse Table Tasks	arxiv	2023-10-13	TQA	GPT-3.5, ChatGPT

Pre-trained Tabular Language Models (non-LLM)

Title	Conference	Date	Task	Code
HYTREL: Hypergraph-enhanced Tabular Data Representation Learning	NIPS 2023	2023-07-14	TA, TI	Github
FLAME: A small language model for spreadsheet formulas	AAAI 2024	2023-01-31	Generating Excel Formulas	Github

4. Developing Agents for Processing Tabular Data

Title	Conference	Date	Task	Code
SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models	arxiv	2024-03-06	Manipulating Excel spreadsheets with LLMs	Github
EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records	arxiv	2024-01-13	TQA	Github
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks	arxiv	2024-01-10	Data Analysis	Github
DB-GPT: Empowering Database Interactions with Private Large Language Models	arxiv	2023-12-29	Data Analysis	Github
ReAcTable: Enhancing ReAct for Table Question Answering	arxiv	2023-10-01	TQA
SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models	NIPS 2023	2023-05-30	Manipulating Excel spreadsheets with LLMs	Github
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT	arxiv	2023-07-17	Manipulating CSV tables with LLMs

5. RAG with Tabular Data

Title	Conference	Date	Task
TableRAG: Million-Token Table Understanding with Language Models	NIPS 2024	2024-10-07	TQA for extremely long tables
Evaluation of Table Representations to Answer Questions from Tables in Documents : A Case Study using 3GPP Specifications	arxiv	2024-08-30	how to represent tables for better retrieval within RAG systems
THoRR: Complex Table Retrieval and Refinement for RAG	IR-RAG 2024 workshop		RAG with large and complex tables

6. Empirical Study or Benchmarks for Evaluating LLMs' Table Understanding Ability

Title	Conference	Date	Task	Code
Rethinking Tabular Data Understanding with Large Language Models	NAACL 2024	2023-12-27	TQA
On the Robustness of Language Models for Tabular Question Answering	arxiv	2024-06-18	TQA
FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering	NAACL 2024	2024-04-29	TQA
How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset	arxiv	2024-03-20	TQA
InstructExcel: A Benchmark for Natural Language Instruction in Excel	EMNLP 2023 Findings	2023-10-23	Excel operations	Github
Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs	arxiv	2023-10-16	Fact-Finding Tasks, Transformation Tasks
Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios	EMNLP 2023	2023-05-24	T2T	Github
TABLET: Learning From Instructions For Tabular Data	arxiv	2023-04-25		Github
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study	WSDM 2024	2023-05-22	TQA,TFV,T2T
Evaluating the Text-to-SQL Capabilities of Large Language Models	arxiv	2022-03-15	NL2SQL
A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability	arxiv	2023-03-12	NL2SQL	Github
RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations	ACL 2023	2023-06-25	TQA	Github

7. Multimodal Table Understanding

Title	Conference	Date	Task	Code
Knowledge-Aware Reasoning over Multimodal Semi-structured Tables	EMNLP 2024 Findings	2024-08-25	Understanding table images with visual elements like symbols and icons
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks	arxiv	2024-10-02	Multi Table Image QA	Github
PixT3: Pixel-based Table-To-Text Generation	ACL 2024	2023-11-16	T2T	Github
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy	NIPS 2024	2024-06-03	TQA,TI
TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains	arxiv	2024-04-30	TQA, TFV	Github
Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs	ACL 2024	2024-02-19	TQA,TFV,T2T
Multimodal Table Understanding	ACL 2024	2024-02-15	TQA, TFV, T2T, TI, TAT-QA, TMR	Github

8. Table Understanding Datasets

8.1 Recent Datasets for LLMs

Title	Conference	Date	Task	Data Volume	Domain	Table Type	Data and Code
MiMoTable: A Multi-scale Spreadsheet Benchmark with Meta Operations for Table Reasoning	COLING 2025	2024-12-16	TQA, T2T, Table manipulation, Data analysis	1,719 (spreadsheet, question, answer) triplets from 428 different spreadsheets	Multiple domains	Flat and hierarchical tables	Github
ENTRANT: A Large Financial Dataset for Table Understanding	Sci Data	2024-07-04	Cell Type Classification, Header Extraction, etc.	Millions of tables with cell attributes, as well as positional and hierarchical information	Financial	Flat tables and hierarchical tables	Github
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering	arxiv	2024-08-17	TMR, TFV, Trend Forecasting and Chart Generation	3681 tables and 20K samples	Tables collected from academic datasets such as WTQ and FeTaQA	Flat tables and a small number of hierarchical tables	Github
DocTabQA: Answering Questions from Long Documents Using Tables	arxiv	2024-08-21	Table Generation based on question and document	300 documents and 1.5k question-table pairs	Financial	Flat tables and hierarchical tables	Github

8.2 Classic Datasets of Downstream Table Tasks

9. Designing Evaluation Metrics for Table Understanding

Title	Conference	Date	Task	Code
Revisiting Automated Evaluation for Long-form Table Question Answering in the Era of Large Language Models	EMNLP 2024		TQA
Is This a Bad Table? A Closer Look at the Evaluation of Table Generation from Text	EMNLP 2024	2024-06-21	Text2Table