Tutorial on Large Language Models for Tabular Data: Progresses and Future Directions

🌟 A tutorial on “Large Language Models for Tabular Data” at the SIGIR’24 conference in D.C.

Slides

SIGIR 2024 Tutorial on Large Language Models for Tabular Data
- [Introduction]
- [Encoding Tabular Data for LLMs]
- [Modeling and Training LLMs for Tabular Data]
- [Tasks and Benchmarks]
- [LLM-driven Table Agents]

Paper

Paper

Paper List

Introduction

Binder: Binding Language Models in Symbolic Languages [Paper]
TabLLM: Few-shot Classification of Tabular Data with Large Language Models [Paper]
Dater: Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning [Paper]
DIN-SQL: Decomposed in-context learning of text-to-SQL with self-correction [Paper]
Table Meets LLM: Can Large Language Models Understand Structured Table Data? [Paper]
SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [Paper]
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [Paper]
DAIL-SQL: Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper]
Table-GPT: Table-tuned GPT for Diverse Table Tasks [Paper]
API-Assisted Code Generation for Question Answering on Varied Table Structures [Paper]
InsightPilot: An LLM-Empowered Automated Data Exploration System [Paper]
TableLlama: Towards Open Large Generalist Models for Tables [Paper]
DBCopilot: Scaling Natural Language Querying to Massive Databases [Paper]
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [Paper]
DB-GPT: Empowering Database Interactions with Private Large Language Models [Paper]
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding [Paper]
TroVE: Inducing verifiable and efficient toolboxes for solving programmatic tasks [Paper]
MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [Paper]
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [Paper]
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [Paper]
Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [Paper]
Table-LLaVA: Multimodal Table Understanding [Paper]
SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [Paper]
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [Paper]

Encoding Tabular Data for LLMs

Table Meets LLM: Can Large Language Models Understand Structured Table Data? [Paper]
Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs [Paper]
SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models [Paper]
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper]
Enhancing text-to-SQL capabilities of large language models: A study on prompt design strategies [Paper]
Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study [Paper]
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [Paper]
DBCopilot: Scaling Natural Language Querying to Massive Databases [Paper]
TabLLM: Few-shot Classification of Tabular Data with Large Language Models [Paper]
Towards foundation models for learning on tabular data [Paper]
Tables as Images? Exploring the Strengths and Limitations of LLMs on Multimodal Representations of Tabular Data [Paper]
Multimodal Table Understanding [Paper]
Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [Paper]
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [Paper]

Modeling and Training LLMs for Tabular Data

TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [Paper]
TaPas: Weakly Supervised Table Parsing via Pre-training [Paper]
TURL: Table understanding through representation learning [Paper]
TUTA: Tree-based transformers for generally structured table pre-training [Paper]
TAPEX: Table Pre-Training via Learning a Neural SQL Executor [Paper]
UnifiedSKG: Unifying and multi-tasking structured knowledge grounding with text-to-text language models [Paper]
Table-GPT: Table-tuned GPT for Diverse Table Tasks [Paper]
SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [Paper]
TableLlama: Towards Open Large Generalist Models for Tables [Paper]
HELLaMA: LLaMA-based table to text generation by highlighting the important evidence [Paper]
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [Paper]
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [Paper]
TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [Paper]
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper]
DB-GPT: Empowering Database Interactions with Private Large Language Models [Paper]
Towards foundation models for learning on tabular data [Paper]
LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks [Paper]
Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science [Paper]
Multimodal Table Understanding [Paper]
Effective distillation of table-based reasoning ability from LLMs [Paper]
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding [Paper]
Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [Paper]
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [Paper]

Tasks and Benchmarks

TableSense: Spreadsheet table detection with convolutional neural networks [Paper]
Auto-Tables: Synthesizing multi-step transformations to relationalize tables without using examples [Paper]
Spreadsheet table transformations from examples [Paper]
TUTA: Tree-based transformers for generally structured table pre-training [Paper]
FORTAP: Using formulas for numerical-reasoning-aware table pretraining [Paper]
Open domain question answering over tables via dense retrieval [Paper]
Table Retrieval May Not Necessitate Table-specific Model Design [Paper]
Compositional semantic parsing on semi-structured tables [Paper]
Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task [Paper]
HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation [Paper]
Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning [Paper]
FeTaQA: Free-form Table Question Answering [Paper]
Tab-CQA: A Tabular Conversational Question Answering Dataset on Financial Reports [Paper]
TempTabQA: Temporal Question Answering for Semi-Structured Tables [Paper]
Open question answering over tables and text [Paper]
TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance [Paper]
AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry [Paper]
TabFact: A large-scale dataset for table-based fact verification [Paper]
ToTTo: A Controlled Table-To-Text Generation Dataset [Paper] [Dataset]
Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs [Paper]
MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [Paper]
SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [Paper]
Language models enable simple systems for generating structured views of heterogeneous data lakes [Paper]
Large language models (LLMs) on tabular data: Prediction, generation, and understanding - a survey [Paper]

LLM-driven Table Agents

Large language models are versatile decomposers: Decompose evidence and questions for table-based reasoning [Paper]
Exploring chain-of-thought style prompting for text-to-SQL [Paper]
Chain-of-Table: Evolving tables in the reasoning chain for table understanding [Paper]
DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [Paper]
Tab-CoT: Zero-shot tabular chain of thought [Paper]
Selective demonstrations for cross-domain text-to-SQL [Paper]
SpreadsheetCoder: Formula prediction from semi-structured context [Paper]
SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [Paper]
ToolQA: A dataset for LLM question answering with external tools [Paper]
ReAcTable: Enhancing ReAct for Table Question Answering [Paper]
LEVER: Learning to verify language-to-code generation with execution [Paper]
MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [Paper]
Binding Language Models in Symbolic Languages [Paper]
Chameleon: Plug-and-play compositional reasoning with large language models [Paper]
API-Assisted Code Generation for Question Answering on Varied Table Structures [Paper]
Executable code actions elicit better LLM agents [Paper]
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [Paper]
ToolWriter: Question Specific Tool Synthesis for Tabular Data [Paper]
CRAFT: Customizing LLMs by creating and retrieving from specialized toolsets [Paper]
TroVE: Inducing verifiable and efficient toolboxes for solving programmatic tasks [Paper]
Cognitive architectures for language agents [Paper]
BAGEL: Bootstrapping Agents by Guiding Exploration with Language [Paper]
EHRAgent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records [Paper]
Towards knowledge-intensive text-to-SQL semantic parsing with formulaic knowledge [Paper]