Tutorial on Large Language Models for Tabular Data: Progresses and Future Directions
🌟 A tutorial on “Large Language Models for Tabular Data” at the SIGIR’24 conference in D.C.
Slides
- SIGIR 2024 Tutorial on Large Language Models for Tabular Data
- [Introduction]
- [Encoding Tabular Data for LLMs]
- [Modeling and Training LLMs for Tabular Data]
- [Tasks and Benchmarks]
- [LLM-driven Table Agents]
Paper
Paper List
Introduction
- Binder: Binding Language Models in Symbolic Languages [Paper]
- TabLLM: Few-shot Classification of Tabular Data with Large Language Models [Paper]
- Dater: Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning [Paper]
- DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [Paper]
- Table Meets LLM: Can Large Language Models Understand Structured Table Data? [Paper]
- SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [Paper]
- Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [Paper]
- DAIL-SQL: Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper]
- Table-GPT: Table-tuned GPT for Diverse Table Tasks [Paper]
- API-Assisted Code Generation for Question Answering on Varied Table Structures [Paper]
- InsightPilot: An LLM-Empowered Automated Data Exploration System [Paper]
- TableLlama: Towards Open Large Generalist Models for Tables [Paper]
- DBCopilot: Scaling Natural Language Querying to Massive Databases [Paper]
- TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [Paper]
- DB-GPT: Empowering Database Interactions with Private Large Language Models [Paper]
- Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding [Paper]
- TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks [Paper]
- MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [Paper]
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [Paper]
- TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [Paper]
- Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [Paper]
- Table-LLaVA: Multimodal Table Understanding [Paper]
- SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [Paper]
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [Paper]
Encoding Tabular Data for LLMs
- Table Meets LLM: Can Large Language Models Understand Structured Table Data? [Paper]
- Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs [Paper]
- SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models [Paper]
- Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper]
- Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies [Paper]
- Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study [Paper]
- TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [Paper]
- DBCopilot: Scaling Natural Language Querying to Massive Databases [Paper]
- TabLLM: Few-shot Classification of Tabular Data with Large Language Models [Paper]
- Towards Foundation Models for Learning on Tabular Data [Paper]
- Tables as Images? Exploring the Strengths and Limitations of LLMs on Multimodal Representations of Tabular Data [Paper]
- Multimodal Table Understanding [Paper]
- Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [Paper]
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [Paper]
Modeling and Training LLMs for Tabular Data
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [Paper]
- TaPas: Weakly Supervised Table Parsing via Pre-training [Paper]
- TURL: Table Understanding through Representation Learning [Paper]
- TUTA: Tree-based Transformers for Generally Structured Table Pre-training [Paper]
- TAPEX: Table Pre-Training via Learning a Neural SQL Executor [Paper]
- UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models [Paper]
- Table-GPT: Table-tuned GPT for Diverse Table Tasks [Paper]
- SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [Paper]
- TableLlama: Towards Open Large Generalist Models for Tables [Paper]
- Hellama: Llama-based Table to Text Generation by Highlighting the Important Evidence [Paper]
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [Paper]
- TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [Paper]
- TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [Paper]
- Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper]
- DB-GPT: Empowering Database Interactions with Private Large Language Models [Paper]
- Towards Foundation Models for Learning on Tabular Data [Paper]
- LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks [Paper]
- Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science [Paper]
- Multimodal Table Understanding [Paper]
- Effective Distillation of Table-based Reasoning Ability from LLMs [Paper]
- OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding [Paper]
- Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [Paper]
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [Paper]
Tasks and Benchmarks
- TableSense: Spreadsheet Table Detection with Convolutional Neural Networks [Paper]
- Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples [Paper]
- Spreadsheet Table Transformations from Examples [Paper]
- TUTA: Tree-based Transformers for Generally Structured Table Pre-training [Paper]
- FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining [Paper]
- Open Domain Question Answering over Tables via Dense Retrieval [Paper]
- Table Retrieval May Not Necessitate Table-specific Model Design [Paper]
- Compositional Semantic Parsing on Semi-Structured Tables [Paper]
- Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task [Paper]
- HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation [Paper]
- Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning [Paper]
- FeTaQA: Free-form Table Question Answering [Paper]
- Tab-CQA: A Tabular Conversational Question Answering Dataset on Financial Reports [Paper]
- TempTabQA: Temporal Question Answering for Semi-Structured Tables [Paper]
- Open Question Answering over Tables and Text [Paper]
- TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance [Paper]
- AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry [Paper]
- TabFact: A Large-scale Dataset for Table-based Fact Verification [Paper]
- ToTTo: A Controlled Table-To-Text Generation Dataset [Paper] [Dataset]
- Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs [Paper]
- MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [Paper]
- SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [Paper]
- Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes [Paper]
- Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding - A Survey [Paper]
LLM-driven Table Agents
- Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning [Paper]
- Exploring Chain-of-Thought Style Prompting for Text-to-SQL [Paper]
- Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding [Paper]
- DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [Paper]
- Tab-CoT: Zero-shot Tabular Chain of Thought [Paper]
- Selective Demonstrations for Cross-domain Text-to-SQL [Paper]
- SpreadsheetCoder: Formula Prediction from Semi-structured Context [Paper]
- SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [Paper]
- ToolQA: A Dataset for LLM Question Answering with External Tools [Paper]
- ReAcTable: Enhancing ReAct for Table Question Answering [Paper]
- LEVER: Learning to Verify Language-to-Code Generation with Execution [Paper]
- MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [Paper]
- Binding Language Models in Symbolic Languages [Paper]
- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models [Paper]
- API-Assisted Code Generation for Question Answering on Varied Table Structures [Paper]
- Executable Code Actions Elicit Better LLM Agents [Paper]
- Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [Paper]
- ToolWriter: Question Specific Tool Synthesis for Tabular Data [Paper]
- CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [Paper]
- TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks [Paper]
- Cognitive Architectures for Language Agents [Paper]
- BAGEL: Bootstrapping Agents by Guiding Exploration with Language [Paper]
- EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records [Paper]
- Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic Knowledge [Paper]