Large Language Models for Data Annotation: A Survey

This is a curated list of papers on LLMs for data annotation, maintained by Zhen Tan (ztan36@asu.edu) and Alimohammad Beigi (abeigi@asu.edu).
To add new entries, please open a PR that follows the same format.
This list complements the survey below.

[Large Language Models for Data Annotation: A Survey]

If you find this repo helpful, we would appreciate it if you could cite our survey.

@misc{tan2024large,
      title={Large Language Models for Data Annotation: A Survey}, 
      author={Zhen Tan and Alimohammad Beigi and Song Wang and Ruocheng Guo and Amrita Bhattacharjee and Bohan Jiang and Mansooreh Karami and Jundong Li and Lu Cheng and Huan Liu},
      year={2024},
      eprint={2402.13446},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

LLM-Based Data Annotation

Manually Engineered Prompt

[EACL 2024] GPTs Are Multilingual Annotators for Sequence Generation Tasks. [pdf] [code]
[arXiv 2023] AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators. [pdf]
[arXiv 2023] RAFT: Reward Ranked FineTuning for Generative Foundation Model Alignment. [pdf]
[arXiv 2023] Small Models are Valuable Plug-ins for Large Language Models. [pdf] [code]
[arXiv 2022] Language Models in the Loop: Incorporating Prompting into Weak Supervision. [pdf]
[EMNLP 2022] ZeroGen: Efficient Zero-shot Learning via Dataset Generation. [pdf] [code]
[NAACL-HLT 2022] Learning To Retrieve Prompts for In-Context Learning. [pdf] [code]
[EMNLP 2021] Constrained Language Models Yield Few-Shot Semantic Parsers. [pdf] [code]

Alignment via Pairwise Feedback

[ACL 2023] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers. [pdf] [code]
[arXiv 2023] Direct Preference Optimization: Your Language Model is Secretly a Reward Model. [pdf]
[NeurIPS 2022] Fine-tuning language models to find agreement among humans with diverse preferences. [pdf]
[arXiv 2022] Improving alignment of dialogue agents via targeted human judgements. [pdf]
[arXiv 2022] Teaching language models to support answers with verified quotes. [pdf] [data]
[NeurIPS 2020] Learning to summarize with human feedback. [pdf] [code]
[arXiv 2019] Fine-Tuning Language Models from Human Preferences. [pdf] [code]

Assessing LLM-Generated Annotations

Evaluating LLM-Generated Annotations

[EACL 2024] GPTs Are Multilingual Annotators for Sequence Generation Tasks. [pdf] [code]
[arXiv 2023] AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators. [pdf]
[arXiv 2023] Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks. [pdf]
[NAACL 2022] LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework. [pdf] [code]
[EMNLP 2022] Large Language Models are Few-Shot Clinical Information Extractors. [pdf] [data]
[arXiv 2022] Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor. [pdf] [code]
[arXiv 2020] The Turking Test: Can Language Models Understand Instructions? [pdf]

Data Selection via Active Learning

[EMNLP 2023] FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models. [pdf] [code]
[EMNLP 2023] Active Learning Principles for In-Context Learning with Large Language Models. [pdf]
[IUI 2023] ScatterShot: Interactive In-context Example Curation for Text Transformation. [pdf] [code]
[ICML 2023] Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning. [pdf] [code]
[arXiv 2023] Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost. [pdf]
[arXiv 2022] Active learning helps pretrained models learn the intended task. [pdf] [code]
[EACL 2021] Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates. [pdf]

Learning with LLM-Generated Annotations

Target Domain Inference: Direct Utilization of Annotations

[ECIR 2024] Large Language Models are Zero-Shot Rankers for Recommender Systems. [pdf] [code]
[arXiv 2023] Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. [pdf]
[ACL 2022] An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels. [pdf] [code]
[TMLR 2022] Emergent Abilities of Large Language Models. [pdf]
[NeurIPS 2022] Large Language Models are Zero-Shot Reasoners. [pdf]
[arXiv 2022] Visual Classification via Description from Large Language Models. [pdf]
[PMLR 2021] Learning Transferable Visual Models From Natural Language Supervision. [pdf] [code]
[EMNLP 2019] Language Models as Knowledge Bases? [pdf] [code]

Knowledge Distillation: Bridging LLM and task-specific models

[EACL 2024] GPTs Are Multilingual Annotators for Sequence Generation Tasks. [pdf] [code]
[EMNLP 2023] Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. [pdf] [code]
[ACL 2023] Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes. [pdf] [code]
[ACL 2023] GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo. [pdf] [code]
[ACL 2023] GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model. [pdf] [code]
[EMNLP 2023] Lion: Adversarial Distillation of Proprietary Large Language Models. [pdf] [code]
[arXiv 2023] Specializing Smaller Language Models towards Multi-Step Reasoning. [pdf]
[arXiv 2023] Knowledge Distillation of Large Language Models. [pdf] [code]
[arXiv 2023] Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events. [pdf]
[arXiv 2023] Web Content Filtering through knowledge distillation of Large Language Models. [pdf]
[ICLR 2022] Knowledge Distillation of Large Language Models. [pdf] [code]
[arXiv 2022] Teaching Small Language Models to Reason. [pdf]

Harnessing LLM Annotations for Fine-Tuning and Prompting

In-Context Learning (ICL)

[EMNLP 2023] Active Learning Principles for In-Context Learning with Large Language Models. [pdf]
[ACL 2023] Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models. [pdf]
[ICLR 2022] Finetuned Language Models Are Zero-Shot Learners. [pdf] [code]
[ICLR 2022] Selective Annotation Makes Language Models Better Few-Shot Learners. [pdf] [code]
[NAACL 2022] Improving In-Context Few-Shot Learning via Self-Supervised Training. [pdf]
[arXiv 2022] Instruction Induction: From Few Examples to Natural Language Task Descriptions. [pdf] [code]
[NeurIPS 2020] Language Models are Few-Shot Learners. [pdf]

Chain-of-Thought Prompting (CoT)

[ICLR 2023] Automatic chain of thought prompting in large language models. [pdf] [code]
[ACL 2023] SCOTT: Self-Consistent Chain-of-Thought Distillation. [pdf]
[arXiv 2023] Specializing Smaller Language Models towards Multi-Step Reasoning. [pdf]
[NeurIPS 2022] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. [pdf]
[NeurIPS 2022] Large Language Models are Zero-Shot Reasoners. [pdf]
[arXiv 2022] Rationale-augmented ensembles in language models. [pdf]
[ACL 2020] A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers. [pdf] [code]
[NAACL 2019] CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. [pdf] [code]

Instruction Tuning (IT)

[ACL 2023] Crosslingual Generalization through Multitask Finetuning. [pdf] [code]
[ACL 2023] SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions. [pdf] [code]
[ACL 2023] Can Large Language Models Be an Alternative to Human Evaluations? [pdf]
[arXiv 2023] LLaMA: Open and Efficient Foundation Language Models. [pdf] [code]
[arXiv 2022] Teaching language models to support answers with verified quotes. [pdf] [data]
[arXiv 2022] Scaling instruction-finetuned language models. [pdf] [code]
[EMNLP 2022] Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks. [pdf] [code]
[NeurIPS 2020] Language Models are Few-Shot Learners. [pdf]
Stanford Alpaca: An Instruction-following LLaMA Model. [HTML] [code]

Alignment Tuning (AT)

[PMLR 2023] Pretraining Language Models with Human Preferences. [pdf] [code]
[ICLR 2023] Offline RL for Natural Language Generation with Implicit Language Q Learning. [pdf] [code]
[arXiv 2023] Chain of hindsight aligns language models with feedback. [pdf] [code]
[arXiv 2023] GPT-4 Technical Report. [pdf]
[arXiv 2023] Llama 2: Open Foundation and Fine-Tuned Chat Models. [pdf] [code]
[arXiv 2023] RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback. [pdf]
[NeurIPS 2022] Training language models to follow instructions with human feedback. [pdf]
[arXiv 2022] Teaching language models to support answers with verified quotes. [pdf] [data]
[arXiv 2019] Fine-Tuning Language Models from Human Preferences. [pdf] [code]
[arXiv 2019] CTRL: A Conditional Transformer Language Model for Controllable Generation. [pdf] [code]
[NeurIPS 2017] Deep Reinforcement Learning from Human Preferences. [pdf]

Surveys

[ACM 2023] Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. [pdf]
[arXiv 2023] A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models. [pdf] [repo]
[arXiv 2022] A Survey of Large Language Models. [pdf] [repo]
[arXiv 2022] A Survey on In-context Learning. [pdf]
[arXiv 2022] A Comprehensive Survey on Instruction Following. [pdf] [repo]

Toolkits

LangChain: [HTML] [code]
Stack AI: [HTML]
UBIAI: [HTML]
Prodigy: [HTML]
Alfred: [pdf] [code]