Awesome

Language Model as a Service (LMaaS)

This is a curated list of "Language-Model-as-a-Service (LMaaS)" papers, mainly maintained by Tianxiang Sun. We strongly encourage NLP researchers who are interested in this topic to open a pull request to add or update papers (see Contributing). Watch this repository for the latest updates!

Updates

2022/7/7: Wrote a blog post (in Chinese)
2022/7/4: Created this paper list

Introduction
- Scope
- Advantages
Keywords
Papers
Contributing

Introduction

For commercial reasons and because tuning is expensive, pre-trained large language models (LLMs) such as GPT-3 are usually released as a service rather than as open-source model weights. We call this scenario "Language-Model-as-a-Service (LMaaS)" (the term was originally used in our ICML'2022 paper). In this scenario, users access powerful LLMs through their inference APIs. LLM services have powered many use cases (see GPT-3 Demo). In contrast to fine-tuning, LMaaS allows a single general-purpose LLM to serve many different tasks and is therefore highly deployment-efficient. Nevertheless, adapting LLMs to target tasks without access to their parameters and gradients remains a challenge. To make LLMs benefit a wider audience, we collect papers that fit into this scenario to facilitate future research.

Scope

Which papers fit into the scenario of LMaaS? We mainly consider papers that adapt LLMs to downstream tasks without accessing the model parameters or gradients. Although fine-tuned LLMs can also be served after deployment, they are limited to solving a single task for a limited audience. In our scope, we prefer serving general-purpose models to a variety of users.

In existing literature, there are several lines of research that fit into LMaaS:

Text prompt. By manually or automatically designing task-specific text prompts, users can solve the target task of interest by conditioning frozen LLMs.
In-context learning. Users can provide a few examples in the input at inference time to help LLMs rapidly adapt to the target task.
Black-box optimization. By tuning a small number of parameters (e.g., a continuous prompt) via black-box optimization, with access only to the LLM's output probabilities, users can solve target tasks with a small training set.
Feature-based learning. LLMs can serve as feature extractors, on top of which users can build learnable task-specific modules to perform classification or generation.
Data generation. Generative LLMs can be used to generate a dataset of labeled text pairs from scratch, which is then used to locally train a much smaller model.

The boundary between text prompt and in-context learning is somewhat blurred. In this repo, the text prompt category contains papers that do not use labeled samples, while the in-context learning category consists of papers that include labeled samples in the prompts.
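To make the distinction concrete, here is a minimal sketch of how the two kinds of prompts differ. The function name `build_prompt`, the `Review:`/`Sentiment:` layout, and the example texts are all hypothetical choices for illustration and are not taken from any listed paper; the resulting string would be sent to whatever inference API the service exposes.

```python
from typing import Sequence, Tuple


def build_prompt(instruction: str, query: str,
                 demonstrations: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a plain-text prompt for a frozen LLM behind an inference API.

    With no demonstrations this is a text prompt (no labeled samples);
    with demonstrations it becomes an in-context learning prompt.
    """
    blocks = [instruction]
    for text, label in demonstrations:
        # Each labeled sample is shown in the same layout as the final query.
        blocks.append(f"Review: {text}\nSentiment: {label}")
    # The query is left unanswered so the LLM completes the label.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


instruction = "Classify the sentiment of each review as positive or negative."
query = "The plot was thin, but the acting saved it."

# Text prompt category: instruction only, no labeled samples.
zero_shot = build_prompt(instruction, query)

# In-context learning category: labeled samples are placed in the prompt.
few_shot = build_prompt(
    instruction,
    query,
    demonstrations=[
        ("A delightful surprise from start to finish.", "positive"),
        ("I want those two hours of my life back.", "negative"),
    ],
)
```

In both cases the model stays frozen; only the input string changes.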

Note: A related (and partially overlapping) topic is prompt-based learning, which aims to solve downstream tasks using general-purpose LLMs by converting the input and output with a template and a verbalizer, respectively. However, most works in prompt-based learning require access to model parameters and gradients, and therefore do not fit into our scope. For prompt-based learning papers that are not suitable for LMaaS, we recommend contributing to another awesome paper list: PromptPaper.
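For readers unfamiliar with the terms, the following sketch illustrates what a template and a verbalizer do. The template string, the label words, and the probability values are made up for illustration; in practice the probabilities would come from the LLM's prediction at the `[MASK]` position.

```python
# Template: wraps the raw input into a cloze-style sentence (hypothetical wording).
TEMPLATE = "{text} It was [MASK]."

# Verbalizer: maps label words the LLM may predict to task classes (hypothetical).
VERBALIZER = {"great": "positive", "terrible": "negative"}


def apply_template(text: str) -> str:
    """Convert a raw input into the text that is fed to the LLM."""
    return TEMPLATE.format(text=text)


def predict_label(label_word_probs: dict) -> str:
    """Return the class whose label word receives the highest probability."""
    best_word = max(VERBALIZER, key=lambda word: label_word_probs.get(word, 0.0))
    return VERBALIZER[best_word]


model_input = apply_template("A fun movie.")
# Stand-in for probabilities returned by the LLM at the [MASK] position.
prediction = predict_label({"great": 0.7, "terrible": 0.1})
```

This conversion itself needs only model outputs; it is the tuning procedures commonly paired with it that require parameters and gradients.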

Advantages

Compared with fine-tuning task-specific LLMs, LMaaS has the following advantages:

Deployment-efficient. LMaaS deploys a single general-purpose LLM to serve various tasks. The target task can be performed by conditioning the LLM on task-specific prompts, a small number of parameters, or features. There is no need to maintain a copy of the entire model for each task.
Tuning-efficient. When only a small number of task-specific parameters are tuned (e.g., in black-box optimization), the optimization can be highly efficient because it does not require backpropagation, whose computational cost is proportional to the model size and can therefore be expensive or even infeasible for LLMs. By contrast, the optimization complexity in LMaaS is independent of the model size.
Sample-efficient. It has been demonstrated that LLMs can achieve competitive performance on a broad range of tasks with limited or even zero labeled data. Most works in LMaaS also focus on few-shot or zero-shot settings.
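As a rough illustration of the tuning-efficient point, the sketch below optimizes a small vector using only loss values returned by a service, with no gradients. Everything here is an assumption made for illustration: `query_service` is a toy quadratic stand-in for a real inference API, the dimensions are arbitrary, and the simple (1+1) evolution strategy is a generic derivative-free optimizer rather than the algorithm of any particular paper in this list.

```python
import random

random.seed(0)

PROMPT_DIM = 20    # size of the hypothetical continuous prompt
SUBSPACE_DIM = 4   # size of the low-dimensional search space

# Fixed random projection from the search space to the prompt space.
PROJECTION = [[random.gauss(0.0, 1.0) for _ in range(SUBSPACE_DIM)]
              for _ in range(PROMPT_DIM)]


def project(z):
    """Map a low-dimensional vector z to a full continuous prompt."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in PROJECTION]


# Hidden optimum, known only to the toy service below.
_TARGET_PROMPT = project([random.gauss(0.0, 1.0) for _ in range(SUBSPACE_DIM)])


def query_service(prompt):
    """Toy stand-in for an inference API: returns a loss value, never a gradient."""
    return sum((p - t) ** 2 for p, t in zip(prompt, _TARGET_PROMPT))


def black_box_tune(steps=300, sigma=0.3):
    """(1+1) evolution strategy: keep a random perturbation only if the loss drops."""
    z = [0.0] * SUBSPACE_DIM
    best = query_service(project(z))
    history = [best]
    for _ in range(steps):
        candidate = [zi + random.gauss(0.0, sigma) for zi in z]
        loss = query_service(project(candidate))
        if loss < best:
            z, best = candidate, loss
        history.append(best)
    return z, history


z_star, losses = black_box_tune()
```

The user-side work per step is a random perturbation and a comparison over `SUBSPACE_DIM` numbers, so it does not grow with the size of the model behind `query_service`.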

Keywords

The abbreviation of the work.

The key feature of the work.

The main experimental setting of the work.

Papers

Text Prompt

Language Models as Knowledge Bases? EMNLP 2019

Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. [pdf] [code]
How Can We Know What Language Models Know? TACL 2020

Zhengbao Jiang, Frank F. Xu, Jun Araki, Graham Neubig. [pdf] [code]
Language Models are Few-Shot Learners. NeurIPS 2020

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. [pdf]
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections. Findings of EMNLP 2021

Ruiqi Zhong, Kristy Lee, Zheng Zhang, Dan Klein. [pdf] [code]
Finetuned Language Models Are Zero-Shot Learners. ICLR 2022

Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le. [pdf] [code]
Multitask Prompted Training Enables Zero-Shot Task Generalization. ICLR 2022

Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Tali Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush. [pdf] [code]
Training language models to follow instructions with human feedback. Preprint 2022.3

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe. [pdf] [code]
Large Language Models are Zero-Shot Reasoners. Preprint 2022.6

Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa. [pdf] [code]
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models. Preprint 2022.6

Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid. [pdf] [code]
Language Models are General-Purpose Interfaces. Preprint 2022.6

Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei. [pdf] [code]
Repository-Level Prompt Generation for Large Language Models of Code. Preprint 2022.6

Disha Shrivastava, Hugo Larochelle, Daniel Tarlow. [pdf] [code]
Ignore Previous Prompt: Attack Techniques For Language Models. Best Paper Award @ NeurIPS ML Safety Workshop 2022.

Fábio Perez, Ian Ribeiro. [pdf] [project], 2022.11

In-Context Learning

Language Models are Few-Shot Learners. NeurIPS 2020

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. [pdf]
Calibrate Before Use: Improving Few-Shot Performance of Language Models. ICML 2021

Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh. [pdf] [code]
An Explanation of In-context Learning as Implicit Bayesian Inference. ICLR 2022

Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma. [pdf] [code]
Chain of Thought Prompting Elicits Reasoning in Large Language Models. Preprint 2022.1

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou. [pdf]
Cross-Task Generalization via Natural Language Crowdsourcing Instructions. ACL 2022

Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi. [pdf] [code]
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. ACL 2022

Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp. [pdf]
Noisy Channel Language Model Prompting for Few-Shot Text Classification. ACL 2022

Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer. [pdf] [code]
Meta-learning via Language Model In-context Tuning. ACL 2022

Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis, He He. [pdf] [code]
What Makes Good In-Context Examples for GPT-3? DeeLIO@ACL 2022

Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen. [pdf]
Learning To Retrieve Prompts for In-Context Learning. NAACL 2022

Ohad Rubin, Jonathan Herzig, Jonathan Berant. [pdf] [code]
MetaICL: Learning to Learn In Context. NAACL 2022

Sewon Min, Mike Lewis, Luke Zettlemoyer, Hannaneh Hajishirzi. [pdf] [code]
Improving In-Context Few-Shot Learning via Self-Supervised Training. NAACL 2022

Mingda Chen, Jingfei Du, Ramakanth Pasunuru, Todor Mihaylov, Srini Iyer, Veselin Stoyanov, Zornitsa Kozareva. [pdf]
Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator. LPLM@NAACL 2022

Hyuhng Joon Kim, Hyunsoo Cho, Junyeob Kim, Taeuk Kim, Kang Min Yoo, Sang-goo Lee. [pdf]
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? Preprint 2022.2

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer. [pdf] [code]
In-Context Learning for Few-Shot Dialogue State Tracking. Preprint 2022.3

Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, Mari Ostendorf. [pdf] [code]
Self-Consistency Improves Chain of Thought Reasoning in Language Models. Preprint 2022.3

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou. [pdf]
STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning. Preprint 2022.3

Eric Zelikman, Yuhuai Wu, Noah D. Goodman. [pdf]
Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks. Preprint 2022.4

Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, Siddhartha Mishra, Sujan Reddy, Sumanta Patro, Tanay Dixit, Xudong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi. [pdf] [code]
Can language models learn from explanations in context? Preprint 2022.4

Andrew K. Lampinen, Ishita Dasgupta, Stephanie C. Y. Chan, Kory Matthewson, Michael Henry Tessler, Antonia Creswell, James L. McClelland, Jane X. Wang, Felix Hill. [pdf]
Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations. Preprint 2022.5

Junyeob Kim, Hyuhng Joon Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim. [pdf]
The Unreliability of Explanations in Few-Shot In-Context Learning. Preprint 2022.5

Xi Ye, Greg Durrett. [pdf]
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. Preprint 2022.5

Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Olivier Bousquet, Quoc Le, Ed Chi. [pdf]
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations. Preprint 2022.5

Jaehun Jung, Lianhui Qin, Sean Welleck, Faeze Brahman, Chandra Bhagavatula, Ronan Le Bras, Yejin Choi. [pdf]
On the Advance of Making Language Models Better Reasoners. Preprint 2022.6

Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen. [pdf] [code]
Emergent Abilities of Large Language Models. Preprint 2022.6

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus. [pdf]
Language Models are General-Purpose Interfaces. Preprint 2022.6

Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei. [pdf] [code]
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes. Preprint 2022.8

Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant. [pdf] [code]
Learning by Distilling Context. Preprint 2022.9

Charlie Snell, Dan Klein, Ruiqi Zhong. [pdf]
Binding Language Models in Symbolic Languages. Preprint 2022.10

Zhoujun Cheng*, Tianbao Xie*, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu. [pdf] [code] [website]
Preserving In-Context Learning Ability in Large Language Model Fine-tuning. Preprint 2022.11

Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, Sanjiv Kumar. [pdf]
Teaching Algorithmic Reasoning via In-context Learning. Preprint 2022.11

Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur, Hanie Sedghi. [pdf]
What learning algorithm is in-context learning? Investigations with linear models. Preprint 2022.11

Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou. [pdf]

Black-Box Optimization

Black-Box Tuning for Language-Model-as-a-Service. ICML 2022

Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, Xipeng Qiu. [pdf] [code]
Black-box Prompt Learning for Pre-trained Language Models. TMLR 2023.2

Shizhe Diao, Xuechun Li, Yong Lin, Zhichao Huang, Tong Zhang. [pdf]
GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models. Preprint 2022.3

Archiki Prasad, Peter Hase, Xiang Zhou, Mohit Bansal. [pdf] [code]
Few-shot Prompting Towards Controllable Response Generation. Preprint 2022.6

Hsuan Su, Pohan Chi, Shih-Cheng Huang, Chung Ho Lam, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee. [pdf]
BBTv2: Towards a Gradient-Free Future with Large Language Models. EMNLP 2022

Tianxiang Sun, Zhengfu He, Hong Qian, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu. [pdf] [code]
RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning. EMNLP 2022

Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, Yihan Wang, Han Guo, Tianmin Shu, Meng Song, Eric P. Xing, Zhiting Hu. [pdf] [code]
Clip-Tuning: Towards Derivative-free Prompt Learning with a Mixture of Rewards. Findings of EMNLP 2022

Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. [pdf]
TEMPERA: Test-Time Prompt Editing via Reinforcement Learning. Preprint 2022.11

Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, Joseph E. Gonzalez. [pdf] [code]
PromptBoosting: Black-Box Text Classification with Ten Forward Passes. Preprint 2022.12

Bairu Hou, Joe O'Connor, Jacob Andreas, Shiyu Chang, Yang Zhang. [pdf]
Multitask Pre-training of Modular Prompt for Chinese Few-Shot Learning. ACL 2023

Tianxiang Sun, Zhengfu He, Qin Zhu, Xipeng Qiu, Xuanjing Huang. [pdf]
When Gradient Descent Meets Derivative-Free Optimization: A Match Made in Black-Box Scenario. ACL 2023

Chengcheng Han, Liqing Cui, Renyu Zhu, Jianing Wang, Nuo Chen, Qiushi Sun, Xiang Li, Ming Gao. [pdf]
Make Prompt-based Black-Box Tuning Colorful: Boosting Model Generalization from Three Orthogonal Perspectives. LREC-COLING 2024

Qiushi Sun, Chengcheng Han, Nuo Chen, Renyu Zhu, Jingyang Gong, Xiang Li, Ming Gao. [pdf]

Feature-based Learning

To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. RepL4NLP@ACL 2019

Matthew E. Peters, Sebastian Ruder, Noah A. Smith. [pdf]
Can Explanations Be Useful for Calibrating Black Box Models? ACL 2022

Xi Ye, Greg Durrett. [pdf] [code]
Co-training Improves Prompt-based Learning for Large Language Models. ICML 2022

Hunter Lang, Monica Agrawal, Yoon Kim, David Sontag. [pdf]
Y-Tuning: An Efficient Tuning Paradigm for Large-Scale Pre-Trained Models via Label Representation Learning. Preprint 2022.2

Yitao Liu, Chenxin An, Xipeng Qiu. [pdf]
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning. NeurIPS 2022

Yi-Lin Sung, Jaemin Cho, Mohit Bansal. [pdf] [code]
Decoder Tuning: Efficient Language Understanding as Decoding. Preprint 2022.12

Ganqu Cui, Wentao Li, Ning Ding, Longtao Huang, Zhiyuan Liu, Maosong Sun. [pdf]

Data Generation

Generating Datasets with Pretrained Language Models. EMNLP 2021

Timo Schick, Hinrich Schütze. [pdf] [code]
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation. Findings of EMNLP 2021

Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park. [pdf] [code]
Generated Knowledge Prompting for Commonsense Reasoning. ACL 2022

Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi. [pdf] [code]
ZeroGen: Efficient Zero-shot Learning via Dataset Generation. EMNLP 2022

Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong. [pdf] [code]
ZeroGen+: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning. Preprint 2022.2

Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong. [pdf]
AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation. Findings of ACL 2023

Chujie Zheng, Sahand Sabour, Jiaxin Wen, Zheng Zhang, Minlie Huang. [pdf]
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding. NeurIPS 2022

Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han. [pdf] [code]

Contributing

:+1::tada: First off, thanks for taking the time to contribute! :tada::+1:

Steps to contribute:

Add a new paper or update an existing one. Please check that the paper fits into the scope of this repo.
Use the same format as existing entries. When adding keyword tags, please follow the keywords convention. When adding the pdf link of a paper, please link to the abstract page if the paper is on arXiv.
Update the PaperNumber at the top of the page accordingly and submit your pull request. We recommend including a very brief explanation of why you think a paper should be added or changed.

Don't worry if you get something wrong; we will fix it for you. Just contribute and promote your awesome work here!

Contributors

In addition to the following contributors who submitted pull requests, we would also like to thank Ohad Rubin and Kang Min Yoo for recommending papers.