

Continual Training of Language Models for Few-Shot Learning

This repository contains the code and pre-trained models for our EMNLP'22 paper Continual Training of Language Models for Few-Shot Learning by <a href="https://vincent950129.github.io/"> Zixuan Ke</a>, <a href="https://linhaowei1.github.io/">Haowei Lin</a>, <a href="https://shaoyijia.github.io/">Yijia Shao</a>, <a href="https://howardhsu.github.io/">Hu Xu</a>, <a href="https://leishu02.github.io/">Lei Shu</a>, and <a href="https://www.cs.uic.edu/~liub/">Bing Liu</a>.


We propose the problem of continually extending an LM by incrementally post-training it on a sequence of unlabeled domain corpora, expanding its knowledge without forgetting its previous skills. With the goal of improving few-shot end-task learning in these domains, we propose a system called CPT (Continual Post-Training), which, to our knowledge, is the first continual post-training system. Experimental results verify its effectiveness. The following figure illustrates our model.


First, install PyTorch by following the instructions from the official website. To faithfully reproduce our results, please use the 1.5.1 build that matches your platform and CUDA version. PyTorch versions higher than 1.5.1 should also work. For example, if you use Linux and CUDA 9.2 (how to check CUDA version), install PyTorch with the following command:

pip install torch==1.5.1+cu92 -f https://download.pytorch.org/whl/torch_stable.html

If you instead use CUDA >10.2 or CPU, install PyTorch with the following command:

pip install torch==1.5.1

Then run the following command to install the remaining dependencies:

pip install -r requirements.txt

Attention: Our model is based on transformers==4.11.3 and adapter-transformers==2.2.0. Using other versions may cause unexpected bugs.
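
Because the pinned versions matter, it can help to check the environment before running anything. The sketch below is only an illustration: `find_mismatches` and `base_version` are hypothetical helpers, not part of this repository, and the distribution names are assumed to be the usual PyPI names. Depending on how adapter-transformers was installed, a separate `transformers` distribution may not be registered, in which case it is reported as missing.

```python
from importlib import metadata

# Versions pinned in the note above.
PINNED = {"transformers": "4.11.3", "adapter-transformers": "2.2.0"}


def base_version(version):
    """Drop a local build suffix such as '+cu92' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: (expected, found)} for packages that differ from the pins.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = base_version(lookup(name))
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means the installed versions match the pins.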

Use CPT with Huggingface

You can easily import our continually post-trained model with HuggingFace's transformers:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Import our model. The package will take care of downloading the models automatically
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("UIC-Liu-Lab/CPT", trust_remote_code=True)

# Tokenize input texts
texts = [
    "There's a kid on a skateboard.",
    "A kid is skateboarding.",
    "A kid is inside the house."
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Task id and smax
t = torch.LongTensor([0]).to(model.device)	# using task 0's CL-plugin, choose from {0, 1, 2, 3}
smax = 400

# Get the model output!
res = model(**inputs, return_dict=True, t=t, s=smax)

If you encounter any problem when loading the models directly through HuggingFace's API, you can also download the models manually from the repo and use model = AutoModel.from_pretrained({PATH TO THE DOWNLOADED MODEL}).

Note: The post-trained weights you load contain untrained classification heads. The post-training sequence is Restaurant -> AI -> ACL -> AGNews. You can use the downloaded weights to fine-tune on the corresponding end tasks. The results (MF1/Acc) should be consistent with the following:

| Model | Restaurant | AI | ACL | AGNews | Avg. |
|---|---|---|---|---|---|
| UIC-Liu-Lab/CPT | 53.90 / 75.13 | 30.42 / 30.89 | 37.56 / 38.53 | 63.77 / 65.79 | 46.41 / 52.59 |
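
Assuming MF1 denotes macro-averaged F1 and Acc denotes accuracy, the two metrics can be sketched in plain Python as follows. This illustrates the standard definitions only; it is not the evaluation code from this repository, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(labels)


if __name__ == "__main__":
    gold = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(f"MF1 = {macro_f1(gold, pred):.4f}, Acc = {accuracy(gold, pred):.4f}")
```

Macro-F1 weights every class equally, so it can differ noticeably from accuracy on imbalanced end tasks, which is one reason to report both.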

Train CPT

In the following section, we describe how to train a CPT model by using our code.


Before training and evaluation, please download the dataset from this Google Drive link and save it in the ./data directory.


Training scripts

We provide an example training script to run CPT. We explain the arguments in the following:

All other arguments are standard HuggingFace transformers training arguments. Some of the often-used arguments are: --max_seq_length, --learning_rate, --per_device_train_batch_size. In our example scripts, we also train and evaluate the model on the cpt_datasets_pt and cpt_datasets_ft sequence files. See ./sequence for details.

For the results in the paper, we use Nvidia GeForce RTX2080 GPUs with CUDA 10. Using different types of devices or different versions of CUDA/other software may lead to slightly different performance.


We use the following hyperparameters for training CPT:

| | CPT post-training | CPT fine-tuning |
|---|---|---|
| Batch size | 48 | 20 |
| Learning rate | 1e-4 | 5e-5 |
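
The two settings (batch size 48 and learning rate 1e-4 for post-training, 20 and 5e-5 for fine-tuning) can be kept in one place and rendered into command-line flags. This is a minimal sketch: `StageConfig` and `to_cli_args` are hypothetical helpers, not part of this repository, and mapping the batch size onto `--per_device_train_batch_size` is an assumption (with several GPUs, the reported value may refer to the total batch size instead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one training stage."""
    batch_size: int
    learning_rate: float


# Values taken from the hyperparameter settings above.
POST_TRAINING = StageConfig(batch_size=48, learning_rate=1e-4)
FINE_TUNING = StageConfig(batch_size=20, learning_rate=5e-5)


def to_cli_args(config):
    """Render a stage configuration as a list of command-line tokens."""
    return [
        "--per_device_train_batch_size", str(config.batch_size),
        "--learning_rate", str(config.learning_rate),
    ]


if __name__ == "__main__":
    print(" ".join(to_cli_args(POST_TRAINING)))
    print(" ".join(to_cli_args(FINE_TUNING)))
```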

End-Task Fine-tuning

Once you have finished post-training, return to the root directory and run

CUDA_VISIBLE_DEVICES=${your_cuda_device_id} bash scripts/finetune_cpt_unfreeze_parallel.sh

Our codebase offers convenient tools for collecting experimental results and automated scripts for continual learning. After a successful run, you should get results laid out as follows:

└── seq0
    ├── seed111
    │   └── cpt_parallel_unfreeze
    │       └── pt
    │           ├── acl_unsup_roberta
    │           │   ├── 111.model
    │           │   ├── config.json
    │           │   ├── mask_back
    │           │   ├── mask_pre
    │           │   ├── merges.txt
    │           │   ├── pt_log
    │           │   │   └── events.out.tfevents.1665472004.lthpc.1718401.0
    │           │   ├── pytorch_model.bin
    │           │   ├── special_tokens_map.json
    │           │   ├── tokenizer_config.json
    │           │   └── vocab.json
    │           ├── agnews_unsup_roberta
    │           │   └──  ...
    │           ├── ai_unsup_roberta
    │           │   └──  ...
    │           ├── few_shot_acc_111
    │           ├── few_shot_f1_111
    │           ├── few_shot_forward_acc_111
    │           ├── few_shot_forward_f1_111
    │           ├── few_shot_progressive_acc_111
    │           ├── few_shot_progressive_f1_111
    │           └── restaurant_unsup_roberta
    │               └──  ...
    ├── seed2021
    │   └── cpt_parallel_unfreeze
    │       └── pt
    │           ├── few_shot_acc_2021
    │           ├── few_shot_f1_2021
    │           ├── few_shot_forward_acc_2021
    │           ├── few_shot_forward_f1_2021
    │           ├── few_shot_progressive_acc_2021
    │           └── few_shot_progressive_f1_2021
    └── seed222
        └── cpt_parallel_unfreeze
            └── pt
                ├── few_shot_acc_222
                ├── few_shot_f1_222
                ├── few_shot_forward_acc_222
                ├── few_shot_forward_f1_222
                ├── few_shot_progressive_acc_222
                └──  few_shot_progressive_f1_222
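
Given this layout, the per-seed result files can be located programmatically. The sketch below assumes the directory names shown in the tree (`seed<N>/cpt_parallel_unfreeze/pt/<metric>_<N>`); `find_result_files` is a hypothetical helper, not part of this repository, and the path to `seq0` must be supplied by the caller. Since the contents of the result files are not described here, the sketch only finds them and does not parse them.

```python
from pathlib import Path


def find_result_files(seq_dir, metric="few_shot_acc"):
    """Map each seed (as a string) to its result file for `metric`, if present."""
    results = {}
    for seed_dir in sorted(Path(seq_dir).glob("seed*")):
        seed = seed_dir.name[len("seed"):]
        candidate = seed_dir / "cpt_parallel_unfreeze" / "pt" / f"{metric}_{seed}"
        if candidate.exists():
            results[seed] = candidate
    return results


if __name__ == "__main__":
    # "seq0" is a placeholder; point this at wherever the run wrote its output.
    for seed, path in find_result_files("seq0", metric="few_shot_f1").items():
        print(seed, path)
```

Seeds whose run did not finish simply do not appear in the returned mapping, which makes incomplete runs easy to spot.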

Arguments for the end-task fine-tuning script are as follows,

Bugs or questions?

If you have any questions related to the code or the paper, feel free to email Zixuan, Haowei, and Yijia. If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker!


Please cite our paper if you use CPT in your work:

   title={Continual Training of Language Models for Few-Shot Learning},
   author={Ke, Zixuan and Lin, Haowei and Shao, Yijia and Xu, Hu and Shu, Lei and Liu, Bing},
   booktitle={Empirical Methods in Natural Language Processing (EMNLP)},