[ACL 2024] A Codebase for Incremental Learning with Large Language Models

Introduction

This is a repository for Incremental Learning with Large Language Models.

Supported List

Scenario

Tasks

Methods

More baselines will be released in the future!

General (Text/Intent) Classification

Named Entity Recognition

Originally Designed for Image Classification

Datasets

Instance Incremental Learning

Intent Classification

Relation Extraction

Named Entity Recognition

Best Practice for Using This Codebase

How to reproduce the performance of SEQ and SEQ*?

The config file of SEQ (i.e., sequential fine-tuning) can be found in SEQ_full.yaml (in the config directory), and the config file of SEQ* can be found in SEQ_pre_warm_fix.yaml. Note that the classifier type (Linear or CosineLinear) is not specified in the config files because we set it in the script. An example can be found at https://github.com/zzz47zzz/codebase-for-incremental-learning-with-llm/blob/main/reproduce_shell/exp-CIL-sota/SOTA-CIL-Intent-discriminative-banking77_task7.sh.
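For instance, a minimal sketch of setting the classifier on the command line (the flags follow the example command in the Usage section below; the CosineLinear value and the exact config paths are assumptions to adapt to your setup):

# SEQ (sequential fine-tuning) with a plain Linear classifier
python main_CL.py --exp_prefix seq_linear --cfg './config/clinc150_task15/SEQ_full.yaml' --backbone bert-base-cased --classifier Linear --training_epochs 5

# SEQ* (pre_warm_fix) with a cosine classifier; the value name is assumed to match utils/classifier.py
python main_CL.py --exp_prefix seq_star_cosine --cfg './config/CIL/generative_backbones/clinc150_task15/SEQ_pre_warm_fix.yaml' --backbone bert-base-cased --classifier CosineLinear --training_epochs 5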

Usage

Overview

.
├── main_CL.py              # This is the Python file to be executed for running all experiments
├── utils                       # This folder contains all basic files for incremental learning 
│   ├── backbone.py             # This file loads backbone models from the transformers library
│   ├── buffer.py               # This file defines the replay buffer
│   ├── classifier.py           # This file loads Linear/CosineLinear classifiers
│   ├── wrapmodel.py            # This file wraps the model for using DeepSpeed with accelerate
│   ├── dataformat_preprocess.py # This file preprocesses the raw datasets into continual learning datasets
│   ├── dataloader.py           # This file prepares the input for language models
│   ├── dataset.py              # This file defines the format for different datasets for continual learning
│   ├── download_backbones.py   # This file downloads models in advance to avoid network problems
│   ├── evaluation.py           # This file defines the evaluation process for various tasks
│   ├── factory.py              # This file loads the various models from the ./models folder
│   ├── logger.py               # This file defines the logger
│   ├── metric.py               # This file defines the evaluation metric for continual learning
│   ├── optimizer.py            # This file defines the optimizer for different models
│   ├── prompt.py               # This file defines the prompt used for different tasks
│   ├── probing.py              # This file computes the probing performance
│   └── config.py               # This file defines general parameters and settings for the experiments
├── config                  # This folder contains the hyper-parameters for each method on each dataset
├── dataset                 # This folder contains datasets for continual learning
├── models                  # This folder contains models for continual learning
└── experiments             # This folder contains log data for each run                 

Quick Start

Step 1: prepare the environment

pip install -r requirement.txt
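Optionally, a minimal sketch of installing into an isolated virtual environment first (any environment manager works):

# create and activate a clean environment, then install the pinned dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirement.txt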

Step 2: prepare the dataset

Check the support_dataset_list in utils/dataformat_preprocess.py and select the datasets you want to experiment with.

Then, download the raw dataset to the folder dataset/{dataset-name}. For example, download clinc150 to the folder dataset/clinc150. The raw datasets can be downloaded here. Note that the raw data of Concept-1K is in dataset/concept_1k, the preprocessed Concept-1K for 10-step incremental learning is in dataset/concept_1k_task10, and the whole Concept-1K is in dataset/concept_1k_task1.
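For example, a sketch of unpacking a downloaded archive into the expected folder (the archive name is hypothetical; use whatever the download provides):

# place the raw clinc150 files under dataset/clinc150 (archive name is illustrative)
mkdir -p dataset/clinc150
unzip ~/Downloads/clinc150.zip -d dataset/clinc150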

Next, execute preprocess_dataset.sh. It will automatically preprocess the default datasets for reproducing the results ('topic3datasets', 'clinc150', 'banking77', 'fewrel', 'tacred', 'conll2003', 'fewnerd', 'i2b2', 'ontonotes5') and create new folders dataset/{dataset-for-continual-learning-name} (e.g., banking77_task7). If you do not need to customize the datasets, you can skip to Step 3.
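For example:

# preprocess the default datasets into dataset/{dataset-for-continual-learning-name}
bash preprocess_dataset.sh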

To customize the datasets, you can run utils/dataformat_preprocess.py with your own parameters (e.g., random seed, number of tasks). This process creates a new target folder dataset/{dataset-for-continual-learning-name} containing two JSON files, continual_data.json and continual_config.json. For example, you can prepare the clinc150 and fewrel datasets by running

python utils/dataformat_preprocess.py --dataset clinc150 --seed 1

and

python utils/dataformat_preprocess.py --dataset fewrel --seed 1

The program will create target folders dataset/clinc150_task15 and dataset/fewrel_task8.

For NER datasets such as ontonotes5, you can run the following command:

python utils/dataformat_preprocess.py --dataset ontonotes5 --seed 1 --base_task_entity 8 --incremental_task_entity 2 --seen_all_labels False

The program will create a target folder dataset/ontonotes5_task6_base8_inc2. Note that fixing the random seed ensures that exactly the same datasets are generated on different devices. Finally, the preprocessed datasets clinc150_task15, fewrel_task8, and ontonotes5_task6_base8_inc2 are ready for continual learning!
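To sanity-check a generated folder, you can list it and inspect the saved configuration (a sketch, assuming the two JSON files described above):

# confirm that continual_data.json and continual_config.json were written
ls dataset/clinc150_task15
# pretty-print the continual learning configuration
python -m json.tool dataset/clinc150_task15/continual_config.json | head -n 20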

Step 3: select the yaml file for hyper-parameters

Each YAML file contains the hyper-parameters for a method. For example, the hyper-parameters of SEQ* with and without pre-allocating future classifiers for generative backbones under the CIL setting are defined in config/CIL/generative_backbones/clinc150_task15/SEQ_pre_warm_fix.yaml and config/CIL/generative_backbones/clinc150_task15/SEQ_warm_fix.yaml, respectively.
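To see which method configurations are available for a dataset and setting, list the corresponding config folder (paths follow the example above):

# list all method configs for clinc150_task15 under the CIL setting with generative backbones
ls config/CIL/generative_backbones/clinc150_task15/
# inspect the SEQ* hyper-parameters
cat config/CIL/generative_backbones/clinc150_task15/SEQ_pre_warm_fix.yaml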

Step 4: reproduce the results

The scripts for reproducing the probing study are in the folder reproduce_shell/exp-probing.

The scripts for reproducing the probing study with different pre-training steps are in the folder reproduce_shell/exp-probing-pretraining.

The scripts for reproducing the experiments of comparing SEQ* with SOTA methods are in the folder reproduce_shell/exp-sota.
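Each folder contains one shell script per experiment; a minimal sketch for running every script in a folder sequentially (swap in exp-probing or exp-probing-pretraining as needed):

# run all reproduction scripts for the SOTA comparison, one after another
for script in reproduce_shell/exp-sota/*.sh; do
    bash "$script"
done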

To run an experiment, execute main_CL.py. For example, you can run the SEQ method on the clinc150_task15 dataset with bert-base-cased using the following command:

python main_CL.py --exp_prefix {your-experiment-name} --cfg './config/clinc150_task15/SEQ_full.yaml' --backbone bert-base-cased --classifier Linear --training_epochs 5

If you want to use wandb for logging (see here for more help):

python main_CL.py --is_wandb True --wandb_project {your-project-name} --wandb_entity {your-entity-name} --exp_prefix {your-experiment-name} --cfg './config/clinc150_task15/SEQ_full.yaml' --backbone bert-base-cased --classifier Linear --training_epochs 5 

If you want to use accelerate for data/model parallel (see here for more help):

accelerate launch --config_file {your-accelerate-config-file} main_CL.py --is_wandb True --wandb_project {your-project-name} --wandb_entity {your-entity-name} --exp_prefix {your-experiment-name} --cfg './config/clinc150_task15/SEQ_full.yaml' --backbone bert-base-cased --classifier Linear --training_epochs 5 
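If you have not prepared an accelerate config file yet, a sketch of generating one interactively (the output path is your choice):

# answer the interactive prompts once and reuse the saved file via --config_file in the launch command above
accelerate config --config_file accelerate_config.yaml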

Please refer to utils/config.py for more general parameters and models/{model-name}.py for more model-specific parameters.
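A quick way to list the general parameters from the command line (a sketch, assuming the arguments in utils/config.py are exposed through the standard argparse --help flag):

# print every general command-line argument together with its default value
python main_CL.py --help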

Main Results

The results on the IIL scenario.

The results on the CIL and TIL scenarios.

Questions and Citation

If you have questions about this repository, please feel free to contact me at junhaozheng47@outlook.com.

If you find this repository useful, please consider citing our papers.

@misc{zheng2023learn,
      title={Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models}, 
      author={Junhao Zheng and Shengjie Qiu and Qianli Ma},
      year={2023},
      eprint={2312.07887},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@article{qiu2024incremental,
  title={Incremental Sequence Labeling: A Tale of Two Shifts},
  author={Qiu, Shengjie and Zheng, Junhao and Liu, Zhen and Luo, Yicheng and Ma, Qianli},
  journal={arXiv preprint arXiv:2402.10447},
  year={2024}
}
@misc{zheng2024concept1k,
      title={Concept-1K: A Novel Benchmark for Instance Incremental Learning}, 
      author={Junhao Zheng and Shengjie Qiu and Qianli Ma},
      year={2024},
      eprint={2402.08526},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}