Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study

Open-source code for our ICLR 2023 paper: Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study (https://openreview.net/forum?id=UazgYBMS9-W)


Requirements

python==3.8
pytorch>=1.7.0
transformers>=4.3.0
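
If you want to sanity-check the environment before running anything, a quick Python check like the one below works (a convenience snippet, not part of the repository):

import sys
import torch
import transformers
from packaging import version  # installed as a dependency of transformers

assert sys.version_info[:2] == (3, 8), "Python 3.8 is expected"
assert version.parse(torch.__version__) >= version.parse("1.7.0")
assert version.parse(transformers.__version__) >= version.parse("4.3.0")
print("Environment OK:", torch.__version__, transformers.__version__)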


Download Datasets

We follow d'Autume et al. (2019) and download the datasets from the Google Drive folder they provide.
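
These are the five text-classification datasets used by d'Autume et al. (2019): AGNews, Yelp, Amazon, DBpedia, and Yahoo. Once the archives have been moved into data/ (see the next section), a small check like the one below confirms they are where the later steps expect them; the archive filenames are assumptions and may differ from the files on the Drive.

from pathlib import Path

# Hypothetical archive names; adjust them to match the downloaded files.
expected = [
    "ag_news_csv.tar.gz",
    "yelp_review_full_csv.tar.gz",
    "amazon_review_full_csv.tar.gz",
    "dbpedia_csv.tar.gz",
    "yahoo_answers_csv.tar.gz",
]
data_dir = Path("./data")
missing = [name for name in expected if not (data_dir / name).exists()]
print("All archives present." if not missing else f"Missing archives: {missing}")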


Prepare

  1. Clone the repository from GitHub.
  2. Create the data directory:
cd plms_are_lifelong_learners
mkdir data
  3. Move the .tar.gz files into the data directory.
  4. Uncompress the .tar.gz files and sample data from the original datasets (a conceptual sketch of this sampling step follows the list):
bash uncompressing.sh
python sampling_data.py --seed 42
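
For reference, the sampling step follows the setup of d'Autume et al. (2019), who draw a fixed-size random subset from each dataset with a fixed seed (reportedly 115,000 training and 7,600 validation examples per task). The function below only sketches that idea; it is not the repository's sampling_data.py, and the CSV layout and split sizes are assumptions.

import csv
import random

def subsample_csv(src_path, dst_path, n_examples, seed=42):
    """Read all rows, shuffle them deterministically, and keep the first n_examples."""
    with open(src_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))  # assumed layout: label, title, body
    random.Random(seed).shuffle(rows)
    with open(dst_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows[:n_examples])

# Example call with assumed paths and sizes:
# subsample_csv("data/ag_news_csv/train.csv", "data/ag_news_train_sampled.csv", 115000)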

Train Models Sequentially

  1. Tokenize the input texts:
python tokenizing.py --tokenizer bert-base-uncased --data_dir ./data/ --max_token_num 128

"bert-base-uncased" can be replaced by any other tokenizer in Hugging Face Transformer Models, e.g. "roberta-base", "prajjwal1/bert-tiny", etc. The files with tokenized texts will be saved in the directory ./data/

  2. Train the models:
CUDA_VISIBLE_DEVICES=0 python train_cla.py --plm_name bert-base-uncased --tok_name bert-base-uncased --pad_token 0 --plm_type bert --hidden_size 768 --device cuda --seed 1023 --padding_len 128 --batch_size 32 --learning_rate 1.5e-5 --trainer sequential --epoch 2 --order 0 --rep_itv 10000 --rep_num 100
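
Conceptually, sequential training fine-tunes a single model on the tasks one after another, which is exactly the setting in which catastrophic forgetting can occur. The loop below is a heavily simplified sketch of that procedure using dummy data; it is not the repository's train_cla.py, and the replay options (--rep_itv, --rep_num), the task orders (--order), and the unified 33-class label space are assumptions carried over from d'Autume et al. (2019).

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=33  # assumed unified label space across tasks
).to(device)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-5)

# Dummy stand-ins for the tokenized tasks (input_ids, labels); replace with real data.
tasks = [
    TensorDataset(torch.randint(0, 30522, (8, 128)), torch.randint(0, 33, (8,)))
    for _ in range(2)
]

for task_id, task in enumerate(tasks):   # visit the tasks strictly one after another
    for epoch in range(2):               # mirrors --epoch 2
        for input_ids, labels in DataLoader(task, batch_size=4, shuffle=True):
            out = model(input_ids=input_ids.to(device), labels=labels.to(device))
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    # Save a checkpoint after each task; the probing study re-uses such checkpoints.
    torch.save(model.state_dict(), f"checkpoint_after_task_{task_id}.pt")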

Probing Study

  1. Re-train the decoder for each checkpoint (a sketch of this probing setup follows the list):
CUDA_VISIBLE_DEVICES=0 python probing_train.py --plm_name bert-base-uncased --tok_name bert-base-uncased --pad_token 0 --plm_type bert --hidden_size 768 --device cuda --seed 1023 --padding_len 128 --batch_size 32 --learning_rate 3e-5 --epoch 10 --train_time "1971-02-03-14-56-07"
  2. Evaluate the performance of each re-trained model:
CUDA_VISIBLE_DEVICES=0 python test_model.py --plm_name bert-base-uncased --tok_name bert-base-uncased --pad_token 0 --plm_type bert --hidden_size 768 --device cuda --padding_len 128 --train_time "1971-02-03-14-56-07"
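
The probing idea behind these two steps: take the encoder from a checkpoint produced during sequential training, keep it frozen, and re-train only a fresh decoder (classification head) on top of it; if the probe recovers the performance, the task knowledge is still present in the representations rather than truly forgotten. The snippet below sketches that freeze-and-retrain setup only; the checkpoint loading, head shape, and label count are placeholders, not the repository's probing_train.py.

import torch
import torch.nn as nn
from transformers import AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Encoder taken from a sequential-training checkpoint; loading the saved weights is
# left as a placeholder because it depends on how the checkpoint was stored.
encoder = AutoModel.from_pretrained("bert-base-uncased").to(device)
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

# A fresh decoder: a single linear layer over the [CLS] representation.
decoder = nn.Linear(768, 33).to(device)   # hidden_size 768; 33 labels is an assumption
optimizer = torch.optim.AdamW(decoder.parameters(), lr=3e-5)
criterion = nn.CrossEntropyLoss()

def probing_step(input_ids, labels):
    """One optimization step that updates only the decoder; the encoder stays frozen."""
    with torch.no_grad():
        cls = encoder(input_ids=input_ids.to(device)).last_hidden_state[:, 0]
    loss = criterion(decoder(cls), labels.to(device))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()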