LV-BERT
Introduction
In this repo, we introduce LV-BERT, which exploits layer variety for BERT. For a detailed description and experimental results, please refer to our paper LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021).
Requirements
- Python 3.6
- TensorFlow 1.15
- numpy
- scikit-learn
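The one-liner below is a minimal environment sketch based on the list above; the PyPI package names are the standard ones, and switching to tensorflow-gpu==1.15 for the GPU build is an assumption about your setup, not a requirement stated in this repo.
pip install tensorflow==1.15 numpy scikit-learn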
Experiments
First, set your data directory (an absolute path) where datasets and models will be placed by
DATA_DIR=/path/to/data/dir
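Optionally (plain bash, nothing specific to this repo), export the variable so that sub-shells also see it and create the directory up front; /path/to/data/dir is a placeholder as above.
export DATA_DIR=/path/to/data/dir
mkdir -p $DATA_DIR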
Fine-tuning
We give instructions to fine-tune a pre-trained LV-BERT-small (13M parameters) on GLUE. You can refer to this Google Colab notebook for a quick example. All models are provided in Google Drive, Huggingface and GitHub release. The models are pre-trained for 1M steps with sequence length 128 to save compute. Models named *_seq512 are trained for another 100K steps with sequence length 512 and are used for long-sequence tasks like SQuAD. See our paper for more details on model performance.
- Create your data directory and copy the vocabulary file into it.
mkdir -p $DATA_DIR/models && cp vocab.txt $DATA_DIR/
Then put the pre-trained model in the corresponding directory.
mv lv-bert_small $DATA_DIR/models/
- Download the GLUE data by running
python3 download_glue_data.py
- Set up the data by running
cd glue_data && mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic/diagnostic.tsv mnli && mkdir -p $DATA_DIR/finetuning_data && mv * $DATA_DIR/finetuning_data && cd ..
- Fine-tune the model by running
bash finetune.sh $DATA_DIR
PS: (a) You can test different tasks by changing the configs in finetune.sh. (b) Some of the GLUE datasets are small, so results may vary substantially across random seeds. As in ELECTRA, we report the median of 10 fine-tuning runs from the same pre-trained model for each result. A quick sanity check of the expected $DATA_DIR layout after the steps above is sketched below.
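The listing below is a sketch inferred from the setup commands above, not output from the repo's scripts; the contents of lv-bert_small depend on the checkpoint you downloaded.
ls $DATA_DIR                  # should contain at least: finetuning_data  models  vocab.txt
ls $DATA_DIR/models           # should contain: lv-bert_small
ls $DATA_DIR/finetuning_data  # should contain the task folders: cola  mnli  mrpc  qnli  qqp  rte  sst  sts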
Pre-training
We give instructions to pre-train LV-BERT-small (13M parameters) using the OpenWebText corpus.
- First, download the OpenWebText pre-training corpus (12G).
- After downloading the pre-training corpus, build the pre-training tf-record dataset by running
bash build_data.sh $DATA_DIR
- Then, pre-train the model by running
bash pretrain.sh $DATA_DIR
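Pre-training runs for many steps and can take a long time; a common pattern (plain bash, not specific to this repo) is to launch it in the background and keep a log, for example
nohup bash pretrain.sh $DATA_DIR > pretrain.log 2>&1 &
tail -f pretrain.log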
Bibtex
@inproceedings{yu2021lv-bert,
author = {Yu, Weihao and Jiang, Zihang and Chen, Fei and Hou, Qibin and Feng, Jiashi},
title = {LV-BERT: Exploiting Layer Variety for BERT},
booktitle = {Findings of ACL},
month = {August},
year = {2021}
}
Reference
This repo is based on the ELECTRA repo.
Acknowledgment
Weihao Yu would like to thank the TPU Research Cloud (TRC) program for providing part of the computational resources.