Chinese clinical named entity recognition (CNER) using a pre-trained BERT model

Introduction

Code for the paper "Chinese clinical named entity recognition with variant neural structures based on BERT methods".

Paper URL: https://www.sciencedirect.com/science/article/pii/S1532046420300502

We pre-trained a BERT model to improve performance on Chinese CNER. On top of BERT, a Long Short-Term Memory (LSTM) layer extracts text features and a Conditional Random Field (CRF) layer decodes the predicted tags. We also propose a new strategy for incorporating dictionary features into the model, and we use radical features of Chinese characters to improve performance further.
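As a rough illustration of how a CRF layer decodes tags, the sketch below runs Viterbi decoding over per-character emission scores and tag-to-tag transition scores. It is a minimal, dependency-free example and not the repository's implementation: the function name `viterbi_decode`, the three-tag `O`/`B`/`I` scheme, and all scores are assumptions chosen for illustration, and start/end transitions are omitted for brevity.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for a path that ends in tag j at this position.
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(range(num_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative tag set and scores (assumptions, not values from the paper).
tags = ["O", "B", "I"]
transitions = [
    [0.0, 0.0, -100.0],  # O -> I is heavily penalised (invalid in BIO tagging)
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [1.0, 0.8, 0.0],
    [0.0, 0.1, 2.0],
    [1.5, 0.0, 0.2],
]
decoded = [tags[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

Picking the best tag independently at each position would give `O, I, O` here, which violates the tagging scheme. Decoding over whole sequences lets the transition scores rule that path out.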

Model structure

[Figure: model structure]

Usage

Pre-trained models

For replication, we uploaded two models to Baidu Netdisk.

Link: https://pan.baidu.com/s/1obzG6OSbu77duhusWg2xmQ Code: k53q

Examples

To replicate the results on the CCKS-2018 dataset:

python main.py \
--data_dir=data/ccks_2018 \
--bert_model=model/ \
--output_dir=./output \
--terminology_dicts_path="{'medicine':'data/ccks_2018/drug_dict.txt','surgery':'data/ccks_2018/surgery_dict.txt'}" \
--radical_dict_path=data/radical_dict.txt \
--constant=0 \
--add_radical_or_not=True \
--radical_one_hot=False \
--radical_emb_dim=20 \
--max_seq_length=480 \
--do_train=True \
--do_eval=True \
--train_batch_size=6 \
--eval_batch_size=4 \
--hidden_dim=64 \
--learning_rate=5e-5 \
--num_train_epochs=5 \
--gpu_id=3
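The `--terminology_dicts_path` value above is written as a Python dict literal, and several flags take `True`/`False` as strings. How `main.py` actually parses these flags is not shown here. The sketch below is only one plausible way to handle such flags with `argparse`, and the helper `str2bool` is hypothetical. It is worth being careful with boolean flags in general, because `bool("False")` evaluates to `True` in Python.

```python
import argparse
import ast


def str2bool(value):
    """Hypothetical helper: turn 'True'/'False' strings into real booleans."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# A reduced parser covering three of the flags from the command above.
parser = argparse.ArgumentParser()
# ast.literal_eval safely evaluates a dict literal without running arbitrary code.
parser.add_argument("--terminology_dicts_path", type=ast.literal_eval, default={})
parser.add_argument("--radical_one_hot", type=str2bool, default=False)
parser.add_argument("--radical_emb_dim", type=int, default=20)

args = parser.parse_args([
    "--terminology_dicts_path="
    "{'medicine':'data/ccks_2018/drug_dict.txt','surgery':'data/ccks_2018/surgery_dict.txt'}",
    "--radical_one_hot=False",
    "--radical_emb_dim=20",
])

# Each entity type maps to the path of its terminology dictionary file.
for entity_type, path in args.terminology_dicts_path.items():
    print(entity_type, "->", path)
```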

Results

CCKS-2018 dataset

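| Method | P (%) | R (%) | F1 (%) |
| --- | --- | --- | --- |
| FT-BERT+BiLSTM+CRF | 88.57 | 89.02 | 88.80 |
| +dictionary | 88.58 | 89.17 | 88.87 |
| +radical (one-hot encoding) | 88.51 | 89.39 | 88.95 |
| +radical (random embedding) | 89.24 | 89.11 | 89.17 |
| +dictionary +radical | 89.42 | 89.22 | 89.32 |
| ensemble | 89.59 | 89.54 | 89.56 |

| Team Name | Method | F1 (%) |
| --- | --- | --- |
| Yang and Huang (2018) | CRF (feature-rich + rule) | 89.26 |
| heiheihahei | LSTM-CRF (ensemble) | 88.92 |
| Luo et al. (2018) | LSTM-CRF (ensemble) | 88.63 |
| dous12 | - | 88.37 |
| chengachengcheng | - | 88.30 |
| NUBT-IBDL | - | 87.62 |
| Ours | FT-BERT+BiLSTM+CRF+Dictionary (ensemble) | 89.56 |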

CCKS-2017 dataset

| Method | P (%) | R (%) | F1 (%) |
| --- | --- | --- | --- |
| FT-BERT+BiLSTM+CRF | 91.64 | 90.98 | 91.31 |
| +dictionary | 91.49 | 90.97 | 91.23 |
| +radical (one-hot encoding) | 91.83 | 90.80 | 91.35 |
| +radical (random embedding) | 92.07 | 90.77 | 91.42 |
| +dictionary +radical | 91.76 | 90.88 | 91.32 |
| ensemble | 92.06 | 91.15 | 91.60 |

| Team Name | Method | F1 (%) |
| --- | --- | --- |
| Qiu et al. (2018b) | RD-CNN-CRF | 91.32 |
| Wang et al. (2019) | BiLSTM-CRF+Dictionary | 91.24 |
| Hu et al. (2017) | BiLSTM-FEA (ensemble) | 91.03 |
| Zhang et al. (2018) | BiLSTM-CRF (mt+att+ms) | 90.52 |
| Xia and Wang (2017) | BiLSTM-CRF (ensemble) | 89.88 |
| Ouyang et al. (2017) | BiRNN-CRF | 88.85 |
| Li et al. (2017) | BiLSTM-CRF (specialized + lexicons) | 87.95 |
| Ours | FT-BERT+BiLSTM+CRF+Dictionary (ensemble) | 91.60 |
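In the result tables, P, R and F1 denote precision, recall and their harmonic mean, F1 = 2PR / (P + R). The short check below recomputes F1 from the reported P and R of the two ensemble rows. The function name `f1_score` is just an illustrative choice. Because the reported P and R are rounded to two decimals, the recomputed values can differ from the reported F1 in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) for the ensemble rows of the two result tables.
reported = {
    "CCKS-2018 ensemble": (89.59, 89.54, 89.56),
    "CCKS-2017 ensemble": (92.06, 91.15, 91.60),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.2f}, reported F1 = {f1:.2f}")
```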