BioKMNER
This is the implementation of our paper "Improving Biomedical Named Entity Recognition with Syntactic Information", published in BMC Bioinformatics.
Please contact us at yhtian@uw.edu if you have any questions.
Citation
If you use or extend our work, please cite our paper published in BMC Bioinformatics.
@article{tian2020improving,
title={Improving Biomedical Named Entity Recognition with Syntactic Information},
author={Tian, Yuanhe and Shen, Wang and Song, Yan and Xia, Fei and He, Min and Li, Kenli},
year={2020},
journal={BMC Bioinformatics},
volume={21},
pages={539}
}
Environment
The code works with the following environment:
python=3.6
pytorch=1.1
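The version pins above can be checked before training. The sketch below is a hypothetical helper, not part of the repository: it compares the running Python and, if installed, PyTorch against the listed major.minor versions. It parses the version numbers instead of comparing string prefixes, so that PyTorch 1.10 is not mistaken for 1.1.

```python
import importlib.util
import sys
from typing import Dict, Optional, Tuple

# Versions listed in the Environment section, as (major, minor).
PINNED = {"python": (3, 6), "torch": (1, 1)}


def major_minor(version: str) -> Tuple[int, int]:
    """Parse '1.1.0' or '1.1.0+cu90' into (1, 1); local build tags are ignored."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def matches_pinned(version: str, pinned: Tuple[int, int]) -> bool:
    # Numeric comparison, so '1.10.0' does not match a '1.1' pin.
    return major_minor(version) == pinned


def check_environment() -> Dict[str, Optional[bool]]:
    """Return per-package match results; None means the package is not installed."""
    results: Dict[str, Optional[bool]] = {}
    py = "{}.{}.{}".format(*sys.version_info[:3])
    results["python"] = matches_pinned(py, PINNED["python"])
    if importlib.util.find_spec("torch") is None:
        results["torch"] = None
    else:
        import torch  # imported lazily so the check still runs without PyTorch
        results["torch"] = matches_pinned(torch.__version__, PINNED["torch"])
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        status = "not installed" if ok is None else ("ok" if ok else "version mismatch")
        print(f"{name}: {status}")
```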
Data
Following BioBERT, the data used in our paper can be found here (or here). See our sample data for reference.
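BioBERT's NER datasets use a two-column, CoNLL-style layout: one token and its tag per line, separated by a tab, with a blank line between sentences. The sketch below assumes the data follows that layout (check the sample data to confirm) and groups the lines into sentences. The function name `read_conll` is hypothetical and is not part of this repository.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str], sep: str = "\t") -> List[Sentence]:
    """Group 'token<sep>label' lines into sentences separated by blank lines.

    Assumes a two-column CoNLL-style layout; the real data files may differ.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split(sep)
        if len(fields) < 2:
            raise ValueError(f"expected a token and a label in line: {raw!r}")
        # First column is the token, last column is the tag.
        current.append((fields[0], fields[-1]))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences
```

Usage, with a hypothetical file name: `with open("train.tsv", encoding="utf-8") as f: sentences = read_conll(f)`.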
To obtain the syntactic information, follow these steps:
- Download Stanford CoreNLP Toolkit (v3.9.2) and put the folder stanford-corenlp-full-2018-10-05 under the current directory.
- Run python data_helper.py --dataset=/path/to/the/dataset/ to preprocess the data.
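The two preprocessing steps can be scripted. The sketch below checks that the CoreNLP folder is under the current directory and then runs data_helper.py with the --dataset flag shown above. The helper names (`corenlp_ready`, `preprocess_command`, `preprocess`) are hypothetical. The script only wraps the repository's own command and adds no options of its own.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Folder name from the CoreNLP v3.9.2 download.
CORENLP_DIR = "stanford-corenlp-full-2018-10-05"


def corenlp_ready(root: Path) -> bool:
    """True if the CoreNLP folder has been unpacked directly under `root`."""
    return (root / CORENLP_DIR).is_dir()


def preprocess_command(dataset: str) -> List[str]:
    # Same command as the README step, run with the current interpreter.
    return [sys.executable, "data_helper.py", f"--dataset={dataset}"]


def preprocess(dataset: str, root: Path = Path(".")) -> None:
    """Fail early if CoreNLP is missing, otherwise run data_helper.py in `root`."""
    if not corenlp_ready(root):
        raise FileNotFoundError(f"put {CORENLP_DIR} under {root.resolve()} first")
    subprocess.run(preprocess_command(dataset), cwd=str(root), check=True)


if __name__ == "__main__":
    preprocess(sys.argv[1])
```

Using `sys.executable` rather than a bare `python` ensures that data_helper.py runs under the same interpreter, and therefore the same environment, as the check itself.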
Run on Sample Data
To run our code, first set up the environment, then download BioBERT and put it into the biobert_pyt directory (please use our config.json file).
If the downloaded model is a TensorFlow checkpoint, convert it to PyTorch format first.
Also, replace the original config.json in your model directory with the config.json in the BERT model directory that we provide.
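The config.json swap can be scripted so that the original file is not lost. The sketch below backs up the existing config.json once, then copies the provided file over it. The function name and the `config.json.orig` backup name are hypothetical. The model directory would be biobert_pyt, and the path to the provided config depends on where it sits in your checkout.

```python
import shutil
from pathlib import Path


def replace_config(model_dir: Path, provided_config: Path) -> Path:
    """Copy provided_config to model_dir/config.json and return the target path.

    An existing config.json is saved once as config.json.orig before it is overwritten.
    """
    target = model_dir / "config.json"
    backup = model_dir / "config.json.orig"
    if target.exists() and not backup.exists():
        shutil.copy2(str(target), str(backup))  # keep only the very first original
    shutil.copy2(str(provided_config), str(target))
    return target
```

Usage, with a placeholder path: `replace_config(Path("biobert_pyt"), Path("/path/to/provided/config.json"))`. Because the backup is only written when none exists, running the script twice still leaves the original config recoverable.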
You can run run.sh directly to train and evaluate our model on the sample data.
To-do list
- Release our pre-trained models.
- Regular maintenance.
We will keep updating this repository.