NER that uses Graph Conv Nets

This is an implementation of a named entity recognizer that uses Graph Convolutional Networks (GCNs). The reference article is "Graph Convolutional Networks for Named Entity Recognition".

This code uses GCNs and POS tagging to boost the entity recognition of a bidirectional LSTM. It scores ~81% (F1) on the OntoNotes 5 test set, which can be retrieved from the LDC website.

The system currently uses the word vectors that come with spaCy's "en_core_web_md" model.

Installation

git clone https://github.com/contextscout/gcn_ner.git

cd gcn_ner

virtualenv --python=/usr/bin/python3 .env

source .env/bin/activate

pip install -r requirements.txt

python -m spacy download en

python -m spacy download en_core_web_md

If you want to install TensorFlow with GPU support, use this command instead:

pip install -r requirements_gpu.txt

Test NER on a text

Execute the file

python test_ner.py < data/random_text.txt

Train NER from a dataset

Put your 'train.conll' file into the 'data/' directory, then run

python train.py

Test the dataset F1 score

Put your 'dev.conll' or 'test.conll' file into the 'data/' directory, then run

python test_dataset.py
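
For orientation, an entity-level F1 score is the harmonic mean of precision and recall over predicted entity spans. The sketch below is a minimal illustration under the assumption that an entity counts as correct only when its boundaries and type both match exactly; it is not necessarily how test_dataset.py computes its score, and the `Span` type and `entity_f1` function are hypothetical names introduced here.

```python
from typing import Set, Tuple

# Hypothetical span representation: (first token index, last token index, entity type).
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match entity spans."""
    true_positives = len(gold & predicted)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative values only: two of three predicted spans match the gold spans.
gold = {(0, 1, "GPE"), (8, 11, "DATE"), (14, 14, "DATE")}
predicted = {(0, 1, "GPE"), (8, 11, "DATE"), (16, 16, "GPE")}
precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```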

CoNLL format

The training and testing files must be in the CoNLL format, as in the following example. Only the fourth, fifth, and eleventh columns are used: the word, its POS tag, and its named entity tag.

source_file_name   1    0              New   NNP    (TOP(S(NP*         -    -   -   Speaker#1    (GPE*      *       (ARG1*   (ARG1*   (19
source_file_name   1    1             York   NNP             *)        -    -   -   Speaker#1        *)     *            *)       *)   19)
source_file_name   1    2              was   VBD          (VP*         be  03   -   Speaker#1        *    (V*)           *        *     -
source_file_name   1    3        developed   VBN          (VP*    develop  02   -   Speaker#1        *      *          (V*)       *     -
source_file_name   1    4             from    IN          (PP*         -    -   -   Speaker#1        *      *       (ARG2*        *     -
source_file_name   1    5                a    DT          (NP*         -    -   -   Speaker#1        *      *            *        *     -
source_file_name   1    6          hunting    NN             *         -    -   -   Speaker#1        *      *            *        *     -
source_file_name   1    7           harbor    NN            *))        -    -   -   Speaker#1        *      *            *)       *     -
source_file_name   1    8              one    CD  (ADVP(NP(QP*         -    -   -   Speaker#1   (DATE*      *   (ARGM-TMP*        *     -
source_file_name   1    9          million    CD             *)        -    -   -   Speaker#1        *      *            *        *     -
source_file_name   1   10            years   NNS             *)        -    -   -   Speaker#1        *      *            *        *     -
source_file_name   1   11              ago    RB             *)        -    -   -   Speaker#1        *)     *            *)       *     -
source_file_name   1   12               to    TO        (S(VP*         -    -   -   Speaker#1        *      *   (ARGM-PRP*        *     -
source_file_name   1   13           become    VB          (VP*     become  01   1   Speaker#1        *      *            *      (V*)    -
source_file_name   1   14            today    NN       (NP(NP*         -    -   -   Speaker#1    (DATE)     *            *   (ARG2*     -
source_file_name   1   15               's   POS             *)        -    -   -   Speaker#1        *      *            *        *     -
source_file_name   1   16    international    JJ             *         -    -   -   Speaker#1        *      *            *        *     -
source_file_name   1   17       metropolis   NNS        *))))))        -    -   -   Speaker#1        *      *            *)       *)    -
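
To make the column layout concrete, here is a minimal sketch of how the three columns could be read from a row like the ones above, and how the bracketed entity column could be turned into BIO labels. The function names `parse_conll_line` and `brackets_to_bio` are hypothetical and do not come from this repository; the sketch assumes whitespace-separated columns and non-nested entity brackets.

```python
from typing import List, Tuple


def parse_conll_line(line: str) -> Tuple[str, str, str]:
    """Return (word, POS tag, entity tag) from one whitespace-separated row."""
    cols = line.split()
    # 4th, 5th and 11th columns, i.e. zero-based indices 3, 4 and 10.
    return cols[3], cols[4], cols[10]


def brackets_to_bio(entity_col: List[str]) -> List[str]:
    """Convert tags such as '(GPE*', '*)', '(DATE)' and '*' to BIO labels."""
    labels = []
    current = None  # type of the entity span that is currently open, if any
    for tag in entity_col:
        if tag.startswith("("):
            current = tag.strip("()*")
            labels.append("B-" + current)
        elif current is not None:
            labels.append("I-" + current)
        else:
            labels.append("O")
        if tag.endswith(")"):
            current = None  # the span closes on this token
    return labels


rows = [
    "source_file_name 1 0 New NNP (TOP(S(NP* - - - Speaker#1 (GPE* * (ARG1* (ARG1* (19",
    "source_file_name 1 1 York NNP *) - - - Speaker#1 *) * *) *) 19)",
    "source_file_name 1 2 was VBD (VP* be 03 - Speaker#1 * (V*) * * -",
]
words, pos_tags, entity_tags = zip(*(parse_conll_line(row) for row in rows))
print(words)                               # ('New', 'York', 'was')
print(pos_tags)                            # ('NNP', 'NNP', 'VBD')
print(brackets_to_bio(list(entity_tags)))  # ['B-GPE', 'I-GPE', 'O']
```

Splitting on arbitrary whitespace keeps the sketch independent of the exact column padding used in the file.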