Home

Awesome

Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks

This is the source code for paper "Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks" in ACL2019.

Requirements

Usage

First, please unzip the datasets in "data/"

Then enter "src/workspace_*" dir, run the program like

Hyperparameters in our paper are saved in "main_seq2nugget.py" files.

Additional Notes on KBP2017 Evaluation

KBP2017 contains forum data and the original annoatations marked the author fields of each post as either "PER" or "ORG". This can result in an undervalued recall rate if we did not take this into consideration because the data used in our model removed all author fields.

Following all previous works, we annotate all post author field in original data as "PER" in "kbp2017_evaluator/kbp2017_test_eval_files/authors.dat". Then for correct evaluation, we need to combine the "authors.dat" with the system output, and then use the official evalution toolkit to obtain the accurate evaluation result.

For this, we should first install the offical evaluation toolkit with command

And then transform our system output into the required format (also combine it with the author fields) by entering "kbp2017_evaluator" dir and then use the command:

Finally, we obtain the offical evalution result with command

The "test_result_file" can be found in the Model dir in workspace. We also provide a sample test_result in "kbp2017_test_eval_files/sample_sys_test_output.dat" which is produced by the model reported in our paper.

Citation

Please cite:

@InProceedings{lin-Etal:2019:ACL2019sequence,
  author    = {Lin, Hongyu and Lu, Yaojie and Han, Xianpei and Sun, Le},
  title     = {Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  month     = {July},
  year      = {2019},
  publisher = {Association for Computational Linguistics}
}

Contact

If you have any question or want to request for the used datasets (ACE2005, GENIA and KBP2017 if you have the license), please contact me by