BERT and SpanBERT for Coreference Resolution

This repository contains code and models for the paper BERT for Coreference Resolution: Baselines and Analysis. It also includes the coreference resolution model from the paper SpanBERT: Improving Pre-training by Representing and Predicting Spans, which is the current state of the art on OntoNotes (79.6 F1). Please refer to the SpanBERT repository for other tasks.

The model architecture itself is an extension of the e2e-coref model.

Setup

Pretrained Coreference Models

Please download the following files to use the pretrained coreference models on your data. If you want to train your own coreference model, you can skip this step.

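| Model          | <model_name> for download | F1 (%) |
| -------------- | ------------------------- | ------ |
| BERT-base      | bert_base                 | 73.9   |
| SpanBERT-base  | spanbert_base             | 77.7   |
| BERT-large     | bert_large                | 76.9   |
| SpanBERT-large | spanbert_large            | 79.6   |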

Run the download script with the name of the model you want:

./download_pretrained.sh <model_name>

Valid names are bert_base, bert_large, spanbert_base, and spanbert_large. The script assumes that $data_dir is set. It downloads BERT/SpanBERT models finetuned on OntoNotes. The original (non-finetuned) SpanBERT weights are available in this repository. You can use these models with evaluate.py and predict.py (see the section on Batched Prediction Instructions).
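If you prefer to drive the download from Python, the step can be wrapped roughly as follows. This is only a sketch: the helper names `build_download_command` and `download` are hypothetical and not part of this repository, and it assumes that `data_dir` is exported as an environment variable and that the script is run from the repository root.

```python
import os
import subprocess
from typing import List, Mapping, Optional

# Model names listed in the table of pretrained coreference models.
MODEL_NAMES = ("bert_base", "bert_large", "spanbert_base", "spanbert_large")


def build_download_command(
    model_name: str, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the argv for download_pretrained.sh after basic sanity checks.

    Hypothetical helper: it only validates its inputs and does not run anything.
    """
    env = os.environ if env is None else env
    if model_name not in MODEL_NAMES:
        raise ValueError(
            f"unknown model {model_name!r}; expected one of {MODEL_NAMES}"
        )
    # Assumption: data_dir is visible to child processes as an environment variable.
    if not env.get("data_dir"):
        raise RuntimeError("data_dir is not set; export it before downloading")
    return ["./download_pretrained.sh", model_name]


def download(model_name: str) -> None:
    """Run the download script and raise if it exits with a non-zero status."""
    subprocess.run(build_download_command(model_name), check=True)
```

Calling `download("spanbert_base")` would then be equivalent to running the shell command by hand, with an early error if the model name is misspelled or `data_dir` is missing.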

Training / Finetuning Instructions

Setup for training

This assumes access to OntoNotes 5.0.

./setup_training.sh <ontonotes/path/ontonotes-release-5.0> $data_dir

This preprocesses the OntoNotes corpus and downloads the original BERT models (not finetuned on OntoNotes), which will be finetuned using train.py.

Batched Prediction Instructions

{
  "clusters": [], # leave this blank
  "doc_key": "nw", # key closest to your domain. "nw" is newswire. See the OntoNotes documentation.
  "sentences": [["[CLS]", "subword1", "##subword1", ".", "[SEP]"]], # list of BERT tokenized segments. Each segment should be less than the max_segment_len in your config
  "speakers": [["[SPL]", "-", "-", "-", "[SPL]"]], # speaker information for each subword in sentences
  "sentence_map": [0, 0, 0, 0, 0], # flat list where each element is the sentence index of the subwords
  "subtoken_map": [0, 0, 0, 1, 1] # flat list containing original word index for each subword. [CLS] and the first word share the same index
}
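A record in this format could be assembled along the following lines. This is a minimal sketch rather than the repository's own preprocessing: `build_example` and `toy_tokenize` are hypothetical names, the toy tokenizer merely stands in for a real BERT WordPiece tokenizer, and the whole input is packed into a single segment. It also assumes that [SEP] shares the indices of the last word, as the example above suggests, and that no speaker information is available.

```python
import json
from typing import Callable, Dict, List


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a WordPiece tokenizer: split long words in two."""
    if len(word) <= 4:
        return [word]
    return [word[:4], "##" + word[4:]]


def build_example(
    sentences: List[List[str]],
    tokenize: Callable[[str], List[str]],
    doc_key: str = "nw",
    max_segment_len: int = 128,
) -> Dict:
    """Build one input record from pre-split sentences (lists of words)."""
    subwords: List[str] = []
    sentence_map: List[int] = []
    subtoken_map: List[int] = []
    word_idx = 0
    for sent_idx, words in enumerate(sentences):
        for word in words:
            for piece in tokenize(word):
                subwords.append(piece)
                sentence_map.append(sent_idx)  # sentence index of this subword
                subtoken_map.append(word_idx)  # original word index of this subword
            word_idx += 1
    if not subwords:
        raise ValueError("the input contains no tokens")
    # Each segment, including [CLS] and [SEP], must be shorter than max_segment_len.
    if len(subwords) + 2 >= max_segment_len:
        raise ValueError("input too long for one segment; split it into several")
    return {
        "clusters": [],  # left blank for prediction
        "doc_key": doc_key,
        "sentences": [["[CLS]"] + subwords + ["[SEP]"]],
        "speakers": [["[SPL]"] + ["-"] * len(subwords) + ["[SPL]"]],
        # [CLS] copies the first subword's indices, [SEP] the last subword's.
        "sentence_map": [sentence_map[0]] + sentence_map + [sentence_map[-1]],
        "subtoken_map": [subtoken_map[0]] + subtoken_map + [subtoken_map[-1]],
    }


example = build_example([["subword1", "."]], toy_tokenize)
print(json.dumps(example))  # one JSON object per line
```

With the toy tokenizer, "subword1" is split into two pieces, so the two maps come out as [0, 0, 0, 0, 0] and [0, 0, 0, 1, 1], matching the example above. Note that the serialized record carries no # comments, since JSON does not allow them.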

Notes

Important Config Keys

Slurm

If you have access to a Slurm GPU cluster, you could use the following set of commands for training.

Miscellaneous

Citations

If you use the pretrained BERT-based coreference model (or this implementation), please cite the paper, BERT for Coreference Resolution: Baselines and Analysis.

@inproceedings{joshi2019coref,
    title={{BERT} for Coreference Resolution: Baselines and Analysis},
    author={Mandar Joshi and Omer Levy and Daniel S. Weld and Luke Zettlemoyer},
    year={2019},
    booktitle={Empirical Methods in Natural Language Processing (EMNLP)}
}

Additionally, if you use the pretrained SpanBERT coreference model, please cite the paper, SpanBERT: Improving Pre-training by Representing and Predicting Spans.

@article{joshi2019spanbert,
    title={{SpanBERT}: Improving Pre-training by Representing and Predicting Spans},
    author={Mandar Joshi and Danqi Chen and Yinhan Liu and Daniel S. Weld and Luke Zettlemoyer and Omer Levy},
    year={2019},
    journal={arXiv preprint arXiv:1907.10529}
}