Home

Awesome

CMU multinlp

This repository contains scripts to download and preprocess the General Language Analysis Datasets (GLAD) benchmark and codes for the task-agnostic SpanRel model, a multi-faceted NLP toolkit that can cover many different tasks.

Generalizing Natural Language Analysis through Span-relation Representations (ACL2020)

Prerequisites

# install jsonnet from https://github.com/google/jsonnet
conda create -n spanrel python=3.6
conda activate spanrel
./setup.sh

Datasets

8 datasets consisting of annotations of 10 tasks are included in this repository.

DatasetTaskTask codeDir
Wet Lab ProtocolsNERwlpdata/wlp
REwlpdata/wlp
CoNLL-2003NERnerdata/semeval_2014/
SemEval-2010 Task 8RErcdata/semeval_2010_task8/
OntoNotes 5.0Coref.corefdata/conll_coref_2012/
SRLsrldata/conll_srl_2012/
POSpos_conlldata/conll_pos_2012/
Dep.dp_conlldata/conll_dep_2012/
Consti.consti_conlldata/conll_consti_2012/
Penn TreebankPOSposdata/ptb_pos/
Dep.dpdata/ptb/
Consti.constidata/ptb_consti/
OIE2016OpenIEoiedata/openie/
MPQA 3.0ORLorldata/mpqa/
SemEval-2014 Task 4ABSAsemeval14_st2data/semeval_2014/

Follow the instructions in run.sh in each dataset directory to download and preprocess the datasets into BRAT format.

Train and Evaluate SpanRel models

Run BERT-based models, where $emb can be bert-base-uncased, bert-large-uncased, and $task is one of the "task code" shown in the table.

./run_by_config_bert.sh $task $emb $output

Run GloVe/ELMo-based models, where $emb can be glove or elmo, and $task is one of the "task code" shown in the table.

./run_by_config.sh $task $emb $output

Train and Evaluate SpanRel models on other datasets

  1. Put the data in data/kairos with train, dev, and test sub-directory, each containing multiple .ann and .txt files.
  2. Build vocabulary: ./run_by_config_bert.sh kairos bert-base-uncased output/kairos_vocab.
  3. Modify the vocab variable in the kairos section of run_by_config_bert.sh to output/kairos_vocab/vocabulary.
  4. Train and evaluate: ./run_by_config_bert.sh kairos bert-base-uncased output/kairos_log.