Multi-Level Tagger

This is the code associated with this paper:

Barrett, Maria, Joachim Bingel, Nora Hollenstein, Marek Rei and Anders Søgaard (2018). “Sequence classification with Human Attention.” In: The SIGNLL Conference on Computational Natural Language Learning (CoNLL).

The code is based on https://github.com/marekrei/mltagger.

Run an experiment with:

python experiment.py config_file.conf data_config_file.conf

The first of these config files defines tasks and hyperparameters, while the second lists paths for the datasets for each task. Examples of the config files can be found in conf.

Data format

The training and test data are expected in standard CoNLL-type tab-separated format: one word per line, with separate columns for the token and its label, and an empty line between sentences.

For error detection, this would be something like:

I       -
saws    +
the     -
show    -

Sentence-level labels are optionally marked on an extra line preceding the first word:

-
I       -
saws    +
the     -
show    -

+
Did     -
you     +
see     -
it      -
?       -

Any word labeled with default_label is assigned label 0; a word with any other label is assigned 1. For sentences annotated only at the sentence level, mark each word with the ignore_label:

X
I       _
saws    _
the     _
show    _
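
The format above can be read with a few lines of Python. The following is a minimal sketch, not the reader used in this repository. The names read_sentences and binarize are hypothetical. The values "-" and "_" for default_label and ignore_label are assumptions taken from the examples above, and returning None for ignored words is an illustrative choice. It also assumes that a single-column line before the first word of a sentence is the sentence-level label and that tokens contain no whitespace.

```python
from typing import Iterable, List, Optional, Tuple

# (sentence-level label or None, list of (token, label) pairs)
Sentence = Tuple[Optional[str], List[Tuple[str, str]]]


def read_sentences(lines: Iterable[str]) -> List[Sentence]:
    """Parse CoNLL-style lines into sentences (hypothetical helper)."""
    sentences: List[Sentence] = []
    sentence_label: Optional[str] = None
    tokens: List[Tuple[str, str]] = []

    def flush() -> None:
        nonlocal sentence_label, tokens
        if tokens or sentence_label is not None:
            sentences.append((sentence_label, tokens))
        sentence_label, tokens = None, []

    for raw in lines:
        fields = raw.split()
        if not fields:
            # An empty line closes the current sentence.
            flush()
        elif len(fields) == 1:
            if tokens:
                raise ValueError(f"token line without a label: {raw!r}")
            # Single column before the first word: sentence-level label.
            sentence_label = fields[0]
        else:
            # First column is the token, last column is its label.
            tokens.append((fields[0], fields[-1]))
    flush()  # the file may not end with an empty line
    return sentences


def binarize(label: str, default_label: str = "-",
             ignore_label: str = "_") -> Optional[int]:
    """Map a word label to 0/1; None marks an ignored word (assumption)."""
    if label == ignore_label:
        return None
    return 0 if label == default_label else 1


if __name__ == "__main__":
    sample = "-\nI\t-\nsaws\t+\nthe\t-\nshow\t-\n\nX\nI\t_\nsaws\t_\n"
    for sent_label, words in read_sentences(sample.splitlines()):
        print(sent_label, [(tok, binarize(lab)) for tok, lab in words])
```

Keeping parsing and label mapping in separate functions means the same reader works for files with or without sentence-level labels, and the label values can be swapped to match whatever the config file defines.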

Configuration

Edit the values in config.conf as needed: