emIOBUtils

A sequential labeling (IOB format) converter, corrector and evaluation package

emIOBUtils is a Python rewrite of CoreNLP's IOBUtils, which is written in Java. It takes any (possibly ill-formed) IOB span input and converts or corrects it according to the specified output style.

The program can be used to check whether the input contains valid spans or is ill-formed. It can also reduce or refine the set of possible labels for a specific purpose.

The supported formats are the following: iob[12], ioe[12], bio/iob, io, sbieo/iobes, noprefix.
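To illustrate what "ill-formed" can mean, the following is a minimal sketch of a well-formedness check for the bio/iob2 style, where every span must start with a B- tag. It is not the emIOBUtils implementation; the function name `iob2_errors` is hypothetical and chosen only for this example.

```python
def iob2_errors(tags):
    """Return the indices of tags that violate the IOB2 (BIO) convention.

    Hypothetical helper for illustration only, not part of emIOBUtils.
    Under IOB2, an I-X tag is valid only directly after B-X or I-X.
    """
    errors = []
    prev = "O"  # treat the position before the sequence as background
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            # An I- tag after O, or after a tag with a different label,
            # continues a span that was never opened.
            if prev == "O" or prev[2:] != tag[2:]:
                errors.append(i)
        prev = tag
    return errors


well_formed = ["B-PERS", "I-PERS", "O"]
ill_formed = ["O", "I-PERS", "B-LOC", "I-PERS", "I-PERS"]
print(iob2_errors(well_formed))  # []
print(iob2_errors(ill_formed))   # [1, 3]
```

A corrector would go one step further and rewrite the offending tags (for example, turning the I- tag at each reported index into a B- tag) instead of only reporting them.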

The provided sequence evaluation metrics follow the naming convention of scikit-learn and include all metrics from the current state of seqeval, plus a few new ones. For more complex evaluation, we recommend using PyCM and scikit-learn.
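As a rough illustration of span-level evaluation in the spirit of seqeval, the sketch below computes precision, recall and F1 over exact-match spans. It is an assumption-laden example, not the package's API: the function name `span_prf` and the `(label, start, end)` span representation (with an exclusive end index) are hypothetical.

```python
def span_prf(gold_spans, pred_spans):
    """Exact-match span precision, recall and F1.

    Hypothetical helper for illustration only. Each span is a
    (label, start, end) tuple; a predicted span counts as correct only
    if label and both boundaries match a gold span.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [("PERS", 0, 2), ("LOC", 3, 4)]
pred = [("PERS", 0, 2), ("LOC", 3, 5)]  # second span has a wrong boundary
print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Scoring whole spans instead of individual tags means that a span with one wrong boundary tag counts as a full miss, which is the usual convention for CoNLL-style evaluation.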

On IOB formats/styles

The documentation of the original class explains the idea clearly:

A 4-way representation of all entities, like S-PERS, B-PERS, I-PERS, E-PERS for a single word, beginning, internal, and end of an entity (IOBES or SBIEO); always marking the first word of an entity (IOB2 or BIO); only marking specially the beginning of non-first items of an entity sequence with B-PERS (IOB1); the reverse IOE1 and IOE2; IO where everything is I-tagged; and NOPREFIX, where no prefixes are written on category labels. The last two representations are deficient in not allowing adjacent entities of the same class to be represented, but nevertheless convenient. Note that the background label (e.g. O) is never given a prefix. This code is very specific to the particular CoNLL way of labelling classes for IOB-style encoding, but this notation is quite widespread. It will work on any of these styles of input. It will also recognize BILOU/IOBE1 format (B=B, I=I, L=E, O=O, U=S=1).
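The relationship between two of these styles can be sketched as a conversion from IOB2 (BIO) to IOBES via an intermediate span list. This is a minimal sketch assuming well-formed IOB2 input; the function names `iob2_to_spans` and `spans_to_iobes` are hypothetical and do not reflect the emIOBUtils or CoreNLP code.

```python
def iob2_to_spans(tags):
    """Extract (label, start, end) spans from a well-formed IOB2 sequence.

    Hypothetical helper for illustration only; `end` is exclusive.
    """
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        if tag == "O" or tag.startswith("B-"):
            # Both O and B- close the span that is currently open, if any.
            if label is not None:
                spans.append((label, start, i))
                label = None
            if tag.startswith("B-"):
                label, start = tag[2:], i
        # I- tags simply extend the open span; an I- tag with no open
        # span is ignored in this sketch (the input is assumed valid).
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


def spans_to_iobes(spans, length):
    """Render spans as IOBES tags: S- for single words, else B-, I-, E-."""
    tags = ["O"] * length  # the background label never gets a prefix
    for label, start, end in spans:
        if end - start == 1:
            tags[start] = "S-" + label
        else:
            tags[start] = "B-" + label
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + label
            tags[end - 1] = "E-" + label
    return tags


iob2 = ["B-PERS", "I-PERS", "O", "B-LOC", "B-LOC", "I-LOC", "I-LOC"]
print(spans_to_iobes(iob2_to_spans(iob2), len(iob2)))
# ['B-PERS', 'E-PERS', 'O', 'S-LOC', 'B-LOC', 'I-LOC', 'E-LOC']
```

The two adjacent LOC entities in the example survive the conversion because both IOB2 and IOBES mark span boundaries explicitly; in the IO or NOPREFIX styles they would merge into one span, which is the deficiency noted above.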

Requirements

Install on a local machine

Usage

It is recommended to use the program as part of the e-magyar language processing framework.

If all input columns already exist, one can run python3 -m emiobutils with the unified xtsv CLI API.

Mandatory CLI arguments

To use this library as a standalone tool, the following CLI arguments must be supplied:

Available library functions

Conversion related:

Evaluation related:

License

This program is licensed under the GPL 3.0 license.

Acknowledgement

The authors gratefully acknowledge the efforts of CoreNLP developers to develop the algorithm and release their code under a free license.

We dedicate this library to everyone who has ever started to write such a converter on their own.