Home

Awesome

This is a tool for labeling text documents. Slate supports annotation at different scales (spans of characters, tokens, and lines, or a document) and of different types (free text, labels, and links). This covers a range of tasks, such as Part-of-Speech tagging, Named Entity Recognition, Text Classification (including Sentiment Analysis), Discourse Structure, and more.

Why use this tool over the range of other text annotation tools out there?

Note - this repository is not for the "Segment and Link-based Annotation Tool, Enhanced", which can be found here and was first presented at LREC 2010. See 'Citing' below for additional notes on that work.

Installation

Two options:

1. Install with pip

pip install slate-nlp

Then run from any directory in one of two ways:

slate
python -m slate

2. Or download and run without installing

Either download as a zip file:

curl https://codeload.github.com/jkkummerfeld/slate/zip/master -o "slate.zip"
unzip slate.zip
cd slate-master

Or clone the repository:

git clone https://github.com/jkkummerfeld/slate
cd slate

Then run with either of:

python slate.py
./slate.py

To run from another directory, use:

python PATH_TO_SLATE/slate.py
PATH_TO_SLATE/slate.py

Requirements

The code requires only Python (2 or 3) and can be run out of the box. Your terminal must be at least 80 characters wide and 20 tall to use the tool.

Citing

If you use this tool in your work, please cite:

@InProceedings{acl19slate,
  title     = {SLATE: A Super-Lightweight Annotation Tool for Experts},
  author    = {Jonathan K. Kummerfeld},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
  location  = {Florence, Italy},
  month     = {July},
  year      = {2019},
  pages     = {7--12},
  doi       = {10.18653/v1/P19-3002},
  url       = {https://aclweb.org/anthology/papers/P/P19/P19-3002/},
  software  = {https://jkk.name/slate},
}

While presenting this work at ACL I learned of another annotation tool called SLATE. That tool was first described in "Annotation Process Management Revisited", Kaplan et al. (LREC 2010) and then in "Slate - A Tool for Creating and Maintaining Annotated Corpora", Kaplan et al. (JLCL 2011). It takes a very different approach, using a web based interface that includes a suite of project management tools as well as annotation. The code it available at https://bitbucket.org/dainkaplan/slate/wiki/Home.

Getting Started

Note: if you used pip to install, reaplce python slate.py with slate everywhere below.

Run python slate.py <filename> to start annotating <filename> with labels over spans of tokens. The entire interface is contained in your terminal, there is no GUI. With command line arguments you can vary properties such as the type of annotation (labels or links) and scope of annotation (characters, tokens, lines, documents).

The input file should be plain text, organised however you like. Prepare the data with your favourite sentence splitting and/or tokenisation software (e.g., SpaCy). If you use Python 3 then unicode should be supported, but the code has not been tested extensively with non-English text (please share any issues!).

When you start the tool it displays a set of core commands by default. These are also specified below, along with additional commands.

The tool saves annotations in a separate file (<filename>.annotations by default, this can be varied with a file list as described below). Annotation files are formatted with one line per annotated item. The item is specified with a tuple of numbers. For labels, the item is followed by a hyphen and the list of labels. For links, there are two items on the line before the hyphen. For example, these are two annotation files, one for labels of token spans and the other for links between lines:

==> label.annotations <==
(2, 1) - label:a
((3, 5), (3, 8)) - label:a
(7, 8) - label:s label:a

==> link.annotations <==
13 0 - 
13 7 - 
16 7 - 

A few notes:

Tutorials

Included in this repository are a set of interactive tutorials that teach you how to use the tool from within the tool itself.

TaskCommand
Named Entity Recognition annotationpython slate.py tutorial/ner.md -t categorical -s token -o -c ner-book.config -l log.tutorial.ner.txt -sl -sm
Labelling spans of text in a documentpython slate.py tutorial/label.md -t categorical -s token -o -l log.tutorial.label.txt
Linking lines in a documentpython slate.py tutorial/link.md -t link -s line -o -l log.tutorial.link.txt

Example Workflow

This tool has already been used for two annotation efforts involving multiple annotators (Durrett et al., 2017 and Kummerfeld et al., 2018). Our workflow was as follows:

Comparing Annotations

The tool supports displaying annotations for the purpose of adjudicating disagreements. There are two steps involved. Specifically, you can request that a set of other annotation files be read. Then, whenever one of those annotation files includes something that your current adjudication does not, the text is shown in red.

Data list file creation

A data list file contains a series of lines in the format:

raw_file [output_file [cur_position [other_annotations]]]

For example, this line says there is a raw text file my-book.txt, that the adjudications should be saved in annotations-adjudicated.txt, that annotation should start at the very start of my-book.txt and that there are three existing annotations to be compared:

my-book.txt annotations-adjudicated.txt ((0, 0), (0, 0)) my-book.txt.annotations1 my-book.txt.annotations2 my-book.txt.annotations3

Note: you can have as many "other_annotation" files as you want.

Run slate with the data list file

Now run slate as follows:

python slate.py -d data-list-file [any other arguments]

Example

The tutorial folder contains two example data list files:

You can use them as follows:

cd tutorial/data
python ../../slate.py -d list_with_disagreements.category.txt -t categorical -s token

Efficiency Tip

You can save time by putting annotations that all annotators agreed on into the annotations-adjudicated.txt file. This bash pipeline will do that if you replace:

cat ANNOTATION_FILES | sort | uniq -c | awk -v count=N_FILES '$1 == count' | sed 's/^ *[0-9]* *//' > annotations-adjudicated.txt

Breaking this down, it does the following:

Detailed Usage Instructions

Invocation options

usage: slate.py [-h] [-d DATA_LIST [DATA_LIST ...]] [-t {categorical,link}]
                [-s {character,token,line,document}] [-c CONFIG_FILE] [-l LOG_PREFIX] [-ld]
                [-sh] [-sl] [-sp] [-sm] [-r] [-o] [-ps] [-pf] [--do-not-show-linked]
                [--alternate-comparisons]
                [data ...]

A tool for annotating text data.

positional arguments:
  data                  Files to be annotated

optional arguments:
  -h, --help            show this help message and exit
  -d DATA_LIST [DATA_LIST ...], --data-list DATA_LIST [DATA_LIST ...]
                        Files containing lists of files to be annotated
  -t {categorical,link}, --ann-type {categorical,link}
                        The type of annotation being done.
  -s {character,token,line,document}, --ann-scope {character,token,line,document}
                        The scope of annotation being done.
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        A file containing configuration information.
  -l LOG_PREFIX, --log-prefix LOG_PREFIX
                        Prefix for logging files
  -ld, --log-debug      Provide detailed logging.
  -sh, --show-help      Show help on startup.
  -sl, --show-legend    Start with legend showing.
  -sp, --show-progress  Start with progress showing.
  -sm, --show-mark      Start with mark showing.
  -r, --readonly        Do not allow changes or save annotations.
  -o, --overwrite       If they exist already, read and overwrite output files.
  -ps, --prevent-self-links
                        Prevent an item from being linked to itself.
  -pf, --prevent-forward-links
                        Prevent a link from an item to one after it.
  --do-not-show-linked  Do not have a special color to indicate any linked token.
  --alternate-comparisons
                        Activate alternative way of showing different annotations (one colour
                        per set of markings, rather than counts).

You may also define arguments in a file and pass them in as follows:

python slate.py @arguments.txt

Keybindings

The tool shows files one at a time in plain text. Default commands are shown below.

Note: special keys such as ENTER and BACKSPACE may not work on non-OS-X operating systems. That is why in all places where they are used we have an alternative as well.

TypeKeyLabelling AffectLinking Affect
Movement<kbd>j</kbd> or <kbd></kbd>move to the leftmove selected item to the left
 <kbd>i</kbd> or <kbd></kbd>move up a linemove selected item up a line
 <kbd>o</kbd> or <kbd></kbd>move down a linemove selected item down a line
 <kbd>;</kbd> or <kbd></kbd>move to the rightmove selected item to the right
 <kbd>J</kbd> or [<kbd>Shift</kbd> + <kbd></kbd>]go to the start of the linemove linking item to the left
 <kbd>I</kbd> or [<kbd>Shift</kbd> + <kbd></kbd>]go to first linemove linking item up a line
 <kbd>O</kbd> or [<kbd>Shift</kbd> + <kbd></kbd>]go to last linemove linking item down a line
 <kbd>:</kbd> or [<kbd>Shift</kbd> + <kbd></kbd>]go to the end of the linemove linking item to the right
Edit Span<kbd>m</kbd>extend leftextend selected item left
 <kbd>k</kbd>contract left sidecontract selected item left
 <kbd>/</kbd>extend rightextend selected item right
 <kbd>l</kbd>contract right sidecontract selected item right
 <kbd>M</kbd>-extend linking item left
 <kbd>K</kbd>-contract linking item left
 <kbd>?</kbd>-extend linking item right
 <kbd>L</kbd>-contract linking item right
Label Annotation (default)<kbd>Space</kbd> then <kbd>a</kbd>[un]mark this item as a-
 <kbd>Space</kbd> then <kbd>s</kbd>[un]mark this item as s-
 <kbd>Space</kbd> then <kbd>d</kbd>[un]mark this item as d-
 <kbd>Space</kbd> then <kbd>v</kbd>[un]mark this item as v-
Link Annotation<kbd>d</kbd>-create a link and move right / down
 <kbd>D</kbd>-create a link
Either Annotation mode<kbd>u</kbd>undo annotation on this itemundo all annotations for the current item

Shared commands:

TypeModeKeyAffect
SearchingNormal<kbd>\</kbd>enter query editing mode
 Query<kbd>?</kbd> or <kbd>Enter</kbd>exit query editing mode
 Query<kbd>!</kbd> or <kbd>Backspace</kbd>delete last character in query
 Querycharacters except <kbd>?</kbd> and <kbd>!</kbd>add character to query
 Normal<kbd>p</kbd>go to previous match
 Normal<kbd>n</kbd>go to next match
 Normal<kbd>P</kbd>go to previous match for linking line
 Normal<kbd>N</kbd>go to next match for linking line
Assigning text labelsNormal<kbd>t</kbd>enter label editing mode
 Label<kbd>?</kbd> or <kbd>Enter</kbd>exit label editing mode and assign the label
 Label<kbd>!</kbd> or <kbd>Backspace</kbd>delete last character in label
 Labelcharacters except <kbd>?</kbd> and <kbd>!</kbd>add character to label
Saving, exiting, etcNormal<kbd>]</kbd>save and go to next file
 Normal<kbd>[</kbd>save and go to previous file
 Normal<kbd>q</kbd>save and quit
 Normal<kbd>s</kbd>save
 Normal<kbd>Q</kbd>quit
MiscNormal<kbd>#</kbd>toggle line numbers
 Normal<kbd>h</kbd>toggle help info (default on)
 Normal<kbd>{</kbd> or <kbd>PAGE-UP</kbd>shift view up 5 lines
 Normal<kbd>}</kbd> or <kbd>PAGE-DOWN</kbd>shift view down 5 lines
 Normal<kbd>></kbd> then <kbd>p</kbd>toggle showing progress through files
 Normal<kbd>></kbd> then <kbd>l</kbd>toggle showing legend for labels
 Normal<kbd>></kbd> then <kbd>m</kbd>toggle showing the mark on the current item

Misc

To annotate multiple files, specify more than one as an argument. For greater control, provide a list of files in a file specified with --data-list / -d. The list should be formatted as follows, where [] indicate optional values:

raw_file [annotation_file [starting_position [additional_annotation_files]]]

For example, these commands will create a file list, use it, then return to it later:

find . -name *txt > filenames_todo
./slate.py -d filenames_todo -l do_later
# ... do some work, then quit, go away, come back...
./slate.py -d do_later.todo -l do_even_later -o

Note, the -o flag is added so it will allow you to edit the annotations you have already created. Otherwise the system will complain that you are overwriting existing annotation files.

When the additional_annotation_files are included it activates an adjudication mode. By default, all annotations that appear in all additional files are added to the current annotations. Disagreements are coloured in the text, but will disappear once a decision is made (using the normal annotation commands).

Running in Windows

Users have reported a range of issues when trying to run slate:

Here are a few things to try that others have found helpful:

Customisation

Colours and keys are customisable. For labelling, the default is:

For linking, the default is:

Modifying the Code

Slate has a relatively small codebase (~2,200 lines) and is designed to make adding new functionality not too hard. The code is divided up as follows:

Logic for determining what colour goes where is split across two parts of the code. In data.py, the set of labels for an item is determined. In view.py, that set of labels is used to choose a suitable colour.

Adding a new command involves:

Changing the label set / Adding labels

The label set is defined in your config file (see an example config here).

See lines like this for label definitions:

Label:          a                         SPACE_a green

The format is:

Label:        <label>                    <command> <colour>

You can add / edit / remove these lines to define your own label scheme. For example, for NER you may want to do:

Label:          O                         SPACE_a green
Label:          LOC                       SPACE_s blue
Label:          PER                       SPACE_d red
Label:          ORG                       SPACE_f yellow
Label:          MISC                      SPACE_v magenta

For an example of a custom config file, see ner-book.config

The current set of available colours is: [green, blue, white, cyan, magenta, red, yellow]. Note that by default white is used for regular text and cyan is used for cases where multiple labels apply to the same content.

To define more colours, edit the top of slate/config.py. By varying both the text colour (foreground) and background colour you can achieve quite a range of variations. You can also define any RGB colour you want using the curses init_color function and the init_pair function.

Questions

If you have a question please either:

Contributions

If you find a bug in the code, please submit an issue, or even better, a pull request with a fix.

Here are some possible improvements:

Acknowledgments

This tool is based in part upon work supported by IBM under contract 4915012629, and by ONR under MURI grant N000140911081. Any opinions, findings, conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of IBM.