Description
This directory contains the data of the Potsdam Twitter Sentiment Corpus (ISLRN 714-621-985-491-3). To open the corpus files, download and launch MMAX2, a freely distributed annotation tool, and then select one of the *.mmax projects from the directories corpus/annotator-1/ or corpus/annotator-2/.
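MMAX2 stores annotation as stand-off XML. Tokens live in a words file (the tokenization kept in basedata/), and each annotation level is a markables file whose markable elements point at word IDs through a span attribute. If you prefer to read the annotation without the GUI, the Python sketch below shows one possible approach. It assumes the usual MMAX2 conventions (word elements carrying an id, markable elements carrying a span such as word_1..word_3, possibly comma-separated for discontinuous spans). The function names and the sample XML are hypothetical, so check them against the actual files in this corpus before relying on the sketch.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{ns}markable' -> 'markable'."""
    return tag.rsplit("}", 1)[-1]


def load_words(words_xml):
    """Return the ordered list of (word_id, token) pairs from a words file."""
    root = ET.fromstring(words_xml)
    return [(el.get("id"), el.text or "")
            for el in root.iter() if local_name(el.tag) == "word"]


def resolve_span(span, words, index=None):
    """Expand a span like 'word_1..word_3,word_5' into its tokens, in order."""
    if index is None:
        index = {wid: i for i, (wid, _) in enumerate(words)}
    tokens = []
    for part in span.split(","):
        start, _, end = part.strip().partition("..")
        end = end or start  # a single ID is a one-word span
        tokens.extend(tok for _, tok in words[index[start]:index[end] + 1])
    return tokens


def load_markables(markables_xml, words):
    """Yield (attributes, tokens) for every markable element in the file."""
    index = {wid: i for i, (wid, _) in enumerate(words)}  # built once per file
    root = ET.fromstring(markables_xml)
    for el in root.iter():
        if local_name(el.tag) == "markable":
            yield dict(el.attrib), resolve_span(el.get("span"), words, index)


# Hypothetical sample data in the assumed MMAX2 layout.
WORDS = """<words>
<word id="word_1">I</word>
<word id="word_2">love</word>
<word id="word_3">this</word>
<word id="word_4">!</word>
</words>"""

MARKABLES = """<markables xmlns="www.eml.org/NameSpaces/sentiment">
<markable id="markable_1" span="word_1..word_3" mmax_level="sentiment"/>
</markables>"""

if __name__ == "__main__":
    words = load_words(WORDS)
    for attrs, tokens in load_markables(MARKABLES, words):
        print(attrs["id"], " ".join(tokens))  # prints "markable_1 I love this"
```

For files on disk, ET.parse(path).getroot() can stand in for ET.fromstring(...). Namespaces are stripped by local name because MMAX2 markables files usually declare a per-level namespace.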
Folder Structure
The folders of this project are structured as follows:
- corpus/ – directory containing corpus files;
  - annotator1/ – directory containing MMAX projects for the first annotator;
    - markables/ – directory containing annotation files for the first annotator;
  - annotator2/ – directory containing MMAX projects for the second annotator;
    - markables/ – directory containing annotation files for the second annotator;
  - basedata/ and source/ – original corpus tokenization;
  - custom/, scheme/, and style/ – auxiliary MMAX2 data;
- docs/ – directory containing annotation guidelines and other accompanying documents;
- scripts/ – directory containing scripts that were used to process corpus data;
  - examples/ – directory containing examples of input files for the scripts;
  - align.py – auxiliary module used for annotation alignment;
  - alt_fio.py – auxiliary module for AWK-like input/output operations;
  - conll.py – auxiliary module for handling CoNLL sentences;
  - measure_corpus_agreement.py – script for measuring inter-annotator agreement on the corpus;
  - merge_conll_mmax.py – script for aligning the corpus annotation with automatically processed CoNLL data.
Example invocations are given in the script files themselves; alternatively, run any script with the --help option to see its usage.
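The README does not say which agreement coefficient measure_corpus_agreement.py reports. Purely to illustrate the idea, the Python sketch below computes Cohen's kappa for two annotators' labels over the same tokens: observed agreement corrected for the agreement expected by chance from each annotator's label distribution. The function name and the label values are hypothetical, and the script's actual measure may well differ.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences (illustrative only)."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Share of positions where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label everywhere
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical token-level sentiment labels from two annotators.
    first = ["pos", "pos", "neg", "none"]
    second = ["pos", "neg", "neg", "none"]
    print(f"{cohen_kappa(first, second):.3f}")  # prints 0.636
```

Here observed agreement is 3/4 and chance agreement is 5/16, giving a kappa of 7/11.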
Note
<span style="color:red">I strongly recommend using the annotation of annotator-2 on the branch eexpression-revision (run git checkout eexpression-revision after cloning this project).</span>