emmorph2conll
The script converts the output tags of the emMorph morphological analyzer to the corresponding tags of a version of the Szeged Treebank.
What's in this repo?
- the main script of the converter: converter.py
- auxiliary files in folder converterdata
- license
- this readme
The tagsets :hungary:
A detailed description of the tagsets is available here.
emMorph
emMorph is the current morphological analyzer for Hungarian, integrated into the e-magyar language processing toolchain. The list of emMorph tags is taken from here.
CoNLL
What we call CoNLL here is a modified version of the MULTEXT morphosyntactic tagset, transformed into a feature-value pair structure. This modified tagset is the annotation scheme of a version of the Szeged Treebank, the largest fully manually annotated corpus of Hungarian.
How to use the converter?
- standard input: token, lemma and emMorph tag, separated by tabs
- standard output: CoNLL tag
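
The input and output conventions above can be sketched in Python as follows. This is only an illustration, not part of the repository: the helper names `format_input` and `run_converter` are hypothetical, and the sketch assumes that each token occupies one input line, that converter.py sits in the current directory, and that it reads and writes UTF-8.

```python
import subprocess
import sys


def format_input(rows):
    """Join (token, lemma, emMorph tag) triples into tab-separated lines.

    Assumption: the converter expects one token per line.
    """
    return "".join("\t".join(row) + "\n" for row in rows)


def run_converter(rows, script="converter.py"):
    """Pipe the rows through the converter and return its output lines.

    Assumption: `script` points at converter.py from this repo, which reads
    standard input and writes standard output as described above.
    """
    result = subprocess.run(
        [sys.executable, script],
        input=format_input(rows),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.splitlines()


# Example call (requires converter.py to be present):
# run_converter([("alma", "alma", "[/N][Nom]")])
```

The same thing can be done from a shell by redirecting a tab-separated file into the script's standard input; the Python wrapper is only convenient when the converter is called from other Python code.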
Dependencies
Python 3
License
GNU General Public License v3.0