emDepPy
A wrapper and REST API implemented in Python for emDep (Bohnet parser a.k.a. Mate Tools).
WARNING: This module is not thread-safe! Multiple models cannot be loaded simultaneously!
WARNING: This wrapper is only compatible with Java 11 or higher!
Requirements
- (Included in this repository) The compiled Mate Tools parser (extracted from Magyarlánc 3.0)
- (Included in this repository) The model file for the parser (extracted from e-magyar)
- A Java JRE as listed in the Aptfile (needed for building the dependencies)
- Python 3 (tested with 3.6)
- pip, to install the additional requirements listed in requirements.txt
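To check whether the java on your PATH meets the Java 11 requirement, something like the following Python sketch can be used. It is not part of emdeppy and the function names are hypothetical; it assumes the usual `java -version` output (written to stderr) containing a string such as `version "11.0.2"` or, for older releases, `version "1.8.0_292"`. It avoids `capture_output` so that it also runs on Python 3.6.

```python
import re
import shutil
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.x" scheme ("1.8.0_292" -> 8) as well as the
    current scheme ("11.0.2" -> 11, "17-ea" -> 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        # Legacy numbering: the real major version is the second field.
        major = int(match.group(2))
    return major


def java_is_supported(minimum: int = 11) -> bool:
    """Return True if the `java` found on PATH is at least `minimum`."""
    if shutil.which("java") is None:
        return False
    # `java -version` writes to stderr; PIPE + universal_newlines works on 3.6.
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return java_major_version(result.stderr) >= minimum
```

For example, java_is_supported() returns False when no java executable is found on the PATH, and java_major_version('java version "1.8.0_292"') returns 8.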
Install on local machine
- Install git-lfs and set it up for your user account:
git-lfs install
- Clone the repository (this should also fetch the model file):
git clone https://github.com/dlt-rilmta/emdeppy
cd emdeppy
- Install the system dependencies, build the package, and install the wheel:
sudo apt install `cat Aptfile`
make build
sudo pip3 install dist/*.whl
- Use from Python
Usage
It is recommended to use the program as part of the e-magyar language processing framework.
If all the required input columns already exist, one can use python3 -m emdeppy
with the unified xtsv CLI API.
When --maxlen [n: Int > 0]
is supplied, only sentences with at most n tokens are parsed; longer ones get _
for all fields.
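The --maxlen rule can be illustrated with a minimal Python sketch. It does not use the emdeppy API: parse_sentence is a hypothetical stand-in for the real parser, n_new_fields is an assumed count of the fields the parser fills in, and dummy_parse is a toy placeholder that attaches every token to the root.

```python
from typing import Callable, List, Optional

Row = List[str]        # one token, i.e. one line of fields
Sentence = List[Row]


def parse_with_maxlen(
    sentences: List[Sentence],
    parse_sentence: Callable[[Sentence], List[Row]],
    n_new_fields: int,
    maxlen: Optional[int] = None,
) -> List[List[Row]]:
    """Return the parser's new fields for every token of every sentence.

    Sentences with at most `maxlen` tokens are passed to `parse_sentence`;
    longer ones are skipped and every new field is filled with "_".
    """
    if maxlen is not None and maxlen <= 0:
        raise ValueError("maxlen must be a positive integer")
    result = []
    for sentence in sentences:
        if maxlen is None or len(sentence) <= maxlen:
            result.append(parse_sentence(sentence))
        else:
            # Too long: do not call the parser, emit "_" for every field.
            result.append([["_"] * n_new_fields for _ in sentence])
    return result


def dummy_parse(sentence: Sentence) -> List[Row]:
    """Toy stand-in for the real parser: attach every token to the root."""
    return [[str(i), "0", "ROOT"] for i, _ in enumerate(sentence, start=1)]
```

With maxlen=3, a two-token sentence is passed to dummy_parse, while a five-token sentence gets ["_", "_", "_"] for each of its tokens. A sentence of exactly n tokens is still parsed.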
Train
Training is currently available only from the Java CLI:
cat SzegedDep/*.conll-2009 | awk -F$'\t' -v OFS=$'\t' '{if ($0 != "") print $0,"_","_","_","_","_","_","_"; else print $0}' > train_corpus.txt
java -Xmx2G -classpath ./emdeppy/anna-3.61.jar is2.parser.Parser -model szk.mate.new.model -train train_corpus.txt
For more training parameters, see the Mate Tools documentation or source code.
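The awk command above keeps empty lines (sentence boundaries) unchanged and appends seven tab-separated _ columns to every other line. A rough Python equivalent is sketched below; the script name pad_conll.py is hypothetical, and UTF-8 encoding of the corpus files is an assumption.

```python
import sys
from typing import Iterable, Iterator, List

N_PADDING_COLUMNS = 7  # the awk command appends seven "_" columns


def pad_conll_lines(lines: Iterable[str], n_pad: int = N_PADDING_COLUMNS) -> Iterator[str]:
    """Yield each line with `n_pad` tab-separated "_" columns appended.

    Empty lines (sentence boundaries) are yielded unchanged, as in the awk
    one-liner. Yielded lines carry no trailing newline.
    """
    suffix = "\t_" * n_pad
    for line in lines:
        line = line.rstrip("\n")
        if line != "":
            yield line + suffix
        else:
            yield line


def main(paths: List[str], out_path: str = "train_corpus.txt") -> None:
    """Concatenate and pad the given CoNLL-2009 files, like `cat ... | awk`."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            # Assumption: the corpus files are UTF-8 encoded.
            with open(path, encoding="utf-8") as corpus:
                for padded in pad_conll_lines(corpus):
                    out.write(padded + "\n")


if __name__ == "__main__":
    # Hypothetical usage: python3 pad_conll.py SzegedDep/*.conll-2009
    if len(sys.argv) > 1:
        main(sys.argv[1:])
    else:
        print("usage: python3 pad_conll.py FILE [FILE ...]", file=sys.stderr)
```

The generator keeps memory use flat regardless of corpus size, just as the awk pipeline processes one line at a time.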
License
This Python wrapper is licensed under the LGPL 3.0 license. The model and the included jar file have their own licenses.