Awesome
Annotace
Annotace is a text analysis service used e.g. by TermIt and its web annotation plugin.
How to run it?
- Install Java 11
- Run
./gradlew bootRun
(on Linux/WSL) orgradlew.bat bootRun
on Windows
Lemmatizers
Annotace supports two lemmatizer implementations:
- Spark-based lemmatizer is more suitable for annotation of English texts. This is the default lemmatizer
- MorphoDiTa-based lemmatizer is more suitable for annotation of Czech or Slovak texts. It comes in two variants:
- JNI-based - runs locally using the MorphoDiTa library itself
- Service-based - invokes a remote annotation service (needs to be configured)
Setup
Spark-based Annotace setup does not require any additional configuration or files. Either run it directly ./gradlew bootRun
or use Docker. There is an image published at GitHub package registry.
Running Annotace with MorphoDiTa is a bit more complicated.
Annotace with MorphoDiTa Locally
- Download the MorphoDiTa ZIP archive and extract it.
- Find a file with JNI bindings corresponding to your platform in the extracted directory. For 64-bit Linux the file is
morphodita-1.9.2-bin/bin-linux64/java/libmorphodita_java.so
. - Set path to the directory containing this file as
java.library.path
environment variable name. - Provide mapping of taggers (language models) to Annotace. Either by editing
application.yml
before build or by passing them as environment variables. - Run Annotace with the MorphoDiTa lemmatizer by setting
ANNOTACE_LEMMATIZER
tomorphodita-jni
.
A complete command line example would be:
ANNOTACE_LEMMATIZER=morphodita-jni ANNOTACE_MORPHODITA_TAGGERS_CS=/opt/annotace/lib/czech-morfflex2.0-pdtc1.0-220710.tagger LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/annotace/lib/morphodita-1.9.2-bin/bin-linux64/java ./gradlew bootRun
Annotace with MorphoDiTa in Docker
- Download the MorphoDiTa ZIP archive.
- Set
MORPHODITA_ZIP
indocker-compose-morphodita.yml
to path to the downloaded MorphoDiTa ZIP file. - Download and extract taggers (language models). Put them into a single directory.
- Set
MORPHODITA_TAGGERS
indocker-compose-morphodita.yml
to path to the taggers' directory. - Run
docker compose -f docker-compose-morphodita.yml up -d --build
to build and start Annotace wih MorphoDiTa.
License
Annotace is licensed under GPL v3.0, Spark and MorphoDiTa are distributed under their respective licenses.