German Preprocessing
Preprocess German texts to do some serious natural-language processing.
- clean texts
- remove stopwords (as defined by spaCy)
- lemmatize
- lower-case, remove all punctuation, and replace digits with "0"
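
To illustrate the last step in the list above, here is a minimal standard-library sketch. It is not the package's actual implementation: the helper name `normalize` is hypothetical, it assumes each digit is replaced individually, and it only strips ASCII punctuation. Stop-word removal and lemmatization are not covered.

```python
import re
import string

# Hypothetical helper for illustration only; not part of the `german` package.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)  # ASCII punctuation only


def normalize(text: str) -> str:
    """Lower-case, replace every digit with "0", and strip punctuation."""
    text = text.lower()
    # Assumption: each digit is replaced on its own, so "12" becomes "00".
    text = re.sub(r"\d", "0", text)
    text = text.translate(_PUNCT_TABLE)
    # Collapse whitespace left behind by removed characters.
    return " ".join(text.split())


print(normalize("Julia trinkt 3 Tassen Tee, oder 12!"))
```

Replacing digits with a placeholder instead of dropping them keeps the information that a number was present without inflating the vocabulary.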
Installation
pip install german
Usage
from german import preprocess
preprocess(['Johannes war einer von vielen guten Schülern.', 'Julia trinkt gern Tee.'], remove_stop=True)
# ['johannes gut schüler', 'julia trinken tee']
License
MIT.
Sponsoring
This work was created as part of a project that was funded by the German Federal Ministry of Education and Research.
<img src="./bmbf_funded.svg">