german2vec

Overview

This repository contains documentation and code for building a German Language Model using the fastai library and applying it to a variety of NLP tasks, such as text classification. The language model is based on the 3-layer AWD-LSTM architecture that was first published by Salesforce Research.
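For orientation, the following is a minimal sketch of how such a language model can be trained with the fastai text API. It is not the repository's exact code: the notebooks may target an older fastai version, and the file name german_wikipedia.csv is a hypothetical placeholder for the prepared Wikipedia corpus.

```python
from fastai.text.all import *
import pandas as pd

# Hypothetical DataFrame with a 'text' column of German Wikipedia articles.
wiki_df = pd.read_csv('german_wikipedia.csv')

# DataLoaders for language modelling (next-token prediction).
dls_lm = TextDataLoaders.from_df(wiki_df, text_col='text', is_lm=True)

# 3-layer AWD-LSTM backbone; pretrained=False because the default English
# Wikitext-103 weights do not apply to German.
learn_lm = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3, pretrained=False)
learn_lm.fit_one_cycle(10, 2e-2)

# Save the encoder so it can be reused for downstream classification.
learn_lm.save_encoder('german_wiki_encoder')
```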

The backbone of the model is trained on the German Wikipedia Corpus and transfer learning is used to apply it to text classification tasks (as described in Universal Language Model Fine-tuning for Text Classification).
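Continuing the sketch above, the ULMFiT-style transfer to classification could look roughly like this. Again, this is an assumption-laden sketch rather than the repository's exact code: sb10k_sentiment.csv and the column names are hypothetical placeholders for the prepared SB-10k data.

```python
# Hypothetical DataFrame with 'text' and 'label' columns (e.g. tweet sentiment).
tweets_df = pd.read_csv('sb10k_sentiment.csv')

# Reuse the language model's vocabulary so the saved encoder weights line up.
dls_clas = TextDataLoaders.from_df(
    tweets_df, text_col='text', label_col='label', text_vocab=dls_lm.vocab)

learn_clas = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5,
                                     metrics=accuracy)
learn_clas.load_encoder('german_wiki_encoder')  # encoder saved in the sketch above

# Gradual unfreezing with discriminative learning rates, as in the ULMFiT paper.
learn_clas.fit_one_cycle(1, 2e-2)
learn_clas.freeze_to(-2)
learn_clas.fit_one_cycle(1, slice(1e-2 / (2.6**4), 1e-2))
learn_clas.unfreeze()
learn_clas.fit_one_cycle(2, slice(1e-3 / (2.6**4), 1e-3))
```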

Update:

A pre-trained Language Model using the German Wikipedia Corpus is available from this website: https://lernapparat.de/german-lm/. Thanks for sharing, Thomas!

Project structure

Notebook | Task
sb-10k-use_pretrained_language_model.ipynb | classifier for SB-10k Corpus (built on pre-trained language model)
sb-10k_small_wikipedia_corpus.ipynb | classifier for SB-10k Corpus (built on self-trained language model using German Wikipedia)
sb-10k-data_preprocessing.ipynb | data pre-processing steps for SB-10k: German Sentiment Corpus

TODO

Future research

to be updated

Contact

For more information, please feel free to contact me via e-mail (bachfischer.matthias@googlemail.com).