Awesome Norwegian NLP Resources
A work-in-progress list of useful NLP resources for Norwegian. Please let us know if there are useful NLP resources we might have missed!
Contact me at olav@web64.com
Facebook Group
Join our Facebook Group: https://www.facebook.com/groups/nlpnorway/
Open Source Libraries
Libraries with support for the Norwegian language
spaCy
- https://spacy.io/models/nb - Official support for Norwegian in spaCy (since v2.2.0)
- https://github.com/web64/spacy-norwegian - Train Norwegian models for spaCy
- https://github.com/jarib/spacy-nb - Scripts to build a Norwegian model for spaCy
- https://github.com/ohenrik/nb_news_ud_sm - Experimental Norwegian (Bokmål) language model for spaCy (including NER)
- https://github.com/ohenrik/nb_dep_ud_sm - Experimental Norwegian (Bokmål) language model for spaCy
- https://github.com/navikt/ai-lab-spacy-bokmaal - Norwegian model for spaCy
BERT
- https://github.com/NBAiLab/notram - NoTraM - Norwegian Transformer Model
- http://wiki.nlpl.eu/Vectors/norlm/norbert - NorBERT: Bidirectional Encoder Representations from Transformers
- https://github.com/botxo/nordic_bert - Nordic BERT: Norwegian model (trained on 4.5 GB of text)
NLTK
- Teaching NLTK Norwegian - Master's thesis by Bo Bjerke (PDF)
Models
- https://github.com/explosion/spacy-models/releases/tag/nb_core_news_sm-2.2.0 - Pretrained statistical models for Norwegian Bokmål
- https://github.com/ljos/navnkjenner - Named-Entity Recognition for Norwegian Bokmål and Nynorsk
- https://github.com/HIT-SCIR/ELMoForManyLangs - Pre-trained ELMo Representations
- https://github.com/ltgoslo/norec-baselines - NoReC baseline models, trained on the NoReC dataset.
- https://github.com/tensorflow/models/blob/master/syntaxnet/g3doc/universal.md - Syntaxnet models
- https://github.com/andrely/Norwegian-NLP-models - Norwegian NLP models (2013)
- https://github.com/emanlapponi/norlem-norwegian-lemmatizer - Lemmatizer for Norwegian that uses lexical and contextual information from the Norwegian Dependency Treebank (NDT)
- https://stanfordnlp.github.io/stanfordnlp/installation_download.html#human-languages-supported-by-stanfordnlp - StanfordNLP Pretrained models: Bokmål, Nynorsk, NynorskLIA
- https://github.com/mollerhoj/Scandinavian-ULMFiT - The weights for the embedding layer of Scandinavian ULMFiT language models
Word Vectors
- http://vectors.nlpl.eu/repository/ - NLPL word embeddings repository
- https://github.com/bheinzerling/bpemb - GloVe word vectors based on Byte-Pair Encoding (BPE)
- https://github.com/Kyubyong/wordvectors - Word2Vec & fastText word vectors for Bokmål and Nynorsk
- https://fasttext.cc/docs/en/crawl-vectors.html - fastText word vectors trained on Common Crawl and Wikipedia
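fastText's `.vec` downloads (like the text variant of the word2vec format) start with a header line giving the vocabulary size and the vector dimension, followed by one word and its vector per line. Below is a minimal standard-library sketch for loading such a file and finding nearest neighbours by cosine similarity, assuming that text layout; other downloads in the lists above may use binary or different formats, so check each one first. The helper names `load_vec`, `cosine` and `most_similar` are hypothetical, not part of any listed library, and the toy vectors are invented.

```python
import io
import math
from typing import Dict, List, Optional, TextIO, Tuple


def load_vec(stream: TextIO, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Read text-format vectors: a 'count dim' header, then 'word v1 ... vd' per line."""
    dim = int(stream.readline().split()[1])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        if limit is not None and len(vectors) >= limit:
            break  # full files hold millions of words; cap memory if needed
        parts = line.rstrip().split(" ")  # rstrip also drops any trailing space
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed or empty lines
        vectors[word] = [float(v) for v in values]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(vectors: Dict[str, List[float]], word: str, k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours by cosine similarity (fine for small vocabularies)."""
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors, invented for illustration (not real embeddings).
sample = io.StringIO(
    "4 3\n"
    "konge 0.9 0.1 0.0 \n"
    "dronning 0.8 0.2 0.1 \n"
    "bil 0.0 0.9 0.4 \n"
    "sykkel 0.1 0.8 0.5 \n"
)
vecs = load_vec(sample)
print(most_similar(vecs, "konge", k=1)[0][0])  # dronning
```

For real files with millions of rows, passing a `limit` keeps only the most frequent words (the crawl vectors are ordered by frequency), and a vectorised library would replace the brute-force loop.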
Norwegian specific libraries
- https://github.com/textlab/mtag - The Oslo-Bergen Multitagger for Norwegian Bokmål and Nynorsk (python)
- https://github.com/ljos/anna_lyse - Language parser for Norwegian Bokmål and Nynorsk
- https://github.com/petterhh/ndt-tools - Norwegian Dependency Treebank (NDT) Tools
- https://github.com/ljos/egennavn - Named-entity chunker for Norwegian
- https://github.com/noklesta/The-Oslo-Bergen-Tagger - The Oslo-Bergen Tagger
- https://github.com/draperunner/obt - Python library for The Oslo-Bergen Tagger
Universal Dependencies
- http://universaldependencies.org/ - Bokmål, Nynorsk, NynorskLIA
- UD_Norwegian-Bokmaal
- UD_Norwegian-Nynorsk
- UD_Norwegian-NynorskLIA
- Joint UD Parsing of Norwegian Bokmål and Nynorsk
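The UD Norwegian treebanks are distributed in the CoNLL-U format. Each token sits on its own line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), `#` lines carry comments, and a blank line separates sentences. Below is a minimal standard-library reader, offered as a sketch. It skips multiword-token ranges and empty nodes. The function name `read_conllu` is hypothetical, and the sample sentence is hand-written, not an excerpt from the treebanks.

```python
import io
from typing import Dict, Iterator, List, TextIO

FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(stream: TextIO) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dicts keyed by FIELDS."""
    sentence: List[Dict[str, str]] = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        yield sentence  # the file may not end with a blank line


# Hand-written example sentence in CoNLL-U layout.
SAMPLE = (
    "# text = Hunden sover.\n"
    "1\tHunden\thund\tNOUN\t_\tDefinite=Def|Gender=Masc|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tsover\tsove\tVERB\t_\tMood=Ind|Tense=Pres|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

for sent in read_conllu(io.StringIO(SAMPLE)):
    for tok in sent:
        print(tok["form"], tok["upos"], tok["head"], tok["deprel"])
# Hunden NOUN 2 nsubj
# sover VERB 0 root
# . PUNCT 2 punct
```

Yielding one sentence at a time keeps memory flat, which matters for the larger treebank splits.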
Data & Corpus
- https://www.nb.no/sprakbanken/repositorium#ticketsfrom?lang=en&query=alle&tokens=&from=1&size=12&collection=sbr - Språkbanken resource catalogue (Språkbankens ressurskatalog): Norwegian n-grams, lexicons and news corpora
- https://github.com/ltgoslo/norec - NoReC: The Norwegian Review Corpus
- https://github.com/ltgoslo/talk-of-norway - Talk of Norway (ToN) dataset, a collection of Norwegian parliament speeches from 1998 to 2016
- https://github.com/stopwords-iso/stopwords-no - Norwegian stopwords in JSON or txt format
- https://github.com/ltgoslo/norne - NORwegian Named Entities
- https://www.sketchengine.eu/notenten-norwegian-corpus/ - noTenTen: Corpus of the Norwegian Web
- https://github.com/unhammer/fugeord - Fugeord
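A common first use of a stopword list like stopwords-no is filtering tokens before counting or classification. Here is a minimal sketch. It assumes the `.txt` variant holds one stopword per line. The helper names `load_stopwords` and `remove_stopwords` are hypothetical, and a handful of common Norwegian function words stands in for the real file.

```python
import io
import re
from typing import List, Set, TextIO


def load_stopwords(stream: TextIO) -> Set[str]:
    """Read a one-word-per-line stopword list into a lowercase set."""
    return {line.strip().lower() for line in stream if line.strip()}


def remove_stopwords(text: str, stopwords: Set[str]) -> List[str]:
    # \w+ is Unicode-aware in Python 3, so æ, ø and å stay inside tokens.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]


# Stand-in for the real stopword file (illustrative subset only).
stop = load_stopwords(io.StringIO("og\ni\ndet\ner\nen\npå\n"))
print(remove_stopwords("Det er en bil i garasjen og på gata", stop))
# ['bil', 'garasjen', 'gata']
```

The regex tokenizer is deliberately crude; for real text, a tokenizer from one of the libraries listed above would handle abbreviations and punctuation better.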
Sentiment Analysis for Norwegian Text
- https://www.usit.uio.no/om/organisasjon/itf/ds/faglig/seminarer/spraak-teknologi-betydning/sant.pdf (PDF) SANT: Sentiment Analysis for Norwegian Text
- http://www.mn.uio.no/ifi/english/research/projects/sant/index.html - SANT project page
- https://github.com/ltgoslo/norsentlex - NorSentLex: Norwegian sentiment lexicon of positive and negative words
- https://github.com/olavski/afinn/blob/master/afinn/data/AFINN-no-165.txt - Work-in-progress AFINN Norwegian sentiment lexicon
- https://github.com/web64/norec-fasttext - Train NoReC fastText sentiment analysis models
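A lexicon such as the AFINN list above scores text by summing the scores of the words it contains. Below is a minimal sketch. It assumes each lexicon line is `term<TAB>integer score`, which is the layout of the English AFINN files. Check the Norwegian file before relying on this. Multi-word entries are ignored. The helper names and the three sample entries with their scores are hypothetical, not taken from AFINN-no-165.

```python
import io
import re
from typing import Dict, TextIO


def load_afinn(stream: TextIO) -> Dict[str, int]:
    """Parse 'term<TAB>score' lines into a dict (assumed AFINN layout)."""
    lexicon: Dict[str, int] = {}
    for line in stream:
        line = line.strip()
        if not line:
            continue
        term, score = line.rsplit("\t", 1)
        lexicon[term.lower()] = int(score)
    return lexicon


def afinn_score(text: str, lexicon: Dict[str, int]) -> int:
    # Sum scores of single-word matches; multi-word lexicon entries never match here.
    return sum(lexicon.get(tok, 0) for tok in re.findall(r"\w+", text.lower()))


# Invented entries and scores, for illustration only.
lex = load_afinn(io.StringIO("god\t3\ndårlig\t-3\nfantastisk\t4\n"))
print(afinn_score("Fantastisk film, men dårlig slutt", lex))  # 1
```

A plain sum ignores negation ("ikke god") and intensifiers, which is one reason trained models such as the NoReC baselines usually do better.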
Machine Translation
Apertium
Main library: https://github.com/apertium/apertium-python
Language data:
- https://github.com/apertium/apertium-nno-nob
- https://github.com/apertium/apertium-nno
- https://github.com/apertium/apertium-nob
English-Norwegian parallel corpus
Commercial APIs
- repustate.com Norwegian Sentiment Analysis
- orbit.ai Text generation, Entity Extraction
- tagbox.ai Automated geotagging
- lexalytics.com Sentiment analysis
- monkeylearn.com Text Classification
- tisane.ai Sentiment analysis & topic detection
- fairhair.ai Web data & information extraction
- textoptimizer.com User intent and topic extraction
Dictionaries
- LibreOffice - Norwegian dictionaries (no)
- dictionary-nb Norwegian Bokmål spelling dictionary in UTF-8.
- dictionary-nn Norwegian Nynorsk spelling dictionary in UTF-8.
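Spelling dictionaries like these are typically Hunspell-style pairs of an affix file (`.aff`) and a word list (`.dic`). Below is a minimal sketch for reading the bare stems out of a `.dic` file. It assumes the standard Hunspell layout: a first line with the approximate entry count, then one `stem/FLAGS` entry per line, optionally followed by morphological fields. It does not apply the `.aff` rules, so inflected forms are reported as missing. Use a real Hunspell binding for actual spell checking. The function name and the sample entries and flags are hypothetical.

```python
import io
from typing import Set, TextIO


def load_dic_stems(stream: TextIO) -> Set[str]:
    """Collect bare stems from a Hunspell .dic file, ignoring affix flags."""
    lines = iter(stream)
    next(lines, None)  # first line: approximate number of entries
    stems: Set[str] = set()
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        # 'stem/FLAGS  optional-morph-fields' -> 'stem'
        head = entry.split("/", 1)[0].split()
        if head:
            stems.add(head[0])
    return stems


# Invented entries and flags, for illustration only.
stems = load_dic_stems(io.StringIO("3\nbil/ABC\nhus/D\nsykkel\n"))
print(sorted(stems))  # ['bil', 'hus', 'sykkel']
print("biler" in stems)  # False: inflections come from .aff rules, not applied here
```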
Papers
- An automatic analysis of Norwegian compounds
- Evaluating Semantic Vectors for Norwegian
- Joint UD Parsing of Norwegian Bokmål and Nynorsk
Related Resources
- Saami language technology
- Translation Memory
- DaNLP - Repository of NLP resources for the Danish language
- GitHub Topic: norwegian
- GitHub Topic: norsk
- GitHub Topic: nynorsk
- GitHub Topic: bokmal