Experimental Norwegian (Bokmål) language model for spaCy

This model is based on the Norwegian Bokmål Universal Dependencies treebank, which can be found here:

https://github.com/UniversalDependencies/UD_Norwegian-Bokmaal

Command used to train the model:

batch_from=16 batch_to=64 python -m spacy train nb model_out no_bokmaal-ud-train.json no_bokmaal-ud-dev.json -n 30
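The batch_from and batch_to variables appear to be read by spaCy's training script as the start and ceiling of a compounding batch-size schedule. The sketch below illustrates that idea in plain Python. The helper name and the growth factor of 1.001 are assumptions for illustration and are not taken from this README.

```python
from itertools import islice


def compounding(start, stop, compound):
    """Yield a growing series: start, start * compound, ... capped at stop.

    Simplified sketch for an increasing schedule only (assumes start <= stop).
    """
    curr = float(start)
    while True:
        yield min(curr, stop)
        curr *= compound


# Assumed values: 16 and 64 come from the command above, 1.001 is a guess.
batch_sizes = compounding(16, 64, 1.001)
first_sizes = list(islice(batch_sizes, 3))
print(first_sizes)
```

With these values the batch size would start at 16 and grow slowly toward 64 over the course of training.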

There is probably a lot of room for improvement in this model. However, with regard to tagging, the model seems to perform quite well.

Iteration 7 seemed to work best, so that is the one packaged here.

To get the same results as shown here, please use the updated Norwegian language package for spaCy. It should now be part of the master branch, but the pull request can be found here: https://github.com/explosion/spaCy/pull/1882
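As a rough illustration of how an iteration might be chosen, the sketch below picks the best row from an excerpt of the training results (iterations 5 to 9). Selecting by tag accuracy is an assumption about how "best" was judged. In this excerpt iteration 7 has the highest tag accuracy, while iteration 6 has a slightly higher UAS.

```python
# (iteration, UAS, tag accuracy) copied from the training results, iterations 5-9.
results = [
    (5, 88.48, 95.084),
    (6, 88.77, 95.042),
    (7, 88.72, 95.139),
    (8, 88.76, 95.042),
    (9, 88.72, 95.125),
]


def best_by(rows, column):
    """Return the row with the highest value at the given column index."""
    return max(rows, key=lambda row: row[column])


best_tag = best_by(results, 2)  # assumption: "best" means highest tag accuracy
best_uas = best_by(results, 1)  # for comparison: highest UAS in this excerpt
print("best by tag accuracy:", best_tag)
print("best by UAS:", best_uas)
```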

Installation

To install the package, use this command:

pip install https://github.com/ohenrik/nb_dep_ud_sm/raw/master/nb_dep_ud_sm-0.0.1/dist/nb_dep_ud_sm-0.0.1.tar.gz
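After installing, one quick way to confirm that the package is visible to Python is to look it up by name. This is a minimal sketch that assumes the model installs as an importable package called nb_dep_ud_sm (the name used with spacy.load in the usage section); the helper name is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


# Assumption: the pip package exposes a module named "nb_dep_ud_sm".
print("nb_dep_ud_sm installed:", is_installed("nb_dep_ud_sm"))
```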

Usage

import spacy
nb = spacy.load("nb_dep_ud_sm")

doc = nb("Det er kaldt på vinteren i Norge.")
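Since the model is trained for tagging and dependency parsing, a natural next step is to inspect those annotations on each token. The sketch below assumes spaCy and the nb_dep_ud_sm package are installed; the helper names token_rows and main are hypothetical, while text, tag_, dep_ and head are standard spaCy token attributes.

```python
def token_rows(doc):
    """Return (text, tag, dependency label, head text) for each token."""
    return [(tok.text, tok.tag_, tok.dep_, tok.head.text) for tok in doc]


def main():
    # Imported here so the helper above can be used without spaCy installed.
    import spacy

    nb = spacy.load("nb_dep_ud_sm")
    doc = nb("Det er kaldt på vinteren i Norge.")
    for row in token_rows(doc):
        print(*row, sep="\t")


# Call main() once spaCy and the model package are installed.
```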

Training results:

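| Itn. | P.Loss | N.Loss | UAS | NER P. | NER R. | NER F. | Tag % | Token % | na | na |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 500.962 | 0.000 | 83.67 | 0.000 | 0.000 | 0.000 | 93.269 | 100.000 | 3542.9 | 0.0 |
| 1 | 86.554 | 0.000 | 86.38 | 0.000 | 0.000 | 0.000 | 94.396 | 100.000 | 3767.6 | 0.0 |
| 2 | 35.351 | 0.000 | 87.07 | 0.000 | 0.000 | 0.000 | 94.762 | 100.000 | 3611.1 | 0.0 |
| 3 | 21.769 | 0.000 | 87.99 | 0.000 | 0.000 | 0.000 | 94.839 | 100.000 | 3779.8 | 0.0 |
| 4 | 19.490 | 0.000 | 88.26 | 0.000 | 0.000 | 0.000 | 95.02 | 100.000 | 3565.9 | 0.0 |
| 5 | 17.730 | 0.000 | 88.48 | 0.000 | 0.000 | 0.000 | 95.084 | 100.000 | 3421.0 | 0.0 |
| 6 | 16.141 | 0.000 | 88.77 | 0.000 | 0.000 | 0.000 | 95.042 | 100.000 | 3533.3 | 0.0 |
| 7 | 14.906 | 0.000 | 88.72 | 0.000 | 0.000 | 0.000 | 95.139 | 100.000 | 3572.3 | 0.0 |
| 8 | 13.644 | 0.000 | 88.76 | 0.000 | 0.000 | 0.000 | 95.042 | 100.000 | 3585.8 | 0.0 |
| 9 | 12.909 | 0.000 | 88.72 | 0.000 | 0.000 | 0.000 | 95.125 | 100.000 | 3694.2 | 0.0 |
| 10 | 12.194 | 0.000 | 88.72 | 0.000 | 0.000 | 0.000 | 95.075 | 100.000 | 3618.3 | 0.0 |
| 11 | 11.435 | 0.000 | 88.65 | 0.000 | 0.000 | 0.000 | 95.042 | 100.000 | 3738.2 | 0.0 |
| 12 | 10.950 | 0.000 | 88.67 | 0.000 | 0.000 | 0.000 | 94.754 | 100.000 | 3909.9 | 0.0 |
| 13 | 10.325 | 0.000 | 88.85 | 0.000 | 0.000 | 0.000 | 47.879 | 100.000 | 3673.9 | 0.0 |
| 14 | 9.793 | 0.000 | 88.88 | 0.000 | 0.000 | 0.000 | 42.063 | 100.000 | 3758.4 | 0.0 |
| 15 | 9.456 | 0.000 | 88.77 | 0.000 | 0.000 | 0.000 | 43.68 | 100.000 | 3497.1 | 0.0 |
| 16 | 8.967 | 0.000 | 88.69 | 0.000 | 0.000 | 0.000 | 45.06 | 100.000 | 3514.9 | 0.0 |
| 17 | 8.493 | 0.000 | 88.88 | 0.000 | 0.000 | 0.000 | 46.537 | 100.000 | 3632.7 | 0.0 |
| 18 | 8.109 | 0.000 | 88.76 | 0.000 | 0.000 | 0.000 | 47.249 | 100.000 | 3837.6 | 0.0 |
| 19 | 7.795 | 0.000 | 88.73 | 0.000 | 0.000 | 0.000 | 47.485 | 100.000 | 3473.2 | 0.0 |
| 20 | 7.573 | 0.000 | 88.81 | 0.000 | 0.000 | 0.000 | 47.579 | 100.000 | 3482.8 | 0.0 |
| 21 | 7.131 | 0.000 | 88.82 | 0.000 | 0.000 | 0.000 | 47.282 | 100.000 | 3327.1 | 0.0 |
| 22 | 7.053 | 0.000 | 88.87 | 0.000 | 0.000 | 0.000 | 46.916 | 100.000 | 3576.0 | 0.0 |
| 23 | 6.736 | 0.000 | 88.61 | 0.000 | 0.000 | 0.000 | 46.394 | 100.000 | 3223.6 | 0.0 |
| 24 | 6.459 | 0.000 | 88.83 | 0.000 | 0.000 | 0.000 | 45.841 | 100.000 | 3523.7 | 0.0 |
| 25 | 6.364 | 0.000 | 88.67 | 0.000 | 0.000 | 0.000 | 45.423 | 100.000 | 3163.7 | 0.0 |
| 26 | 6.080 | 0.000 | 88.80 | 0.000 | 0.000 | 0.000 | 44.959 | 100.000 | 3497.2 | 0.0 |
| 27 | 5.984 | 0.000 | 88.77 | 0.000 | 0.000 | 0.000 | 44.56 | 100.000 | 3642.3 | 0.0 |
| 28 | 5.724 | 0.000 | 88.99 | 0.000 | 0.000 | 0.000 | 44.249 | 100.000 | 3467.4 | 0.0 |
| 29 | 5.620 | 0.000 | 88.97 | 0.000 | 0.000 | 0.000 | 43.895 | 100.000 | 3628.4 | 0.0 |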

Not an official model

This is not yet an official spaCy model.