Awesome

Description

This repository contains a set of scripts to build a ready-to-use Juman++ model for Jumandic.

Prerequisites

Unix environment (on Windows use WSL or MSYS2/MinGW64)
Juman++ build environment
Python 3.6+
Ruby
Perl
SSH authentication configured for GitHub (several repositories are cloned over SSH)
32 GB of RAM
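The prerequisites above can be checked with a short script. This is a minimal sketch that is not part of the repository. It only looks for executables on PATH: python3, ruby and perl come from the list, while git, ssh and make are assumed stand-ins for the SSH clone step and the make targets. It does not check the compiler or CMake parts of the Juman++ build environment. It reads total RAM through POSIX sysconf, which is unavailable on native Windows.

```python
import os
import shutil
import sys

# Executables to look for. python3, ruby and perl come from the prerequisites;
# git, ssh and make are assumptions (cloning over SSH and the make targets).
# Compilers and CMake for the Juman++ build environment are not checked.
REQUIRED_TOOLS = ["python3", "ruby", "perl", "git", "ssh", "make"]

# Loose threshold: a 32 GB machine usually reports a little less than
# 32 GiB of usable physical memory, so warn only below 30 GiB.
MIN_RAM_GIB = 30


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def physical_ram_gib():
    """Return total physical memory in GiB, or None if it cannot be queried."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf does not exist on native Windows (AttributeError);
        # unknown configuration names raise ValueError.
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / 1024 ** 3


def check():
    """Collect human-readable problems; an empty list means none were found."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ is required")
    for tool in missing_tools():
        problems.append("not found on PATH: " + tool)
    ram = physical_ram_gib()
    if ram is not None and ram < MIN_RAM_GIB:
        problems.append("only %.1f GiB of RAM detected" % ram)
    return problems


if __name__ == "__main__":
    # Print each problem, or a single line saying nothing was found.
    for issue in check() or ["no problems detected"]:
        print(issue)
```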

How to Use

Run the configuration script with python3 configure.py. It prompts for the location of the Mainichi Shinbun texts.

After that, run make nornn to train a model without the RNN component, or make rnn to train a model with it. The resulting models are placed in the bld/model folder.
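If you rebuild often, the two steps can be scripted. The sketch below is hypothetical: build_steps, run_steps and list_models are illustrative names, not part of the repository, and the script assumes it is run from the repository root. The child processes inherit the terminal, so configure.py can still prompt interactively.

```python
import os
import subprocess
from typing import List


def build_steps(rnn: bool = False) -> List[List[str]]:
    """Return the configuration and training commands, in order."""
    target = "rnn" if rnn else "nornn"
    return [["python3", "configure.py"], ["make", target]]


def run_steps(steps: List[List[str]], dry_run: bool = False) -> List[List[str]]:
    """Echo and run each command, stopping at the first failure.

    The child processes inherit the terminal, so configure.py can still
    ask for the Mainichi Shinbun location.
    """
    for cmd in steps:
        print("$ " + " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(cmd, check=True)
    return steps


def list_models(root: str = ".") -> List[str]:
    """Return the entries of <root>/bld/model, or [] if it does not exist yet."""
    model_dir = os.path.join(root, "bld", "model")
    if not os.path.isdir(model_dir):
        return []
    return sorted(os.listdir(model_dir))
```

For example, run_steps(build_steps(rnn=True)) trains the RNN model, after which list_models() shows what was written to bld/model. Pass dry_run=True to print the commands without running them.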

Adding your words to the model

You can add your own words to the model. To do so:

Run the configuration as described above: python3 configure.py
Fetch the repositories: make repo
Go to the bld/repos/jumandic folder, a local clone of the JumanDIC repository.
Create a new file with the .dic extension in the userdic folder inside bld/repos/jumandic.
Put your words into that file in JUMAN dictionary format (see the other .dic files for examples).
If you have already built a Juman++ model, run make clean-dic.
Build your model as described above.
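For a long word list, generating the .dic file may be easier than writing it by hand. This sketch assumes the noun entry shape used in JumanDIC: reading (読み), surface form (見出し語), and a 代表表記 field in 意味情報. It also assumes UTF-8 encoding. Compare its output with the existing .dic files before relying on it. The file name mywords.dic and both function names are hypothetical. Verbs and adjectives need conjugation information, so this sketch does not handle them.

```python
import os
from typing import Iterable, Tuple


def juman_noun_entry(surface: str, reading: str, subpos: str = "普通名詞") -> str:
    """Format one noun entry as a JUMAN dictionary S-expression.

    Assumed shape, modeled on JumanDIC noun entries: reading (読み),
    surface form (見出し語) and a 代表表記 "surface/reading" field in 意味情報.
    """
    rep = "代表表記:%s/%s" % (surface, reading)
    return '(名詞 (%s ((読み %s)(見出し語 %s)(意味情報 "%s"))))' % (
        subpos, reading, surface, rep)


def write_userdic(entries: Iterable[Tuple[str, str]],
                  jumandic_dir: str = "bld/repos/jumandic",
                  name: str = "mywords.dic") -> str:
    """Write (surface, reading) pairs as noun entries, one per line.

    The userdic folder is expected to exist in the cloned repository;
    open() raises FileNotFoundError otherwise, which usually means the
    path is wrong.
    """
    path = os.path.join(jumandic_dir, "userdic", name)
    with open(path, "w", encoding="utf-8") as f:
        for surface, reading in entries:
            f.write(juman_noun_entry(surface, reading) + "\n")
    return path
```

For example, write_userdic([("蜜柑", "みかん")]) writes one common-noun entry. After that, run make clean-dic and rebuild the model.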

If the built model does not contain your words, make sure that the binary dictionary was rebuilt after you added them (for example, by running make clean-dic before building again).
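One rough way to spot a stale model is to compare timestamps. This heuristic sketch assumes that everything under bld/model is rewritten when the dictionary is rebuilt. Under that assumption, a userdic .dic file newer than all of those files suggests the model predates your words. The function names are hypothetical.

```python
import glob
import os
from typing import Iterable, Optional


def newest_mtime(paths: Iterable[str]) -> Optional[float]:
    """Return the latest modification time among paths, or None if there are none."""
    times = [os.path.getmtime(p) for p in paths]
    return max(times) if times else None


def model_predates_userdic(root: str = ".") -> bool:
    """Heuristic: True if the newest userdic .dic file is newer than every file in bld/model.

    Assumption: files under bld/model are regenerated whenever the
    dictionary is rebuilt. Returns False when either set of files is empty.
    """
    dic_time = newest_mtime(glob.glob(
        os.path.join(root, "bld", "repos", "jumandic", "userdic", "*.dic")))
    model_time = newest_mtime(glob.glob(os.path.join(root, "bld", "model", "*")))
    if dic_time is None or model_time is None:
        return False
    return dic_time > model_time
```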
