Home

Awesome

Description

This repository contains a set of scripts to build a ready-to-use Juman++ model for Jumandic.

Prerequisites

Recommended

How to Use

Run the configuration script: python3 configure.py. It will prompt for the location of Mainichi Shinbun texts.

After that, run make nornn to train a model without the RNN component, or make rnn to train a model with the RNN component. The resulting models are placed in the bld/model folder.
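The two steps above can also be driven from a small script. The following is a minimal sketch, not part of this repository: the helper names build_commands and run_build are hypothetical, and it assumes the script is started from the repository root so that configure.py and the Makefile are found. Because configure.py prompts for input, the dry-run default only prints the commands.

```python
import subprocess


def build_commands(with_rnn=False):
    """Return the commands from the steps above, in order."""
    # "nornn" and "rnn" are the make targets named in this document.
    target = "rnn" if with_rnn else "nornn"
    return [["python3", "configure.py"], ["make", target]]


def run_build(with_rnn=False, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(with_rnn):
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: shows what would be executed for a model without RNN.
    run_build(with_rnn=False, dry_run=True)
```

Calling run_build(with_rnn=True, dry_run=False) would run the configuration script interactively and then make rnn.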

Adding your words to the model

You can add your own words to the model as follows:

  1. Perform the configuration as described above: python3 configure.py
  2. Fetch the repositories: make repo
  3. Go into the bld/repos/jumandic folder, which is a local clone of the JumanDIC repository.
  4. Create a new file with the .dic extension in the userdic folder of the bld/repos/jumandic folder.
  5. Put your words into that file in JUMAN dictionary format (see the other dictionary files in the repository for examples).
  6. Execute make clean-dic if you have already built a Juman++ model.
  7. Build your model as shown above.
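Steps 4 and 5 can be sketched as a short helper. This is an illustration under assumptions, not a tool shipped with the repository: the function name add_user_dictionary and the default file name mywords are hypothetical, UTF-8 encoding is assumed, and the caller must supply entries that are already in JUMAN dictionary format (the helper does not validate them).

```python
from pathlib import Path


def add_user_dictionary(entries, name="mywords", repo_dir="bld/repos/jumandic"):
    """Write `entries` to <repo_dir>/userdic/<name>.dic and return the path."""
    repo = Path(repo_dir)
    if not repo.is_dir():
        # The local JumanDIC clone is created by `make repo` (step 2).
        raise FileNotFoundError(f"{repo} not found; run 'make repo' first")
    userdic = repo / "userdic"
    userdic.mkdir(exist_ok=True)
    target = userdic / f"{name}.dic"
    # Assumption: dictionary sources are UTF-8 text.
    target.write_text(entries, encoding="utf-8")
    return target
```

After writing the file, continue with step 6 (make clean-dic, if a model was already built) and step 7.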

If the built model does not contain your words, ensure that the binary dictionary was rebuilt after adding new words.
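One way to check this is to compare file modification times. The sketch below is an assumption about how such a check could be done, not something the build scripts provide: dictionary_is_stale is a hypothetical name, and the location of the binary dictionary is not stated in this document, so it must be passed in explicitly.

```python
from pathlib import Path


def dictionary_is_stale(binary_dic, userdic_dir="bld/repos/jumandic/userdic"):
    """Return True if any .dic source is newer than the binary dictionary."""
    built = Path(binary_dic)
    if not built.exists():
        # Nothing has been built yet, so a rebuild is needed.
        return True
    built_time = built.stat().st_mtime
    return any(
        src.stat().st_mtime > built_time
        for src in Path(userdic_dir).glob("*.dic")
    )
```

If the function returns True, run make clean-dic and build the model again.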