unidic-py

This is a version of UniDic for Contemporary Written Japanese packaged for use with pip.

Currently it supports 3.1.0, the latest version of UniDic. Note this will take up 770MB on disk after install. If you want a small package, try unidic-lite.

The data for this dictionary is hosted as part of the AWS Open Data Sponsorship Program. You can read the announcement here.

After installing via pip, you need to download the dictionary using the following command:

python -m unidic download
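To confirm that the download finished, one option is to check the dictionary directory for the files a compiled MeCab dictionary normally contains. The sketch below is not part of unidic-py: the helper name `missing_dictionary_files` is hypothetical, and the file list is an assumption based on typical MeCab dictionaries rather than anything this package documents.

```python
from pathlib import Path

# Assumption: file names usually found in a compiled MeCab dictionary.
EXPECTED_FILES = ("dicrc", "sys.dic", "matrix.bin", "char.bin", "unk.dic")


def missing_dictionary_files(dicdir):
    """Return the expected file names that are absent from dicdir."""
    root = Path(dicdir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_dictionary_files(unidic.DICDIR)` after `import unidic` should return an empty list if the download completed and the assumed file list matches your install.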

With fugashi or mecab-python3, unidic is used automatically once it is installed, though you can also pass the MeCab arguments manually:

import fugashi
import unidic
tagger = fugashi.Tagger('-d "{}"'.format(unidic.DICDIR))
# that's it!
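As a slightly fuller sketch of the same idea, the snippet below wraps the argument string in a helper and reads a few UniDic fields from each token. The function names `build_mecab_args` and `tokenize` are hypothetical. The attributes `surface`, `feature.pos1`, and `feature.lemma` are the names fugashi exposes for UniDic entries; they may be `None` for unknown words.

```python
def build_mecab_args(dicdir):
    """Build the MeCab argument string that selects a dictionary directory."""
    # Quote the path so a directory name containing spaces is passed intact.
    return '-d "{}"'.format(dicdir)


def tokenize(text, dicdir=None):
    """Return (surface, pos1, lemma) tuples for each token in text."""
    # Imported lazily so this file can be loaded without the packages installed.
    import fugashi
    import unidic

    tagger = fugashi.Tagger(build_mecab_args(dicdir or unidic.DICDIR))
    return [(w.surface, w.feature.pos1, w.feature.lemma) for w in tagger(text)]
```

With the dictionary downloaded, `tokenize("麩菓子は、麩を主材料とした日本の菓子。")` returns one tuple per token.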

Differences from the Official UniDic Release

This has a few changes from the official UniDic release to make it easier to use.

See the extras directory for details on how to replicate the build process.

Fields

Here is a list of fields included in this edition of UniDic. For more information see the UniDic FAQ, though not all fields are included. For fields in the UniDic FAQ the name given there is included. Also refer to the description of the field hierarchy for details.

Fields which are not applicable are usually marked with an asterisk (*).

<details>
<summary>Type and POS fields in unidic-cwj-202302</summary>
<pre>
type,pos1,pos2,pos3,pos4
人名,名詞,固有名詞,人名,一般
他,感動詞,フィラー,*,*
他,感動詞,一般,*,*
他,接続詞,*,*,*
体,代名詞,*,*,*
体,名詞,助動詞語幹,*,*
体,名詞,普通名詞,サ変可能,*
体,名詞,普通名詞,サ変形状詞可能,*
体,名詞,普通名詞,一般,*
体,名詞,普通名詞,副詞可能,*
体,名詞,普通名詞,助数詞可能,*
体,名詞,普通名詞,形状詞可能,*
係助,助詞,係助詞,*,*
副助,助詞,副助詞,*,*
助動,助動詞,*,*,*
助動,形状詞,助動詞語幹,*,*
助数,接尾辞,名詞的,助数詞,*
名,名詞,固有名詞,人名,名
固有名,名詞,固有名詞,一般,*
国,名詞,固有名詞,地名,国
地名,名詞,固有名詞,地名,一般
姓,名詞,固有名詞,人名,姓
接助,助詞,接続助詞,*,*
接尾体,接尾辞,名詞的,サ変可能,*
接尾体,接尾辞,名詞的,一般,*
接尾体,接尾辞,名詞的,副詞可能,*
接尾用,接尾辞,動詞的,*,*
接尾相,接尾辞,形容詞的,*,*
接尾相,接尾辞,形状詞的,*,*
接頭,接頭辞,*,*,*
数,名詞,数詞,*,*
格助,助詞,格助詞,*,*
準助,助詞,準体助詞,*,*
用,動詞,一般,*,*
用,動詞,非自立可能,*,*
相,副詞,*,*,*
相,形容詞,一般,*,*
相,形容詞,非自立可能,*,*
相,形状詞,タリ,*,*
相,形状詞,一般,*,*
相,連体詞,*,*,*
終助,助詞,終助詞,*,*
補助,空白,*,*,*
補助,補助記号,一般,*,*
補助,補助記号,句点,*,*
補助,補助記号,括弧閉,*,*
補助,補助記号,括弧開,*,*
補助,補助記号,読点,*,*
補助,補助記号,AA,一般,*
補助,補助記号,AA,顔文字,*
記号,記号,一般,*,*
記号,記号,文字,*,*
</pre>
</details>
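Because not-applicable fields are marked with an asterisk, code that consumes rows like those in the type and POS table may want to map `*` to `None`. The following is a minimal sketch using only the standard library. The helpers `parse_pos_rows` and `pos_path` are hypothetical, the sample rows are copied from the table, and joining the POS levels with hyphens is only a display choice.

```python
import csv
import io

# Three rows copied from the type/POS table, plus its header line.
SAMPLE = """type,pos1,pos2,pos3,pos4
体,代名詞,*,*,*
体,名詞,普通名詞,サ変可能,*
補助,補助記号,AA,顔文字,*
"""


def parse_pos_rows(text):
    """Parse CSV rows into dicts, mapping the '*' placeholder to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({key: (None if value == "*" else value)
                     for key, value in row.items()})
    return rows


def pos_path(row):
    """Join the applicable POS levels of a parsed row with hyphens."""
    levels = (row[key] for key in ("pos1", "pos2", "pos3", "pos4"))
    return "-".join(level for level in levels if level is not None)
```

For example, `pos_path(parse_pos_rows(SAMPLE)[1])` gives `名詞-普通名詞-サ変可能`.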

License

The modern Japanese UniDic is available under the GPL, LGPL, or BSD license; see here. UniDic is developed by NINJAL, the National Institute for Japanese Language and Linguistics. UniDic is copyrighted by the UniDic Consortium and is distributed here under the terms of the BSD License.

The code in this repository is not written or maintained by NINJAL. The code is available under the MIT or WTFPL License, as you prefer.