


Source code for our EMNLP 2017 paper "Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components".


You need to prepare a training corpus and a file listing the subcharacter radicals or components of Chinese characters.

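Model Training

In the subcharacter file, each line gives one Chinese character followed by its radicals or components, separated by spaces. For example:
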
侩 亻 人 云
侨 亻 乔
侧 亻 贝 刂
侦 亻 卜 贝
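
The file format above can be read with a few lines of Python. This is a minimal sketch, assuming whitespace-separated fields with the character first on each line. The function name load_components is hypothetical and is not part of JWE; the training code reads the file on its own.

```python
# Hypothetical loader for the subcharacter component file: each line holds
# one Chinese character followed by its radicals/components, separated by
# whitespace. load_components is an illustrative name, not a JWE function.
from typing import Dict, Iterable, List


def load_components(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each character to the list of its subcharacter components."""
    table: Dict[str, List[str]] = {}
    for raw in lines:
        fields = raw.split()
        if len(fields) < 2:
            # Skip blank lines and characters listed without any component.
            continue
        char, comps = fields[0], fields[1:]
        table[char] = comps
    return table


if __name__ == "__main__":
    sample = ["侩 亻 人 云", "侨 亻 乔", "侧 亻 贝 刂", "侦 亻 卜 贝"]
    table = load_components(sample)
    print(table["侧"])  # ['亻', '贝', '刂']
```

To read a real file, pass an open handle, e.g. `with open(path, encoding="utf-8") as f: table = load_components(f)`, where path is whatever file you prepared.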

Model Evaluation

Two Chinese word similarity datasets, 240.txt and 297.txt, and one Chinese analogy dataset, analogy.txt, are provided in the JWE/evaluation folder. They come from Chen et al. (IJCAI 2015).
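
As a rough illustration of what these datasets measure, the sketch below scores word similarity as the Spearman correlation between cosine similarities and human ratings, and answers an analogy a:b :: c:? with 3CosAdd (the word closest to b - a + c). This is not the repository's evaluation code. It assumes each similarity line is "word1 word2 score" and that vectors is a dict from words to lists of floats; check the actual file formats and use the provided scripts for reported numbers. All function names are hypothetical.

```python
# Illustrative evaluation sketch, not the JWE evaluation scripts.
# Assumption: each similarity line is "word1 word2 score"; vectors maps
# words to equal-length lists of floats.
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vectors = Dict[str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    if n < 2:
        return float("nan")
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def evaluate_similarity(lines, vectors: Vectors) -> Tuple[float, int]:
    """Return (Spearman rho, pairs used); pairs with unknown words are skipped."""
    gold, pred = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        w1, w2, score = fields
        if w1 in vectors and w2 in vectors:
            gold.append(float(score))
            pred.append(cosine(vectors[w1], vectors[w2]))
    return spearman(gold, pred), len(gold)


def solve_analogy(a: str, b: str, c: str, vectors: Vectors) -> Optional[str]:
    """3CosAdd: the word closest to b - a + c, excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    best, best_sim = None, -math.inf
    for w, v in vectors.items():
        if w in (a, b, c):
            continue
        s = cosine(v, target)
        if s > best_sim:
            best, best_sim = w, s
    return best
```

Ties are given average ranks so that repeated human scores do not bias the correlation. 3CosAdd is only one common analogy method, used here as an example. The paper's exact protocol may differ.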

cd JWE/src, then