
NNSegmentation

NNSegmentation is a package for word segmentation with neural networks, built on the LibN3L package. It includes different combinations of neural network architectures (TNN, RNN, GatedNN, LSTM and GRNN) with objective functions (Softmax, CRF Max-Margin, CRF Maximum Likelihood). It can also combine sparse features with any of the above models. In addition, the package can easily be extended with user-defined neural network structures.
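The two CRF objectives named above can be illustrated on a linear-chain CRF. The C++ sketch below is a minimal illustration under stated assumptions, not LibN3L's implementation. The function names are hypothetical. Per-character emission scores are assumed to come from the network and/or sparse features. Start and end transitions are omitted. The max-margin cost is assumed to be a Hamming loss scaled by a hypothetical `margin` parameter. Maximum likelihood minimizes log Z minus the gold score, with log Z computed by the forward algorithm. Max-margin uses a structured hinge loss computed by cost-augmented Viterbi.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// emit[t][y]: score of label y at character t; trans[p][y]: score of label p followed by y.
using Matrix = std::vector<std::vector<double>>;

// Score of one label sequence: emission scores plus transitions between adjacent labels.
double sequenceScore(const Matrix& emit, const Matrix& trans, const std::vector<int>& labels) {
    double s = 0.0;
    for (std::size_t t = 0; t < labels.size(); ++t) {
        s += emit[t][labels[t]];
        if (t > 0) s += trans[labels[t - 1]][labels[t]];
    }
    return s;
}

// Numerically stable log(sum(exp(v))).
double logSumExp(const std::vector<double>& v) {
    double m = *std::max_element(v.begin(), v.end());
    double sum = 0.0;
    for (double x : v) sum += std::exp(x - m);
    return m + std::log(sum);
}

// CRF maximum likelihood: negative log-likelihood = log Z - score(gold),
// where log Z is computed with the forward algorithm.
double crfNegLogLikelihood(const Matrix& emit, const Matrix& trans, const std::vector<int>& gold) {
    const std::size_t n = emit.size(), k = emit[0].size();
    std::vector<double> alpha(emit[0]);  // alpha[y]: log-sum over all prefixes ending in label y
    for (std::size_t t = 1; t < n; ++t) {
        std::vector<double> next(k);
        for (std::size_t y = 0; y < k; ++y) {
            std::vector<double> terms(k);
            for (std::size_t p = 0; p < k; ++p) terms[p] = alpha[p] + trans[p][y];
            next[y] = logSumExp(terms) + emit[t][y];
        }
        alpha = next;
    }
    return logSumExp(alpha) - sequenceScore(emit, trans, gold);
}

// CRF max-margin (structured hinge): max_y [score(y) + margin * Hamming(y, gold)] - score(gold),
// where the max is found by Viterbi with the per-position cost added to the emissions.
double crfMaxMarginLoss(const Matrix& emit, const Matrix& trans,
                        const std::vector<int>& gold, double margin) {
    const std::size_t n = emit.size(), k = emit[0].size();
    auto cost = [&](std::size_t t, std::size_t y) {
        return static_cast<int>(y) == gold[t] ? 0.0 : margin;
    };
    std::vector<double> delta(k);  // delta[y]: best cost-augmented prefix score ending in y
    for (std::size_t y = 0; y < k; ++y) delta[y] = emit[0][y] + cost(0, y);
    for (std::size_t t = 1; t < n; ++t) {
        std::vector<double> next(k, -std::numeric_limits<double>::infinity());
        for (std::size_t y = 0; y < k; ++y)
            for (std::size_t p = 0; p < k; ++p)
                next[y] = std::max(next[y], delta[p] + trans[p][y]);
        for (std::size_t y = 0; y < k; ++y) next[y] += emit[t][y] + cost(t, y);
        delta = next;
    }
    return *std::max_element(delta.begin(), delta.end()) - sequenceScore(emit, trans, gold);
}
```

Both losses are zero or positive. The likelihood loss always pushes probability mass toward the gold sequence. The max-margin loss becomes exactly zero once the gold sequence outscores every other sequence by at least its Hamming cost.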

Performance

For performance results, see Table 4 of the paper "LibN3L: A Lightweight Package for Neural NLP".

Compile

Download the LibN3L library and compile it.
Open CMakeLists.txt and change "../LibN3L/" to the directory of your LibN3L package, then build:

cmake .
make

Example

This example shows how to train three Chinese word segmentation models on the PKU corpus of the SIGHAN Bakeoff 2005 dataset.
These models are:

SparseCRFMMLabeler, which uses only sparse features and works like a CRF model.
LSTMCRFMMLabeler, which uses only neural embeddings as input and employs CRF Maximum Likelihood as the training objective.
SparseLSTMCRFMMLabeler, which uses both neural embeddings and sparse features and also employs CRF Maximum Likelihood as the training objective.
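All three labelers treat segmentation as per-character sequence labeling. The C++ sketch below shows that conversion and assumes the common B/M/E/S tag set: B begins a multi-character word, M is inside one, E ends one, and S marks a single-character word. The README does not say which tag set NNSegmentation actually uses. The sketch also assumes valid UTF-8 input. `utf8Chars`, `wordsToTags` and `tagsToWords` are hypothetical names, not part of the package.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a valid UTF-8 string into characters (one std::string per code point).
std::vector<std::string> utf8Chars(const std::string& s) {
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        // The lead byte determines the length of the code point's byte sequence.
        std::size_t len = c < 0x80 ? 1 : (c >> 5) == 0x6 ? 2 : (c >> 4) == 0xE ? 3 : 4;
        chars.push_back(s.substr(i, len));
        i += len;
    }
    return chars;
}

// Converts a segmented sentence (a list of words) into one B/M/E/S tag per character.
std::vector<std::string> wordsToTags(const std::vector<std::string>& words) {
    std::vector<std::string> tags;
    for (const auto& w : words) {
        std::size_t n = utf8Chars(w).size();
        if (n == 1) { tags.push_back("S"); continue; }
        for (std::size_t i = 0; i < n; ++i)
            tags.push_back(i == 0 ? "B" : i + 1 == n ? "E" : "M");
    }
    return tags;
}

// Rebuilds words from characters and (predicted) tags: a word ends at an E or S tag.
std::vector<std::string> tagsToWords(const std::vector<std::string>& chars,
                                     const std::vector<std::string>& tags) {
    std::vector<std::string> words;
    std::string cur;
    for (std::size_t i = 0; i < chars.size(); ++i) {
        cur += chars[i];
        if (tags[i] == "E" || tags[i] == "S") { words.push_back(cur); cur.clear(); }
    }
    if (!cur.empty()) words.push_back(cur);  // tolerate an ill-formed final tag
    return words;
}
```

At training time, `wordsToTags` would turn gold segmentations into label sequences. At test time, the tags decoded by the CRF layer would go through `tagsToWords` to recover the words.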

The example data contains:

Sparse features: "train.feats", "dev.feats" and "test.feats". The training and dev features are extracted from only a subset of the PKU corpus.
Character unigram embeddings: "char.vec"
Character bigram embeddings: "bi.vec"
Character trigram embeddings: "tri.vec"
Parameter setting files: "sparse" for SparseCRFMMLabeler, "lstm" for LSTMCRFMMLabeler and "sparselstm" for SparseLSTMCRFMMLabeler.
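The embedding files ("char.vec", "bi.vec", "tri.vec") map character n-grams to vectors. The C++ sketch below reads such a file and assumes a plain-text layout with one `token v1 v2 ... vd` entry per line and no header line. The README does not document the actual file format, so check the example's own files before relying on this. `loadEmbeddings` is a hypothetical helper, not part of NNSegmentation or LibN3L.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

using Embeddings = std::unordered_map<std::string, std::vector<double>>;

// Reads lines of the (assumed) form "token v1 v2 ... vd".
// Blank lines are skipped, and so are lines whose dimension differs from the first valid entry.
Embeddings loadEmbeddings(std::istream& in) {
    Embeddings table;
    std::size_t dim = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string token;
        if (!(fields >> token)) continue;  // blank line
        std::vector<double> vec;
        double x;
        while (fields >> x) vec.push_back(x);
        if (vec.empty()) continue;          // token with no vector
        if (dim == 0) dim = vec.size();     // first valid entry fixes the dimension
        if (vec.size() != dim) continue;    // inconsistent entry
        table[token] = vec;
    }
    return table;
}
```

For a real file, open it with `std::ifstream` (from `<fstream>`) and pass the stream to `loadEmbeddings`. Skipping inconsistent lines rather than aborting is a design choice for the sketch. A stricter loader might report them instead.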

For more details about the example, see its "readme.md".