Thai word segmentation with bi-directional RNN

This repository contains code for preprocessing data, training a model, and inferring word boundaries in Thai text with a bi-directional recurrent neural network. The model achieves a precision of 98.94%, a recall of 99.28%, and an F1 score of 99.11%. Please see the blog post for a detailed description of the model.
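As a rough illustration of what predicting word boundaries and scoring them can look like, the sketch below assumes one binary label per character, where 1 marks the first character of a word. The label scheme, the function names (labels_to_words, boundary_scores, f1_from) and the evaluation protocol are assumptions made for this example; the repository's own code and the blog post define the actual ones.

```python
from typing import List, Sequence, Tuple


def labels_to_words(text: str, labels: Sequence[int]) -> List[str]:
    """Split text into words; labels[i] == 1 means a word starts at character i.

    Hypothetical helper: the label scheme is an assumption, not the repo's API.
    """
    if len(text) != len(labels):
        raise ValueError("text and labels must have the same length")
    words: List[str] = []
    start = 0
    for i in range(1, len(text)):
        if labels[i] == 1:
            words.append(text[start:i])
            start = i
    if text:
        words.append(text[start:])
    return words


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def boundary_scores(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over positions labelled as word starts."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Three words: "ฉัน" (3 characters), "กิน" (3), "ข้าว" (4).
text = "ฉันกินข้าว"
gold = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
print(labels_to_words(text, gold))
```

The reported figures are consistent with this definition of F1: f1_from(0.9894, 0.9928) rounds to 0.9911.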

Requirements

Files

Note that the InterBEST 2009 corpus is not included, but can be downloaded from the NECTEC website.

Usage

To try the prediction demo, run python3 predict_example.py. To preprocess the data and then train and save the model, put the data files under the data directory, then run python3 preprocess.py followed by python3 train.py.
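The two training steps can also be chained from a small driver script. The following is only a sketch: pipeline_commands and run_pipeline are hypothetical helpers, not part of this repository, and it assumes the scripts take no command-line arguments and are run from the repository root.

```python
import subprocess
from pathlib import Path
from typing import List


def pipeline_commands(python: str = "python3") -> List[List[str]]:
    """Commands for preprocessing and training, in the order they must run."""
    return [[python, "preprocess.py"], [python, "train.py"]]


def run_pipeline(data_dir: str = "data", python: str = "python3") -> None:
    """Run preprocessing, then training, stopping at the first failure."""
    # The corpus is not shipped with the code, so fail early if it is missing.
    if not Path(data_dir).is_dir():
        raise FileNotFoundError(f"expected the corpus files under '{data_dir}/'")
    for cmd in pipeline_commands(python):
        # check=True raises CalledProcessError if a step exits with an error.
        subprocess.run(cmd, check=True)
```

Calling run_pipeline() from the repository root runs both steps in order and stops if preprocessing fails, so training never starts on incomplete data.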

Bug fixes and updates

Contributors

License

MIT

Copyright (c) Sertis Co., Ltd., 2019