Thai Text Classification Benchmarks

We provide 4 datasets for Thai text classification, covering different styles, objectives, and numbers of labels. We also provide preliminary benchmarks using fastText, linear models (LinearSVC and logistic regression), and thai2fit's implementation of ULMFit.
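
As a taste of the baselines, here is a hedged sketch of how a fastText classifier might be trained; the file name, hyperparameters, and example sentence are placeholders rather than the exact settings used in the benchmark notebooks.

```python
# Hypothetical sketch of a fastText baseline like the ones benchmarked here.
# Assumes training data in fastText's supervised format, one example per line:
#   __label__<label> <space-tokenized Thai text>
# The file name and hyperparameters are placeholders, not the notebook settings.
import fasttext

model = fasttext.train_supervised(
    input="train.txt",  # placeholder path
    epoch=5,
    lr=0.1,
    wordNgrams=2,       # include word bigram features
)

# predict the top label and its probability for a pre-tokenized sentence
labels, probs = model.predict("อาหาร อร่อย มาก")
print(labels, probs)
```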

prachathai-67k, truevoice-intent, and all code in this repository are released under the Apache License 2.0 by PyThaiNLP. wisesight-sentiment is released to the public domain under the Creative Commons Zero v1.0 Universal license by Wisesight. wongnai-corpus is released under the GNU Lesser General Public License v3.0 by Wongnai.

Dataset Description

| Datasets | Style | Objective | Labels | Size |
|----------|-------|-----------|--------|------|
| prachathai-67k: body_text | Formal (online newspapers), News | Topic | 12 | 67k |
| truevoice-intent: destination | Informal (call center transcription), Customer service | Intent | 7 | 16k |
| wisesight-sentiment | Informal (social media), Conversation/opinion | Sentiment | 4 | 28k |
| wongnai-corpus | Informal (review site), Restaurant review | Sentiment | 5 | 40k |

prachathai-67k: body_text

We benchmark prachathai-67k using body_text as the text feature in a 12-label multi-label classification task. Performance is measured by macro-averaged accuracy and F1 score. The code can be run to confirm performance at this notebook, which also provides per-class performance metrics.
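
For concreteness, here is a minimal sketch of this multi-label setup, assuming macro-accuracy means per-label accuracy averaged over labels; the toy data, two-label matrix, and plain TfidfVectorizer are stand-ins for the real corpus, the 12 topic labels, and Thai tokenization.

```python
# Toy sketch of the multi-label setup; the real benchmark uses the full
# prachathai-67k splits, 12 topic columns, and Thai tokenization.
# Macro-accuracy is interpreted here as per-label accuracy averaged over labels.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

texts_train = ["การเมือง ประท้วง", "สิ่งแวดล้อม ป่าไม้", "การเมือง เลือกตั้ง"]
y_train = np.array([[1, 0], [0, 1], [1, 0]])  # one binary column per topic
texts_test = ["ประท้วง เลือกตั้ง"]
y_test = np.array([[1, 0]])

vec = TfidfVectorizer()  # swap in a Thai tokenizer for real use
X_train, X_test = vec.fit_transform(texts_train), vec.transform(texts_test)

clf = OneVsRestClassifier(LinearSVC()).fit(X_train, y_train)
preds = clf.predict(X_test)
if hasattr(preds, "toarray"):  # predict may return a sparse indicator matrix
    preds = preds.toarray()

print("macro-F1:", f1_score(y_test, preds, average="macro", zero_division=0))
print("macro-accuracy:", (preds == y_test).mean(axis=0).mean())
```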

| model | macro-accuracy | macro-F1 |
|-------|----------------|----------|
| fastText | 0.9302 | 0.5529 |
| LinearSVC | 0.513277 | 0.552801 |
| ULMFit | 0.948737 | 0.744875 |
| USE | 0.856091 | 0.696172 |

truevoice-intent: destination

We benchmark truevoice-intent using destination as the target in a 7-class multi-class classification task. Performance is measured by micro-averaged and macro-averaged accuracy and F1 score. The code can be run to confirm performance at this notebook, which also provides per-class performance metrics.
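
The gap between micro- and macro-averaged scores in the table below comes from class imbalance; a toy illustration with invented class names:

```python
# Toy illustration of micro vs macro averaging on an imbalanced intent task;
# class names are invented, not the actual destination labels.
from sklearn.metrics import f1_score

y_true = ["billing", "billing", "billing", "internet", "promotion"]
y_pred = ["billing", "billing", "internet", "internet", "billing"]

# micro pools every decision, so frequent classes dominate (here: 0.6);
# macro averages per-class F1, so rare classes count equally (here: ~0.44)
print("micro-F1:", f1_score(y_true, y_pred, average="micro"))
print("macro-F1:", f1_score(y_true, y_pred, average="macro", zero_division=0))
```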

| model | macro-accuracy | micro-accuracy | macro-F1 | micro-F1 |
|-------|----------------|----------------|----------|----------|
| LinearSVC | 0.957806 | 0.95747712 | 0.869411 | 0.85116993 |
| ULMFit | 0.955066 | 0.84273111 | 0.852149 | 0.84273111 |
| BERT | 0.8921 | 0.85 | 0.87 | 0.85 |
| USE | 0.943559 | 0.94355855 | 0.787686 | 0.802455 |

wisesight-sentiment

Performance of wisesight-sentiment is based on the test set of WISESIGHT Sentiment Analysis. The code can be run to confirm performance at this notebook.

Disclaimer: The labels were obtained manually and are prone to errors. If you plan to apply the models in this benchmark to real-world applications, be sure to evaluate them on your own dataset first.
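
One low-effort way to follow this advice is to score a candidate model on a small labeled sample of your own data; a sketch with placeholder data and a stand-in predictor:

```python
# Minimal sanity check: score a candidate model on a labeled sample of your
# own data before trusting benchmark numbers. Texts, labels, and the stand-in
# predictor below are placeholders.
from sklearn.metrics import accuracy_score, f1_score

texts = ["บริการดีมาก", "ช้ามาก ไม่ประทับใจ"]  # your own texts
gold = ["pos", "neg"]                          # your own labels

def predict(text):
    return "pos"  # stand-in; call a real benchmark model here instead

preds = [predict(t) for t in texts]
print("accuracy:", accuracy_score(gold, preds))
print("macro-F1:", f1_score(gold, preds, average="macro", zero_division=0))
```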

| Model | Public Accuracy | Private Accuracy |
|-------|-----------------|------------------|
| Logistic Regression | 0.72781 | 0.7499 |
| FastText | 0.63144 | 0.6131 |
| ULMFit | 0.71259 | 0.74194 |
| ULMFit Semi-supervised | 0.73119 | 0.75859 |
| ULMFit Semi-supervised Repeated One Time | 0.73372 | 0.75968 |
| USE | 0.63987 | * |

wongnai-corpus

Performance of wongnai-corpus is based on the test set of the Wongnai Challenge: Review Rating Prediction. The code can be run to confirm performance at this notebook.
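
Since each review receives exactly one predicted rating, the micro-averaged F1 reported below is mathematically the same as accuracy; a quick toy check:

```python
# For single-label rating prediction (one predicted class per review),
# micro-averaged F1 equals plain accuracy; toy ratings below.
from sklearn.metrics import accuracy_score, f1_score

y_true = [5, 4, 3, 5, 1]
y_pred = [5, 4, 4, 5, 1]

print("micro-F1:", f1_score(y_true, y_pred, average="micro"))
print("accuracy:", accuracy_score(y_true, y_pred))  # identical value
```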

| Model | Public Micro-F1 | Private Micro-F1 |
|-------|-----------------|------------------|
| ULMFit Knight | 0.61109 | 0.62580 |
| ULMFit | 0.59313 | 0.60322 |
| fastText | 0.5145 | 0.5109 |
| LinearSVC | 0.5022 | 0.4976 |
| Kaggle Score | 0.59139 | 0.58139 |
| BERT | 0.56612 | 0.57057 |
| USE | 0.42688 | 0.41031 |

BibTeX

@software{cstorm125_2020_3852912,
  author       = {cstorm125 and
                  lukkiddd},
  title        = {PyThaiNLP/classification-benchmarks: v0.1-alpha},
  month        = may,
  year         = 2020,
  publisher    = {Zenodo},
  version      = {v0.1-alpha},
  doi          = {10.5281/zenodo.3852912},
  url          = {https://doi.org/10.5281/zenodo.3852912}
}

Acknowledgements