
Wisesight Sentiment Corpus


Social media messages in Thai language with sentiment labels (positive, neutral, negative, question), 26,737 messages in total. Released to the public domain under the Creative Commons Zero v1.0 Universal license.

Last update: 2019-03-31

For wisesight-160 and wisesight-1000, word-tokenized samples drawn from this corpus, see https://github.com/PyThaiNLP/wisesight-sentiment/tree/master/word-tokenization

For data exploration and classification examples, see Thai Text Classification Benchmarks.
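As a starting point for such experiments, the corpus can be read into (message, label) pairs with a few lines of Python. This is a minimal sketch that assumes one UTF-8 text file per label with one message per line. The file names `pos.txt`, `neu.txt`, `neg.txt` and `q.txt` and the helpers `load_corpus` and `label_counts` are assumptions for illustration, so check them against the corpus file structure before use.

```python
from collections import Counter
from pathlib import Path

# ASSUMPTION: hypothetical layout with one UTF-8 file per sentiment label,
# one message per line. Adjust the names to the actual corpus files.
LABEL_FILES = {
    "pos": "pos.txt",  # positive
    "neu": "neu.txt",  # neutral
    "neg": "neg.txt",  # negative
    "q": "q.txt",      # question
}


def load_corpus(root):
    """Return a list of (message, label) pairs read from the per-label files."""
    samples = []
    for label, filename in LABEL_FILES.items():
        path = Path(root) / filename
        with path.open(encoding="utf-8") as f:
            for line in f:
                text = line.strip()
                if text:  # skip blank lines
                    samples.append((text, label))
    return samples


def label_counts(samples):
    """Count how many messages carry each label."""
    return Counter(label for _, label in samples)
```

For example, `samples = load_corpus("path/to/wisesight-sentiment")` followed by `label_counts(samples)` gives the label distribution, which is worth checking before splitting the data for training, since the four classes need not be balanced.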

Source

Corpus file structure

Personal data

Sentiment value annotation methodology

Copyright and Disclaimer

Citation

Please cite the following if you make use of the dataset:

Arthit Suriyawongkul, Ekapol Chuangsuwanich, Pattarawat Chormai, and Charin Polpanumas. 2019. PyThaiNLP/wisesight-sentiment: First release (v1.0). Zenodo, September 2019. doi:10.5281/zenodo.3457447

BibTeX:

@software{bact_2019_3457447,
  author       = {Suriyawongkul, Arthit and
                  Chuangsuwanich, Ekapol and
                  Chormai, Pattarawat and
                  Polpanumas, Charin},
  title        = {PyThaiNLP/wisesight-sentiment: First release},
  month        = sep,
  year         = 2019,
  publisher    = {Zenodo},
  version      = {v1.0},
  doi          = {10.5281/zenodo.3457447},
  url          = {https://doi.org/10.5281/zenodo.3457447}
}

Acknowledgement

Thanks to the PyThaiNLP community, Kitsuchart Pasupa (Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang), and Ekapol Chuangsuwanich (Faculty of Engineering, Chulalongkorn University) for their advice. The original Kaggle competition, which used the first version of this corpus, can be found at https://www.kaggle.com/c/wisesight-sentiment/