Toxicity in Thai Tweet Corpus
Annotated Corpus
Each row contains the label, the toxic/non-toxic annotation ratio (from 3 annotators), and the tweet ID, separated by tabs. For example:
1[tab][3/0][tab]tweet_id
The labels are:
- 1: Toxic
- 0: Non-Toxic
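The row format above can be read with a few lines of Python. This is a minimal sketch, not part of the released corpus tooling: the names `AnnotatedRow` and `parse_row` are hypothetical, and it assumes the ratio field may or may not be wrapped in literal square brackets, so it strips them if present.

```python
from typing import NamedTuple


class AnnotatedRow(NamedTuple):
    """One row of the annotated corpus (hypothetical container)."""
    label: int           # 1 = toxic, 0 = non-toxic
    toxic_votes: int     # annotators who marked the tweet toxic
    nontoxic_votes: int  # annotators who marked the tweet non-toxic
    tweet_id: str        # kept as a string to avoid losing precision


def parse_row(line: str) -> AnnotatedRow:
    """Parse a tab-separated row such as '1<TAB>[3/0]<TAB>tweet_id'."""
    label, ratio, tweet_id = line.rstrip("\n").split("\t")
    # Assumption: the ratio is 'toxic/nontoxic', optionally in brackets.
    toxic, nontoxic = ratio.strip("[]").split("/")
    return AnnotatedRow(int(label), int(toxic), int(nontoxic), tweet_id)
```

Calling `parse_row` on each line of the corpus file yields the label and the vote counts, which can be used, for instance, to keep only tweets on which all 3 annotators agreed.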
Toxic keywords
These are the 44 keywords that we used to collect the tweets via the Twitter Search API.
Each row contains a toxic keyword and its meaning, separated by a tab. For example:
Thai toxic word[tab]original meaning/toxic meaning
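A keyword row can be split in a similar way. The sketch below is illustrative only: `ToxicKeyword` and `parse_keyword_row` are hypothetical names, and it assumes the first "/" in the meaning field separates the original meaning from the toxic meaning, which may not hold for every row.

```python
from typing import NamedTuple


class ToxicKeyword(NamedTuple):
    """One row of the keyword list (hypothetical container)."""
    word: str
    original_meaning: str
    toxic_meaning: str


def parse_keyword_row(line: str) -> ToxicKeyword:
    """Parse a row such as 'word<TAB>original meaning/toxic meaning'."""
    word, meaning = line.rstrip("\n").split("\t", 1)
    # Assumption: the first '/' divides the original and toxic meanings.
    original, _, toxic = meaning.partition("/")
    return ToxicKeyword(word, original.strip(), toxic.strip())
```

If a row has no "/" in its meaning field, `toxic_meaning` comes back as an empty string rather than raising an error.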
Publication
In Proceedings of the Second Workshop on Text Analytics for Cybersecurity and Online Safety 2018 (to appear).
Demo application
http://cl.sd.tmu.ac.jp/thaitoxicity/
License
This project is licensed under the terms of the Creative Commons license.