Toxicity in Thai Tweet Corpus

License: CC BY-NC 4.0

Annotated Corpus

Each row contains a label, the toxic/nontoxic annotation ratio (from 3 annotators), and the tweet ID, separated by tabs. For example:

1[tab][3/0][tab]tweet_id
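A row in this format can be read with a few lines of Python. This is only a minimal sketch, assuming that `[tab]` stands for a literal tab character, that the square brackets around the ratio may or may not be present in the file, and that the ratio is ordered toxic/nontoxic as described above. The names `AnnotatedRow` and `parse_row` are hypothetical and not part of this project.

```python
from typing import NamedTuple


class AnnotatedRow(NamedTuple):
    """One row of the annotated corpus (hypothetical helper type)."""
    label: int
    toxic_votes: int
    nontoxic_votes: int
    tweet_id: str


def parse_row(line: str) -> AnnotatedRow:
    """Parse one tab-separated row: label, ratio, tweet ID."""
    label, ratio, tweet_id = line.rstrip("\r\n").split("\t")
    # Assumption: the ratio may be wrapped in square brackets; strip them if so.
    toxic, nontoxic = ratio.strip("[]").split("/")
    # Tweet IDs are kept as strings so they are never altered by numeric handling.
    return AnnotatedRow(int(label), int(toxic), int(nontoxic), tweet_id)
```

Because the corpus provides tweet IDs rather than tweet text, the `tweet_id` field is what you would use to look up the tweet itself.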

The labels are the following items:

Toxic keywords

These are the 44 keywords that we used to collect the tweets via the Twitter Search API. Each row contains a toxic keyword and its meaning, separated by a tab. For example:

Thai toxic word[tab]original meaning/toxic meaning
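A keyword row can be split in a similar way. The sketch below assumes that `[tab]` is a literal tab character and that the first `/` in the meaning field separates the original meaning from the toxic meaning; rows without a `/` yield an empty toxic meaning. The function name `parse_keyword_row` is hypothetical.

```python
from typing import Tuple


def parse_keyword_row(line: str) -> Tuple[str, str, str]:
    """Return (keyword, original meaning, toxic meaning) for one row."""
    # Split only on the first tab so the meaning field is kept whole.
    keyword, meanings = line.rstrip("\r\n").split("\t", 1)
    # Assumption: the first '/' divides the original and toxic meanings.
    original, _, toxic = meanings.partition("/")
    return keyword, original.strip(), toxic.strip()
```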

Publication

In Proceedings of the Second Workshop on Text Analytics for Cybersecurity and Online Safety 2018 (to appear).

Demo application

http://cl.sd.tmu.ac.jp/thaitoxicity/

License

This project is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.