KODOLI
KODOLI is a novel KOrean Dataset for Offensive Language Identification.
Warning: it contains highly offensive expressions.
- KODOLI comprises three fine-grained offensiveness categories: not offensive, likely offensive, and offensive.
- Likely offensive language refers to texts with implicit offensiveness or abusive language without offensive intentions.
- In addition, we propose two auxiliary tasks to help identify offensive language: abusive language detection and sentiment analysis.
- You can use the auxiliary task for toxicity detection. (Be careful with the raw expressions.)
Download
You can download the KODOLI benchmark from this repository. Please follow the dataset's license.
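After downloading, a quick sanity check is to count how many examples fall into each offensiveness category. The following is a minimal sketch, not the official loader: it assumes a tab-separated file with a header row, and the column name `offensiveness` and the label strings are hypothetical, so adjust them to match the released files.

```python
import csv
import io
from collections import Counter

# The three offensiveness categories named in this README.
# The exact label strings stored in the files are an assumption.
OFFENSIVENESS_LABELS = ("not offensive", "likely offensive", "offensive")


def count_labels(lines, label_column="offensiveness", delimiter="\t"):
    """Count how often each value of `label_column` occurs.

    `lines` is any iterable of text lines (e.g. an open file) whose first
    line is a header row. `label_column` is a hypothetical column name.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    return Counter(row[label_column] for row in reader)


# Tiny in-memory stand-in for a downloaded file (placeholder rows, not real data).
sample = io.StringIO(
    "text\toffensiveness\n"
    "<example 1>\tnot offensive\n"
    "<example 2>\tlikely offensive\n"
    "<example 3>\toffensive\n"
    "<example 4>\tnot offensive\n"
)
counts = count_labels(sample)
for label in OFFENSIVENESS_LABELS:
    print(label, counts[label])
```

To run this on a real file, pass a file object opened with `encoding="utf-8"` and `newline=""` to `count_labels` in place of the in-memory sample.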
Dataset Description
Source
- Texts are mainly collected and sampled from online communities and news articles.
Statistics
Guideline Details
[Guideline(KOR.)] Coming Soon
Updates
- Apr 20, 2023: We released 3.6k examples for the offensive language identification task.
Citation
@inproceedings{park2023feel,
title={“Why do I feel offended?”-Korean Dataset for Offensive Language Identification},
author={Park, San-Hee and Kim, Kang-Min and Lee, O-joun and Kang, Youjin and Lee, Jaewon and Lee, Su-min and Lee, Sangkeun},
booktitle={Findings of the Association for Computational Linguistics: EACL 2023},
pages={1112--1123},
year={2023}
}
Contributors
License
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.