Sinhala-POS-Data

POS tagged Sinhala text

The file news- verified- final level.txt contains the first version of our annotated data, which has 253636 words. TagList.txt contains the tag list, and Tagging Guide.pdf contains a detailed description of the tags.
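As a rough illustration of how the word count and tag distribution could be checked, here is a minimal Python sketch. It assumes one token per line with the word and its tag separated by a tab. This layout is an assumption and is not confirmed by this README, so the separator may need adjusting after inspecting the file. The function name count_tagged_tokens, the sample words, and the tags TAG_A and TAG_B are hypothetical placeholders, not entries from TagList.txt.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_tagged_tokens(lines: Iterable[str], sep: str = "\t") -> Tuple[int, Counter]:
    """Count tokens and tag frequencies in word<sep>tag lines.

    ASSUMPTION: one token per line, with word and tag separated by `sep`.
    The actual layout of the data file may differ.
    """
    tag_counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines (e.g. possible sentence boundaries)
        parts = line.rsplit(sep, 1)
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed layout
        _word, tag = parts
        tag_counts[tag.strip()] += 1
    return sum(tag_counts.values()), tag_counts


# Hypothetical sample in the assumed layout (placeholder words and tags).
sample = ["w1\tTAG_A", "w2\tTAG_B", "", "w3\tTAG_A"]
total, tags = count_tagged_tokens(sample)
```

To run this on the real data, one could pass an open file object instead of the sample list, for example open(path, encoding="utf-8"), assuming the file is UTF-8 encoded.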

If you use this data set or the tag set, please cite one of the following, as appropriate:

Fernando, S., & Ranathunga, S. (2018, May). Evaluation of Different Classifiers for Sinhala POS Tagging. In 2018 Moratuwa Engineering Research Conference (MERCon) (pp. 96-101). IEEE.

Dilshani, N., Fernando, S., Ranathunga, S., Jayasena, S., & Dias, G. (2017). A Comprehensive Part of Speech (POS) Tag Set for Sinhala Language. The Third International Conference on Linguistics in Sri Lanka, ICLSL 2017. Department of Linguistics, University of Kelaniya, Sri Lanka.

Fernando, S., Ranathunga, S., Jayasena, S., & Dias, G. (2016, December). Comprehensive Part-Of-Speech Tag Set and SVM Based POS Tagger for Sinhala. In Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016) (pp. 173-182).