ShortSinhalaSentences

This is the dataset used in the paper: Kadupitiya, J.C.S., Ranathunga, S. and Dias, G., 2016, December. Sinhala Short Sentence Similarity Measures using Corpus-Based Similarity for Short Answer Grading. In 6th Workshop on South and Southeast Asian Natural Language Processing (pp. 44-53). The dataset contains Sinhala short sentences generated from a Flickr image dataset (refer to the paper for more details). Participants were asked to produce captions for 500 images. The similarity between the resulting sentence pairs was then determined manually, and these scores served as the gold standard for validating the algorithms. The code that uses this dataset to measure short sentence similarity is available at https://github.com/suralk/SinhalaSentenceSimilarityMeasurement
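
As a rough illustration of how manually assigned similarity scores can serve as a gold standard, the sketch below compares an algorithm's scores against gold scores using Pearson correlation. This is only an assumption about one common way to run such a validation; the paper may use a different measure. The function name `pearson` and the score lists are hypothetical placeholders, not values taken from this dataset, and no particular file format is assumed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Placeholder scores for four sentence pairs (NOT from the dataset).
gold_scores = [0.0, 0.25, 0.5, 1.0]        # manually assigned similarity
predicted_scores = [0.1, 0.2, 0.6, 0.9]    # output of some similarity algorithm

print(round(pearson(gold_scores, predicted_scores), 3))
```

A correlation close to 1 would indicate that the algorithm ranks and scales sentence pairs much like the human annotators did.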