Malayalam-Newspaper-Article-Dataset
This project scraped articles from the website of Janmabhumi, a Malayalam newspaper, to build a corpus of news articles. It also includes a set of queries, with the corresponding ground-truth answers retrieved using a combination of the BM25 and TF-IDF methods. The dataset can be useful for building tools such as stemmers, stop-word removers, and lemmatizers.
The dataset includes news articles from 2014 to 2018.
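For readers unfamiliar with how BM25 and TF-IDF scores can be combined to pick candidate ground-truth articles for a query, here is a minimal sketch. It is not the code used to build this dataset: the README does not say how the two methods were combined, so averaging min-max-normalized scores is an assumption, as are the function names, the whitespace tokenizer, the `k1` and `b` values, and the toy English documents standing in for Malayalam articles.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization; real Malayalam text
    # would likely need language-specific normalization.
    return text.lower().split()


def idf_table(docs_tokens):
    # Smoothed inverse document frequency (always positive).
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}


def tfidf_scores(query, docs_tokens, idf):
    # Sum of (relative term frequency * IDF) over the query terms.
    scores = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        length = len(tokens) or 1
        scores.append(sum((tf[t] / length) * idf.get(t, 0.0) for t in query))
    return scores


def bm25_scores(query, docs_tokens, idf, k1=1.5, b=0.75):
    # Okapi BM25 with document-length normalization.
    avgdl = (sum(len(d) for d in docs_tokens) / len(docs_tokens)) or 1
    scores = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        norm = k1 * (1 - b + b * len(tokens) / avgdl)
        scores.append(
            sum(idf.get(t, 0.0) * tf[t] * (k1 + 1) / (tf[t] + norm) for t in query)
        )
    return scores


def min_max(values):
    # Rescale scores to [0, 1] so the two methods are comparable.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(query_text, docs, top_k=3):
    # Assumption: the combined score is the mean of the normalized scores.
    docs_tokens = [tokenize(d) for d in docs]
    query = tokenize(query_text)
    idf = idf_table(docs_tokens)
    bm25 = min_max(bm25_scores(query, docs_tokens, idf))
    tfidf = min_max(tfidf_scores(query, docs_tokens, idf))
    combined = [(a + c) / 2 for a, c in zip(bm25, tfidf)]
    order = sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
    return [(i, combined[i]) for i in order[:top_k]]


# Toy placeholder documents, not taken from the dataset.
docs = [
    "kerala flood relief fund announced",
    "cricket match result kerala team",
    "state budget announced today",
]
print(rank("kerala flood", docs, top_k=2))
```

`rank` returns `(document index, combined score)` pairs, best match first. Normalizing before averaging keeps either method from dominating simply because its raw scores are on a larger scale.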
## Note
This repo is obsolete, and scraping no longer works on the site mentioned above.
DATASET
Directly download the complete dataset from Dropbox
Email: abhishekvalsan.iitk@gmail.com
Related Works
A similar repo with a Telugu dataset can be found here.