Malayalam-Newspaper-Article-Dataset

This project scraped articles from the website of a Malayalam newspaper (Janmabhumi) to create a corpus of news articles. It also includes a set of queries, with the corresponding ground-truth answers retrieved by a combination of the BM25 and TF-IDF methods. The dataset can be useful for building tools such as stemmers, stopword removers, and lemmatizers.
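The repo does not spell out how the BM25 and TF-IDF rankings were combined, so the following is only a minimal sketch of one plausible approach: score every document with both methods, scale each score list to a maximum of 1, and rank by a weighted average. The function names, the `weight` parameter, the whitespace tokenizer, and the BM25 defaults (`k1=1.5`, `b=0.75`) are all assumptions made for illustration, not the project's actual code.

```python
import math
from collections import Counter


def tokenize(text):
    # Placeholder whitespace tokenizer (assumption); real Malayalam text
    # would likely need proper normalization and stemming.
    return text.lower().split()


def idf_table(docs_tokens):
    # Smoothed BM25-style IDF, always positive.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}


def bm25_scores(query, docs_tokens, idf, k1=1.5, b=0.75):
    avgdl = sum(len(d) for d in docs_tokens) / len(docs_tokens)
    scores = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf[term] * f * (k1 + 1) / norm
        scores.append(score)
    return scores


def tfidf_scores(query, docs_tokens, idf):
    # Cosine similarity between TF-IDF vectors of the query and each document.
    def vec(tokens):
        return {t: f * idf.get(t, 0.0) for t, f in Counter(tokens).items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    return [cosine(q, vec(d)) for d in docs_tokens]


def normalize(scores):
    # Scale scores so the best document gets 1.0 (or all zeros if no match).
    top = max(scores)
    return [s / top if top > 0 else 0.0 for s in scores]


def rank(query_text, docs, weight=0.5):
    # Hypothetical combination: weighted average of normalized scores.
    docs_tokens = [tokenize(d) for d in docs]
    query = tokenize(query_text)
    idf = idf_table(docs_tokens)
    bm = normalize(bm25_scores(query, docs_tokens, idf))
    tv = normalize(tfidf_scores(query, docs_tokens, idf))
    combined = [weight * x + (1 - weight) * y for x, y in zip(bm, tv)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
```

Calling `rank(query, articles)` returns document indices ordered from most to least relevant; the top few could then serve as candidate ground-truth answers for that query. Other combinations, such as intersecting the top-k results of each method, would be equally consistent with the description above.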

The dataset includes news articles from 2014 to 2018.

## Note

This repo is obsolete, and scraping no longer works on the site mentioned above.

DATASET

Directly download the complete dataset from Dropbox

Email : abhishekvalsan.iitk@gmail.com

Related Works

A similar repo with a Telugu dataset can be found here.