    Annotated Fake News Dataset in Urdu and Augmentation using Machine Translation
                          ===========================

                            March 03, 2020
                            
                       Maaz Amjad, Grigori Sidorov, Alisa Zhila

                   Natural Language and Text Processing Laboratory
                   Center for Computing Research (CIC)
                   Instituto Politécnico Nacional (IPN)
                   Ciudad de México (Mexico City), Mexico  

CONTENTS

  1. Introduction
  2. Feedback
  3. Citation Info
  4. Acknowledgments

1. Introduction

This dataset accompanies the paper by Amjad, M., Sidorov, G., and Zhila, A., "Data Augmentation using Machine Translation for Fake News Detection in the Urdu Language", LREC 2020 (accepted).

This language resource contains a dataset of 900 news articles originally written in Urdu and annotated as real or fake. It also contains an augmentation dataset of 400 news articles machine-translated from English into Urdu with the Google Translate MT system, as well as several combinations of these datasets for exploring the effect of augmentation. The original English Fake News dataset is available from https://web.eecs.umich.edu/~mihalcea/downloads.html#FakeNews.
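The combinations of original and machine-translated articles mentioned above can be formed in several ways. The sketch below shows one, assuming each article has already been loaded into a hypothetical `NewsItem` record with `text`, `label` and `source` fields. These names, the random-sampling strategy and the fixed seed are illustrative assumptions only; they are not the authors' procedure and do not describe the dataset's actual file layout.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    """One labeled article (hypothetical in-memory representation, not the dataset's file format)."""
    text: str
    label: str   # "real" or "fake"
    source: str  # "original" (written in Urdu) or "mt" (machine-translated from English)


def combine(original, augmented, n_augmented, seed=0):
    """Return all original articles plus a random sample of n_augmented
    machine-translated articles, shuffled with a fixed seed for reproducibility."""
    if n_augmented > len(augmented):
        raise ValueError("n_augmented exceeds the size of the augmentation set")
    rng = random.Random(seed)
    combined = list(original) + rng.sample(list(augmented), n_augmented)
    rng.shuffle(combined)
    return combined


def label_counts(items):
    """Count articles per (source, label) pair to check class balance after combining."""
    return Counter((item.source, item.label) for item in items)


if __name__ == "__main__":
    # Placeholder records standing in for real Urdu and MT articles.
    original = [NewsItem(f"urdu-{i}", "fake" if i % 2 else "real", "original") for i in range(10)]
    augmented = [NewsItem(f"mt-{i}", "fake" if i % 2 else "real", "mt") for i in range(4)]
    combined = combine(original, augmented, n_augmented=4)
    print(len(combined))  # 14
    print(label_counts(combined))
```

Sampling with a seeded `random.Random` keeps each combination reproducible, and `label_counts` makes it easy to see whether adding translated articles shifts the real/fake balance of the training data.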


2. Feedback

If you want to know how this dataset was built (including the crawling and annotation procedures) and how we conducted our Fake News detection experiments in Urdu using this dataset, please refer to our paper, cited in Section 3.

For further questions or inquiries about this dataset, please contact Maaz Amjad (maazamjad@phystech.edu).


3. Citation Info

This dataset and the accompanying resources are free to use. If you publish work that uses this dataset, please cite the following publication:

@inproceedings{Maazaug2020,
author = {Amjad, Maaz and Sidorov, Grigori and Zhila, Alisa},
title = {Data Augmentation using Machine Translation for Fake News Detection in the Urdu Language},
booktitle = {Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)},
pages = {2530--2535},
year = {2020}
}

4. Acknowledgments

This work was partially supported by CONACYT project 240844 and SIP-IPN project 20195719.