
Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization

Authors: Prasetya Ajie Utama, Joshua Bambrick, Nafise Sadat Moosavi, and Iryna Gurevych.

Purpose

This repository contains the code for deriving the dataset generated by the pipeline described in the paper <a href="https://arxiv.org/abs/2205.06009">Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization</a>, to appear at the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022). This code is not intended to be modified or reused.

Usage

The code in this repository uses Python 3.

To derive the dataset:

1. Download the CNN Stories and the Daily Mail Stories from https://cs.nyu.edu/~kcho/DMQA/
2. Unpack the downloaded files into a new directory
3. Run generate_falsesum_data.py to generate the dataset
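
The unpacking step can also be done from Python. The following is a minimal sketch, not part of this repository: the helper name `unpack_archives` and the archive file names in the usage comment are assumptions, so substitute the names of the files you actually downloaded.

```python
import shutil
from pathlib import Path


def unpack_archives(archives, target_dir):
    """Unpack each downloaded archive into target_dir and return that path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        # shutil infers the archive format from the file extension
        # (for example .tgz, .tar.gz, or .zip).
        shutil.unpack_archive(str(archive), str(target))
    return target


# Example call (hypothetical file and directory names):
# unpack_archives(["cnn_stories.tgz", "dailymail_stories.tgz"], "cnndm")
```

The returned directory is the one to pass as the unpacked CNN/Daily Mail argument when running the script.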

Example execution:

python generate_falsesum_data.py <dir-with-falsesum-jsonl-data> <dir-with-unpacked-cnndm-data> <target-output-dir>
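
Because the script takes three positional directory arguments, it can help to check the two input directories before starting a run. The sketch below is an illustration only: `build_command` is a hypothetical helper, and creating the output directory in advance is a convenience chosen here, not something the script is known to require.

```python
import sys
from pathlib import Path


def build_command(falsesum_dir, cnndm_dir, output_dir,
                  script="generate_falsesum_data.py"):
    """Check the input directories and return the command as an argument list."""
    inputs = (("Falsesum JSONL", falsesum_dir), ("CNN/Daily Mail", cnndm_dir))
    for label, path in inputs:
        if not Path(path).is_dir():
            raise FileNotFoundError(f"{label} directory not found: {path}")
    # Assumption of this sketch: create the output directory up front.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # Argument order follows the example execution line.
    return [sys.executable, script,
            str(falsesum_dir), str(cnndm_dir), str(output_dir)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to start the generation.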

Citation

If you find this work useful, please consider citing our paper as:

@inproceedings{utama-etal-2022-falsesum,
  author    = {Utama, Prasetya Ajie and Bambrick, Joshua and Moosavi, Nafise Sadat and Gurevych, Iryna},
  title     = {Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization},
  booktitle = {Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
  month     = jul,
  year      = {2022},
  publisher = {Association for Computational Linguistics}
}

License

Please read the LICENSE file.