Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization
Authors: Prasetya Ajie Utama, Joshua Bambrick, Nafise Sadat Moosavi, and Iryna Gurevych.
Purpose
This repository contains code to derive the data generated by the pipeline described in the paper <a href="https://arxiv.org/abs/2205.06009">Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization</a>, to appear at the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022). This code is not intended to be modified or reused.
Usage
The code in this repository uses Python 3.
To derive the dataset:
- Download the CNN Stories and the Daily Mail Stories from https://cs.nyu.edu/~kcho/DMQA/
- Unpack the downloaded files into a new directory
- Run generate_falsesum_data.py to generate the dataset
Example execution:
python generate_falsesum_data.py <dir-with-falsesum-jsonl-data> <dir-with-unpacked-cnndm-data> <target-output-dir>
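The invocation above can also be wrapped in a small Python helper that checks the two input directories before launching the script. This is only a sketch and not part of the repository: the function names `build_command`, `missing_inputs`, and `run` are hypothetical, and it assumes the script takes exactly the three positional arguments shown in the example execution, in that order.

```python
import subprocess
import sys
from pathlib import Path


def build_command(falsesum_dir, cnndm_dir, output_dir,
                  script="generate_falsesum_data.py"):
    """Return the argument list for the documented invocation.

    Assumption: the script accepts three positional arguments in the
    order shown in the example execution.
    """
    return [sys.executable, script,
            str(falsesum_dir), str(cnndm_dir), str(output_dir)]


def missing_inputs(falsesum_dir, cnndm_dir):
    """Return the input directories that do not exist on disk."""
    candidates = (Path(falsesum_dir), Path(cnndm_dir))
    return [str(p) for p in candidates if not p.is_dir()]


def run(falsesum_dir, cnndm_dir, output_dir):
    """Validate the input directories, then run the generation script."""
    missing = missing_inputs(falsesum_dir, cnndm_dir)
    if missing:
        raise FileNotFoundError(
            "Missing input directories: " + ", ".join(missing))
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(build_command(falsesum_dir, cnndm_dir, output_dir),
                   check=True)
```

Calling `run("falsesum_jsonl", "cnndm_unpacked", "output")` (placeholder directory names) would then fail early with a clear message if either input directory is missing, rather than partway through generation.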
Citation
If you find this work useful, please consider citing our paper as:
@inproceedings{utama-etal-2022-falsesum,
author = {Utama, Prasetya Ajie and Bambrick, Joshua and Moosavi, Nafise Sadat and Gurevych, Iryna},
title = {Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization},
booktitle = {Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
month = jul,
year = {2022},
publisher = {Association for Computational Linguistics}
}
License
Please read the LICENSE file.