Data and code for the ACL 2019 paper Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model.
Preprocessed, but not truncated, data </br> Preprocessed, truncated data </br> Raw data (the only changes: \n is replaced with "NEWLINE_CHAR" and "|||||" is appended to the end of each story) </br> Raw data, bad retrievals removed: documents retrieved with the error noticed in this issue are removed, as is the "|||||" at the end of each example </br> Raw data, zipped </br> TensorFlow datasets
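
The raw format described above can be turned back into individual stories with a few lines of Python. This is only a minimal sketch, not code from this repository: it assumes each example arrives as a single string, and the function name `split_stories` is hypothetical. Only the two markers, "NEWLINE_CHAR" and "|||||", come from the description above.

```python
from typing import List

# Markers taken from the raw-data description in this README.
STORY_SEP = "|||||"
NEWLINE_TOKEN = "NEWLINE_CHAR"


def split_stories(raw_example: str) -> List[str]:
    """Split one raw example into its source stories.

    Hypothetical helper: assumes one example per string, with "|||||"
    appended after each story and "NEWLINE_CHAR" standing in for "\n".
    """
    stories = []
    for chunk in raw_example.split(STORY_SEP):
        # Restore the original line breaks and trim surrounding whitespace.
        text = chunk.replace(NEWLINE_TOKEN, "\n").strip()
        # The trailing separator leaves an empty final chunk; skip it.
        if text:
            stories.append(text)
    return stories


if __name__ == "__main__":
    example = "First line.NEWLINE_CHARSecond line.|||||Another story.|||||"
    for i, story in enumerate(split_stories(example), start=1):
        print(f"--- story {i} ---")
        print(story)
```

Because empty chunks are skipped, the same helper should also work on the variant with bad retrievals removed, where the trailing "|||||" of each example has been dropped.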
Models and Summaries
Trained models </br> Model output