Time Waits for No One
We provide the processed datasets for a subset of our 8 tasks. All released datasets are intended for non-commercial use.
For the political affiliation task, we provide labels and tweet IDs and omit the tweet content, in accordance with the Twitter License Agreement. These tweets were collected via [Twitter API for Academic Research] and are intended for non-commercial use.
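Because only tweet IDs and labels are released, the tweet text has to be retrieved separately and joined back to the labels by ID. The following is a minimal sketch of that join, not the project's own tooling. The column names `tweet_id` and `label`, the placeholder label values, and the helper `join_labels_with_text` are all hypothetical; the retrieval step itself is left out, and `text_by_id` stands in for whatever mapping it produces.

```python
import csv
import io


def join_labels_with_text(label_rows, text_by_id):
    """Attach retrieved tweet text to (tweet_id, label) rows.

    Returns the joined rows plus the IDs whose text could not be found
    (for example, tweets that have since been deleted).
    """
    joined, missing = [], []
    for row in label_rows:
        tweet_id = row["tweet_id"]  # hypothetical column name
        text = text_by_id.get(tweet_id)
        if text is None:
            missing.append(tweet_id)
        else:
            joined.append(
                {"tweet_id": tweet_id, "label": row["label"], "text": text}
            )
    return joined, missing


# Hypothetical label file contents; the real file layout may differ.
labels_csv = "tweet_id,label\n1001,A\n1002,B\n1003,A\n"
label_rows = list(csv.DictReader(io.StringIO(labels_csv)))

# Stand-in for text retrieved separately, keyed by tweet ID.
text_by_id = {"1001": "example tweet one", "1003": "example tweet three"}

joined, missing = join_labels_with_text(label_rows, text_by_id)
```

Tracking the missing IDs separately makes it easy to report how much of the original dataset could still be recovered.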
For the Twitter NER data, please see Shruti's GitHub. For reproducibility, we provide scripts for processing this data.
Media Frames Corpus
We provide scripts for processing the Media Frames Corpus. Please see here.
Newsroom
For both newsroom summarization and publisher classification, we used the Newsroom dataset. We provide scripts for processing the data.
SciERC
We use the SciERC dataset. We also provide scripts for processing this data.
AI Publisher
We use data from the Semantic Scholar API, which is licensed under ODC-BY. We release our data splits for this task.
Yelp
Please see the Yelp dataset.