MAVEN-dataset

Source code and dataset for EMNLP 2020 paper "MAVEN: A Massive General Domain Event Detection Dataset".

Data

The dataset (ver. 1.0) can be obtained from Tsinghua Cloud or Google Drive. The data format is described in this document.
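As a rough illustration, the snippet below is a minimal sketch of reading the released JSON Lines files and listing each annotated trigger. It assumes each line is one document with `id`, `content` (a list of sentences, each with `tokens`) and `events` (each with a `type` and a list of `mention` entries carrying `trigger_word`, `sent_id` and a token `offset`). These field names are assumptions here, so check them against the data format document before relying on them. The sample document is invented for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_triggers(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per annotated event mention in MAVEN-style JSON lines.

    ASSUMPTION: field names ("id", "content", "tokens", "events", "type",
    "mention", "trigger_word", "sent_id", "offset") follow the released
    data format; verify them against the format document.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        for event in doc.get("events", []):
            for mention in event["mention"]:
                start, end = mention["offset"]  # assumed token span [start, end)
                tokens = doc["content"][mention["sent_id"]]["tokens"]
                yield {
                    "doc_id": doc["id"],
                    "type": event["type"],
                    "trigger": mention["trigger_word"],
                    "span_tokens": tokens[start:end],
                }


# Invented in-memory document standing in for one line of a data file.
sample = json.dumps({
    "id": "doc1",
    "content": [{"sentence": "Troops attacked the city.",
                 "tokens": ["Troops", "attacked", "the", "city", "."]}],
    "events": [{"type": "Attack", "mention": [
        {"trigger_word": "attacked", "sent_id": 0, "offset": [1, 2]}]}],
})
records = list(iter_triggers([sample]))
```

For the real data, pass an open file handle instead of the in-memory list, e.g. `iter_triggers(open("train.jsonl", encoding="utf-8"))`; the file name `train.jsonl` is an assumption.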

We also release the document topics for data analysis and model development: docid2topic.json maps each document ID to its EventWiki topic label.
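For example, a quick way to see how documents are distributed across topics is sketched below. It assumes docid2topic.json is a single flat JSON object of the form `{doc_id: topic_label, ...}`; that shape is an assumption, and the ids and labels in the example are made up.

```python
import json
from collections import Counter, defaultdict


def load_topics(path: str) -> dict:
    # ASSUMPTION: the file is one JSON object mapping doc id -> topic label.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def topic_counts(docid2topic: dict) -> Counter:
    """Count how many documents fall under each topic label."""
    return Counter(docid2topic.values())


def docs_by_topic(docid2topic: dict) -> dict:
    """Group document ids by topic, e.g. for per-topic error analysis."""
    groups = defaultdict(list)
    for doc_id, topic in docid2topic.items():
        groups[topic].append(doc_id)
    return dict(groups)


# Made-up stand-in for the contents of docid2topic.json.
example = {"doc1": "military conflict", "doc2": "sports", "doc3": "sports"}
counts = topic_counts(example)
groups = docs_by_topic(example)
```

With the real file, replace `example` with `load_topics("docid2topic.json")`.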

CodaLab

To obtain test-set results, submit your predictions to our permanent CodaLab competition (the older competition will be phased out soon). For details of the evaluation method, please refer to the evaluation script.
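One way to serialize predictions before uploading is sketched below. It assumes a JSON Lines submission with one record per document, `{"id": doc_id, "predictions": [{"id": candidate_id, "type_id": t}, ...]}`, where `type_id` 0 means "not an event". Both the layout and the file name `results.jsonl` are assumptions, not confirmed here; check them against the evaluation script and the competition page. The candidate ids are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def to_submission_lines(predictions: Dict[str, List[Tuple[str, int]]]) -> List[str]:
    """Serialize predictions as one JSON line per document.

    `predictions` maps a document id to (candidate_id, type_id) pairs.
    ASSUMPTION: this record layout and type_id 0 for "no event" match the
    evaluation script; confirm before submitting.
    """
    lines = []
    for doc_id, preds in predictions.items():
        record = {
            "id": doc_id,
            "predictions": [{"id": cand_id, "type_id": type_id}
                            for cand_id, type_id in preds],
        }
        lines.append(json.dumps(record))
    return lines


def write_submission(predictions: Dict[str, List[Tuple[str, int]]],
                     path: str = "results.jsonl") -> None:
    # "results.jsonl" is a hypothetical file name.
    with open(path, "w", encoding="utf-8") as f:
        for line in to_submission_lines(predictions):
            f.write(line + "\n")


# Hypothetical ids: one candidate predicted as type 3, one as non-event (0).
example = {"doc1": [("cand-a", 3), ("cand-b", 0)]}
lines = to_submission_lines(example)
```

Keeping serialization separate from file writing makes it easy to inspect the output before creating the upload file.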

Codes

We release the source code for the baselines: DMCNN, BiLSTM, BiLSTM+CRF, MOGANED and DMBERT.

Citation

If this data or code helps your research, please cite our paper:

@inproceedings{wang2020MAVEN,
  title={{MAVEN}: A Massive General Domain Event Detection Dataset},
  author={Wang, Xiaozhi and Wang, Ziqi and Han, Xu and Jiang, Wangyi and Han, Rong and Liu, Zhiyuan and Li, Juanzi and Li, Peng and Lin, Yankai and Zhou, Jie},
  booktitle={Proceedings of EMNLP 2020},
  year={2020}
}