# OntoED and OntoEvent
<p align="center"> <font size=4><strong>OntoED: A Model for Low-resource Event Detection with Ontology Embedding</strong></font> </p>

This project is the official implementation of the OntoED model and the repository for the OntoEvent dataset, which were first proposed in the paper *OntoED: Low-resource Event Detection with Ontology Embedding*, accepted by ACL 2021.

The implementations are based on Hugging Face's Transformers, and the code organization follows MAVEN's baselines & DeepKE.

We also provide some baseline implementations for reproduction.
## Brief Introduction
OntoED is a model for event detection under low-resource conditions. It models the relationships among event types through ontology embedding: knowledge of high-resource event types can be transferred to low-resource ones, and unseen event types can establish connections with seen ones via the event ontology.
## Project Structure

The structure of the data and code is as follows:
```
Reasoning_In_EE
├── README.md
├── OntoED                   # model
│   ├── README.md
│   ├── data_utils.py        # for data processing
│   ├── ontoed.py            # main model
│   ├── run_ontoed.py        # for model running
│   └── run_ontoed.sh        # bash file for model running
├── OntoEvent                # data
│   ├── README.md
│   ├── __init__.py
│   ├── event_dict_data_on_doc.json.zip  # raw full ED data
│   ├── event_dict_train_data.json       # ED data for training
│   ├── event_dict_test_data.json        # ED data for testing
│   ├── event_dict_valid_data.json       # ED data for validation
│   └── event_relation.json              # event-event relation data
└── baselines                # baseline models
    ├── DMCNN
    │   ├── README.md
    │   ├── convert.py       # for data processing
    │   ├── data             # data
    │   │   └── labels.json
    │   ├── dmcnn.config     # configure training & testing
    │   ├── eval.sh          # bash file for model evaluation
    │   ├── formatter
    │   │   ├── DmcnnFormatter.py  # runtime data processing
    │   │   └── __init__.py
    │   ├── main.py          # project entrance
    │   ├── model
    │   │   ├── Dmcnn.py     # main model
    │   │   └── __init__.py
    │   ├── raw
    │   │   └── 100.utf8     # word vector
    │   ├── reader
    │   │   ├── MavenReader.py  # runtime data reader
    │   │   └── __init__.py
    │   ├── requirements.txt # requirements
    │   ├── train.sh         # bash file for model training
    │   └── utils
    │       ├── __init__.py
    │       ├── configparser_hook.py
    │       ├── evaluation.py
    │       ├── global_variables.py
    │       ├── initializer.py
    │       └── runner.py
    ├── JMEE
    │   ├── README.md
    │   ├── data             # to store data file
    │   ├── enet
    │   │   ├── __init__.py
    │   │   ├── consts.py    # configurable parameters
    │   │   ├── corpus
    │   │   │   ├── Corpus.py    # dataset class
    │   │   │   ├── Data.py
    │   │   │   ├── Sentence.py
    │   │   │   └── __init__.py
    │   │   ├── models       # modules of JMEE
    │   │   │   ├── DynamicLSTM.py
    │   │   │   ├── EmbeddingLayer.py
    │   │   │   ├── GCN.py
    │   │   │   ├── HighWay.py
    │   │   │   ├── SelfAttention.py
    │   │   │   ├── __init__.py
    │   │   │   ├── ee.py
    │   │   │   └── model.py # main model
    │   │   ├── run
    │   │   │   ├── __init__.py
    │   │   │   └── ee
    │   │   │       ├── __init__.py
    │   │   │       └── runner.py  # runner class
    │   │   ├── testing.py   # evaluation
    │   │   ├── training.py  # training
    │   │   └── util.py
    │   ├── eval.sh          # bash file for model evaluation
    │   ├── requirements.txt # requirements
    │   └── train.sh         # bash file for model training
    ├── README.md
    ├── eq1.png
    ├── eq2.png
    ├── jointEE-NN
    │   ├── README.md
    │   ├── data
    │   │   └── fistDoc.nnData4.txt  # data format sample
    │   ├── evaluateJEE.py   # model evaluation
    │   ├── jeeModels.py     # main model
    │   ├── jee_processData.py  # data processing
    │   └── jointEE.py       # project entrance
    └── stanford.zip         # cleaned dataset for baseline models
```
## Requirements

- python==3.6.9
- torch==1.8.0 (a lower version may also work)
- transformers==2.8.0
- sklearn==0.20.2
Usage
1. Project PreparationοΌDownload this project and unzip the dataset. You can directly download the archive, or run git clone https://github.com/231sm/Reasoning_In_EE.git
at your teminal.
cd [LOCAL_PROJECT_PATH]
git clone https://github.com/231sm/Reasoning_In_EE.git
2. Running Preparation: Adjust the parameters in the `run_ontoed.sh` bash file, and set the actual paths for 'LABEL_PATH' and 'RELATION_PATH' at the end of `data_utils.py`.

```
cd Reasoning_In_EE/OntoED
vim run_ontoed.sh
(edit the parameters, then save and quit)
vim data_utils.py
(set the paths for 'LABEL_PATH' and 'RELATION_PATH', then save and quit)
```
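For reference, the end of `data_utils.py` might look roughly like the sketch below after editing. The exact placement of the two variables and the relative dataset location used here are assumptions, so adapt them to your local layout.

```python
# Hypothetical sketch only -- the surrounding code in data_utils.py may differ.
# 'LABEL_PATH' should point to event_dict_train_data.json and
# 'RELATION_PATH' to event_relation.json (see the Hint below).
LABEL_PATH = "../OntoEvent/event_dict_train_data.json"   # assumed relative location
RELATION_PATH = "../OntoEvent/event_relation.json"       # assumed relative location
```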
Hint:
- Please refer to the `main()` function in `run_ontoed.py` for the detailed meaning of each parameter.
- 'LABEL_PATH' and 'RELATION_PATH' refer to the paths of `event_dict_train_data.json` and `event_relation.json`, respectively.
3. Running Model: Run `./run_ontoed.sh` for training, validation, and testing.

A folder containing the configuration, model weights, and results (in `is_test_true_eval_results.txt`) will be saved at the path given by '--output_dir' in the bash file `run_ontoed.sh`.

```
cd Reasoning_In_EE/OntoED
./run_ontoed.sh
```

('--do_train', '--do_eval', '--evaluate_during_training', and '--do_test' must be passed in `run_ontoed.sh`.)

Alternatively, you can run `run_ontoed.py` with the parameters passed manually (they can be copied from `run_ontoed.sh`):

```
python run_ontoed.py --para...
```
## About the Dataset

OntoEvent is proposed for event detection (ED) and is also annotated with correlations among events. It contains 13 supertypes with 100 subtypes, derived from 4,115 documents with 60,546 event instances. Please refer to OntoEvent for details.
### Statistics

The statistics of OntoEvent are shown below; the detailed data schema can be found in our paper.

| Dataset | #Doc | #Instance | #SuperType | #SubType | #EventCorrelation |
| --- | --- | --- | --- | --- | --- |
| ACE 2005 | 599 | 4,090 | 8 | 33 | None |
| TAC KBP 2017 | 167 | 4,839 | 8 | 18 | None |
| FewEvent | - | 70,852 | 19 | 100 | None |
| MAVEN | 4,480 | 111,611 | 21 | 168 | None |
| OntoEvent | 4,115 | 60,546 | 13 | 100 | 3,804 |
### Data Format

The OntoEvent dataset is stored in JSON format.

For each event instance in `event_dict_data_on_doc.json`, the data format is as follows:

```
{
    'doc_id': '...',
    'doc_title': 'XXX',
    'sent_id': ,
    'event_mention': '......',
    'event_mention_tokens': ['.', '.', '.', '.', '.', '.'],
    'trigger': '...',
    'trigger_pos': [, ],
    'event_type': ''
}
```
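As a quick sanity check on a downloaded split, a minimal sketch like the one below prints a few instances. It assumes the top-level JSON object is either a dict mapping event types to lists of instance records or a plain list of records; verify against the actual file before relying on it.

```python
import json

# Minimal sketch for inspecting OntoEvent instances (assumptions noted above).
with open("OntoEvent/event_dict_train_data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Flatten a dict-of-lists layout into a single list of instance records.
records = [rec for group in data.values() for rec in group] if isinstance(data, dict) else data

print("number of event instances:", len(records))
for rec in records[:3]:
    # Field names follow the format shown above; trigger_pos holds the trigger's position.
    print(rec["event_type"], "|", rec["trigger"], "| trigger_pos:", rec["trigger_pos"])
```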
For each event relation in `event_relation.json`, we list the event instance pairs; the data format is as follows:

```
'EVENT_RELATION_1': [
    [
        {
            'doc_id': '...',
            'doc_title': 'XXX',
            'sent_id': ,
            'event_mention': '......',
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'],
            'trigger': '...',
            'trigger_pos': [, ],
            'event_type': ''
        },
        {
            'doc_id': '...',
            'doc_title': 'XXX',
            'sent_id': ,
            'event_mention': '......',
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'],
            'trigger': '...',
            'trigger_pos': [, ],
            'event_type': ''
        }
    ],
    ...
]
```
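To iterate over these correlations programmatically, a sketch along the following lines should suffice. It assumes `event_relation.json` is a single JSON object keyed by relation name, as the layout above suggests, and it skips the three type-level keys described next.

```python
import json

# Minimal sketch: count event-instance pairs per relation in event_relation.json.
# "COSUPER", "SUBSUPER" and "SUPERSUB" store event-*type* pairs instead (see below).
TYPE_LEVEL_KEYS = {"COSUPER", "SUBSUPER", "SUPERSUB"}

with open("OntoEvent/event_relation.json", "r", encoding="utf-8") as f:
    relations = json.load(f)

for relation, pairs in relations.items():
    if relation in TYPE_LEVEL_KEYS:
        continue
    head, tail = pairs[0]  # each entry is a pair of event-instance records
    print(f"{relation}: {len(pairs)} pairs, e.g. {head['event_type']} -> {tail['event_type']}")
```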
Especially for the "COSUPER", "SUBSUPER" and "SUPERSUB" relations, we list the event type pairs; the data format is as follows:

```
"COSUPER": [
    ["Conflict.Attack", "Conflict.Protest"],
    ["Conflict.Attack", "Conflict.Sending"],
    ...
]
```
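Since subtype names appear to follow a `SuperType.SubType` pattern (e.g. `Conflict.Attack` above), the type-level entries can be inspected with a small sketch like this; the naming assumption comes from the examples shown here, not from a formal spec.

```python
import json

# Minimal sketch: read the event-type pairs stored under the type-level keys.
with open("OntoEvent/event_relation.json", "r", encoding="utf-8") as f:
    relations = json.load(f)

for type_a, type_b in relations["COSUPER"][:5]:
    # Assumption: event types are written as "SuperType.SubType".
    super_a = type_a.split(".", 1)[0]
    super_b = type_b.split(".", 1)[0]
    shared = super_a if super_a == super_b else "(different supertypes)"
    print(type_a, "<->", type_b, "| supertype:", shared)
```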
## How to Cite

Thank you very much for your interest in our work. If you use or extend our work, please cite the following paper:

```bibtex
@inproceedings{ACL2021_OntoED,
title = "{O}nto{ED}: Low-resource Event Detection with Ontology Embedding",
author = "Deng, Shumin and
Zhang, Ningyu and
Li, Luoqiu and
Hui, Chen and
Huaixiao, Tou and
Chen, Mosha and
Huang, Fei and
Chen, Huajun",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.220",
doi = "10.18653/v1/2021.acl-long.220",
pages = "2828--2839"
}
```