Second Order SDP
Second Order Parser for Semantic Dependency Parsing
This repo contains code forked from Parser-v3. It was used for the semantic dependency parser in Wang et al. (2019), Second-Order Semantic Dependency Parsing with End-to-End Neural Networks, and for the CoNLL 2019 shared task (SDP part only).
News
The PyTorch version of the Second-Order SDP parser is now available at MultilangStructureKD!
Requirements
python3
tensorflow-gpu>=1.12.0
How to use
Training
Our second order parser can be trained by simply running
python3 -u main.py train GraphParserNetwork --config_file config/sec_order.cfg --noscreen
This config file runs Mean Field Variational Inference for the second-order parts. To use Loopy Belief Propagation instead, run
python3 -u main.py train GraphParserNetwork --config_file config/sec_order_LBP.cfg --noscreen
Training with Bert Embedding
Our second order parser can be trained with BERT. First clone the BERT repository:
git clone https://github.com/google-research/bert
Then download the BERT model: BERT-Large, Uncased (Whole Word Masking).
To train with BERT, simply run
python3 -u main.py train GraphParserNetwork --config_file config_gen/bert_large_glove_previous_layer_100linear_01lr_5decay_dm_switch_new1.cfg --noscreen
If you want to fine-tune the BERT model, set is_training=True in BertVocab.
Parsing
A trained model can be run by calling
python3 main.py --save_dir $SAVEDIR run $DATADIR --output_dir results
The parsed results will be saved in the results/ directory. $SAVEDIR is the directory of the model; for example, if you trained with config/sec_order.cfg, the model will be saved in saves/SemEval15/DM/MF_dm_3iter. $DATADIR is the directory of the data in CoNLL-U format.
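Before parsing, it can help to check how long the sentences in a CoNLL-U file are, since the OOM section recommends sentences of no more than 60 words. The following is a minimal sketch and not part of this repo: the helper names (`iter_conllu_sentences`, `sentence_lengths`, `row`) and the sample data are hypothetical. It relies only on the standard CoNLL-U layout, in which sentences are separated by blank lines, comment lines start with `#`, and each token line is tab-separated with the token ID in the first column.

```python
from typing import Iterable, Iterator, List


def iter_conllu_sentences(lines: Iterable[str]) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of token rows (tab-separated fields)."""
    sentence: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Sentence-level comment such as "# sent_id = 1".
            continue
        else:
            fields = line.split("\t")
            # Keep ordinary tokens only; skip multiword ranges ("1-2")
            # and empty nodes ("1.1"), whose IDs are not plain integers.
            if fields[0].isdigit():
                sentence.append(fields)
    if sentence:
        yield sentence


def sentence_lengths(lines: Iterable[str]) -> List[int]:
    """Return the number of ordinary tokens in each sentence."""
    return [len(sentence) for sentence in iter_conllu_sentences(lines)]


def row(token_id: str, form: str) -> str:
    # Hypothetical helper: pad the remaining eight columns with "_".
    return "\t".join([token_id, form] + ["_"] * 8)


# Hypothetical sample data, for illustration only.
sample = "\n".join([
    "# sent_id = 1",
    row("1", "The"), row("2", "cat"), row("3", "sleeps"),
    "",
    "# sent_id = 2",
    row("1-2", "don't"), row("1", "do"), row("2", "n't"),
    "",
])

lengths = sentence_lengths(sample.splitlines())
# Indices of sentences longer than the 60-word limit from the OOM section.
too_long = [i for i, n in enumerate(lengths) if n > 60]
print(lengths, too_long)
```

To check a real file, pass an open file handle instead of `sample.splitlines()`.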
Pretrained Model
The pretrained model on DM can be downloaded from the following link:
Baidu Netdisk, Password: ecqe
The model is trained with BERT and GloVe embeddings, on the assumption that gold POS tags and lemmas are not available in practice. Its labeled F1 scores are 94.25 in-domain and 90.76 out-of-domain.
OOM issue
To avoid running out of memory during training, train the parser with 12GB of GPU memory and with sentences of no more than 60 words. On a 12GB Titan X GPU, the number of iterations is at most 3 for mean field variational inference and at most 2 for loopy belief propagation. If you have a larger GPU, such as a 24GB Tesla P40, loopy belief propagation can also be trained with 3 iterations. To set the number of iterations, set num_iteration in SecondOrderGraphIndexVocab or SecondOrderGraphLBPVocab of the config file. Another option is to reduce the training batch_size in CoNLLUTrainset of the config file.
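The settings above can also be changed from a script. The following is a minimal sketch that assumes the .cfg files are INI-style files readable by Python's standard `configparser`, with SecondOrderGraphIndexVocab and CoNLLUTrainset as section names. The `set_option` helper, the demo file, and the numeric values are hypothetical and are not taken from the repo's configs.

```python
import configparser
import os
import tempfile


def set_option(cfg_path: str, section: str, option: str, value) -> None:
    """Set section.option = value in an INI-style file, keeping other entries."""
    # interpolation=None leaves any ${...} or %(...)s references as literal text.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # preserve the case of option names
    config.read(cfg_path)
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, option, str(value))
    with open(cfg_path, "w") as f:
        config.write(f)


# Hypothetical demo config; the values are placeholders, not the repo defaults.
demo = (
    "[SecondOrderGraphIndexVocab]\n"
    "num_iteration = 3\n"
    "\n"
    "[CoNLLUTrainset]\n"
    "batch_size = 6000\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.cfg")
    with open(path, "w") as f:
        f.write(demo)

    # Use fewer inference iterations and a smaller batch to lower memory use.
    set_option(path, "SecondOrderGraphIndexVocab", "num_iteration", 2)
    set_option(path, "CoNLLUTrainset", "batch_size", 3000)

    check = configparser.ConfigParser(interpolation=None)
    check.read(path)
    new_iters = check.get("SecondOrderGraphIndexVocab", "num_iteration")
    new_batch = check.get("CoNLLUTrainset", "batch_size")

print(new_iters, new_batch)
```

Note that `configparser` drops comments when it rewrites a file, so editing the config by hand may be preferable if the comments matter.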
Details
For details of our parser, see the source code in parser/structs/vocabs/second_order_vocab.py for Mean Field Variational Inference and second_order_LBP_vocab.py, in the same directory, for Loopy Belief Propagation.
Cite
If you find our code useful, please cite:
@inproceedings{wang-etal-2019-second,
title = "Second-Order Semantic Dependency Parsing with End-to-End Neural Networks",
author = "Wang, Xinyu and
Huang, Jingxian and
Tu, Kewei",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1454",
pages = "4609--4618",}
@inproceedings{Wan:Liu:Jia:19,
author = {Wang, Xinyu and Liu, Yixian and Jia, Zixia
and Jiang, Chengyue and Tu, Kewei},
title = {{ShanghaiTech} at {MRP}~2019:
{S}equence-to-Graph Transduction with Second-Order Edge Inference
for Cross-Framework Meaning Representation Parsing},
booktitle = CONLL:19:U,
address = L:CONLL:19,
pages = {\pages{--}{55}{65}},
year = 2019
}
Contact
If you have any questions, feel free to contact me by email.