Awesome
README
The code and data for "Open Relation Modeling: Learning to Define Relations between Entities" (Findings of ACL '22)
Introduction
We study Open Relation Modeling: given two entities, generating a coherent sentence describing the relation between them.
E.g., (data mining, database) => data mining is a process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Requirements
See requirements.txt
Data
Data are available on this link
Train
Note: you may download the best trained model, i.e., RelationBART-MP (Large) from Google Drive
Download pre-trained BART
Download pre-trained bart.large
wget https://dl.fbaipublicfiles.com/fairseq/models/bart.large.tar.gz
Unzip files to bart/bart.large/
Preprocess data
bash preprocess.sh
Train model
bash train.sh
Generation
Generate relation descriptions for entity pairs in test set
fairseq-generate input/k_path_large-bin/ --path tmp/k_path_large/checkpoint_best.pt --beam 5 --batch-size 128 --remove-bpe --no-repeat-ngram-size=5 --min-len=5 --max-len-b 100 --bpe gpt2 --gpt2-encoder-json encoder.json --gpt2-vocab-bpe vocab.bpe --scoring sacrebleu | tee output/k_path_large-epochbest-k5.out
Evaluation
output/k_path_large-epochbest-k5.out
contains the predicted relation descriptions and confidence scores
Extract the output (for models with reasoning path selection)
python extract_output_for_path_k.py
or (for models without reasoning path selection)
grep ^D <out> | cut -f3- > <sys>
grep ^T <out> | cut -f2- > <ref>
Evaluation
Note: You need to install the required packages in the script before evaluation
bash RM-scorer.sh output/k_path_large-epochbest-k5.out.sys output/k_path_large-epochbest-k5.out.ref
Interactive
Run interactive mode
fairseq-interactive input/k_path_large-bin --path tmp/k_path_large/checkpoint_best.pt --bpe gpt2 --source-lang src --target-lang tgt --no-repeat-ngram-size 5 --beam 5 --nbest 20 --gpt2-encoder-json encoder.json --gpt2-vocab-bpe vocab.bpe
Examples
evaluation; unknown: machine learning
Output: In computer science, evaluation is the process of evaluating a machine learning algorithm to determine whether the algorithm is performing well.
data mining; facet of: machine learning; subclass of: artificial intelligence
Output: Data mining is a subfield of machine learning and artificial intelligence concerned with the collection, processing, and analysis of large amounts of data.
Citation
The details of this repo are described in the following paper. If you find this repo useful, please kindly cite it:
@inproceedings{huang2022open,
title={Open Relation Modeling: Learning to Define Relations between Entities},
author={Huang, Jie and Chang, Kevin Chen-Chuan and Xiong, Jinjun and Hwu, Wen-mei},
booktitle={Findings of the Association for Computational Linguistics: ACL 2022},
year={2022}
}