
Multi^2OIE: <u>Multi</u>lingual Open Information Extraction Based on <u>Multi</u>-Head Attention with BERT

Source code for training Multi^2OIE for (multilingual) open information extraction.

Paper

Multi^2OIE: <u>Multi</u>lingual Open Information Extraction Based on <u>Multi</u>-Head Attention with BERT<br> Youngbin Ro, Yukyung Lee, and Pilsung Kang*<br> Accepted to Findings of ACL: EMNLP 2020. (*corresponding author)

<br>

Overview

What is Open Information Extraction (Open IE)?

Niklaus et al. (2018) describe Open IE as follows:

Information extraction (IE) <u>turns the unstructured information expressed in natural language text into a structured representation</u> in the form of relational tuples consisting of a set of arguments and a phrase denoting a semantic relation between them: <arg1; rel; arg2>. (...) Unlike traditional IE methods, Open IE is <u>not limited to a small set of target relations</u> known in advance, but rather extracts all types of relations found in a text.
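
For example, a single sentence can yield several such tuples. The snippet below only illustrates the <arg1; rel; arg2> format; the sentence and its extractions are invented for this illustration and are not output of Multi^2OIE or any other system.

```python
# Illustration of the Open IE tuple format <arg1; rel; arg2>.
# The sentence and the extractions are made up for this example only.
sentence = "Barack Obama was born in Honolulu and served as the 44th U.S. president."

extractions = [
    ("Barack Obama", "was born in", "Honolulu"),
    ("Barack Obama", "served as", "the 44th U.S. president"),
]

for arg1, rel, arg2 in extractions:
    print(f"<{arg1}; {rel}; {arg2}>")
```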

[Figure: Open IE overview]

Note

Our Approach

[Figure: Multi^2OIE overview]

Step 1: Extract predicates (relations) from the input sentence using BERT

Step 2: Extract arguments using multi-head attention blocks
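
The sketch below shows one way these two steps could be wired together with BERT and PyTorch's multi-head attention. It is a minimal illustration under stated assumptions, not the authors' implementation: the class name `PredicateArgumentSketch`, the label counts, and the way the predicate-aware query is built are all choices made for this example.

```python
# Minimal sketch of the two-step pipeline (NOT the authors' implementation).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class PredicateArgumentSketch(nn.Module):
    """Toy two-step extractor: (1) tag predicates with BERT,
    (2) tag arguments with a multi-head attention block."""

    def __init__(self, bert_name="bert-base-cased", n_pred_labels=3, n_arg_labels=9):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Step 1: token-level predicate (relation) classifier on top of BERT
        self.pred_classifier = nn.Linear(hidden, n_pred_labels)
        # Step 2: multi-head attention over the BERT features, then argument tagging
        self.arg_attention = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.arg_classifier = nn.Linear(hidden, n_arg_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # Step 1: predict a predicate tag for every token
        pred_logits = self.pred_classifier(hidden)
        pred_tags = pred_logits.argmax(dim=-1)
        # Average the hidden states of predicted predicate tokens to get a
        # predicate-aware vector (a simplification of the paper's predicate features).
        pred_mask = (pred_tags > 0).unsqueeze(-1).float()
        pred_vec = (hidden * pred_mask).sum(dim=1, keepdim=True) / \
            pred_mask.sum(dim=1, keepdim=True).clamp(min=1.0)
        # Step 2: attend from each token to the predicate-aware representation,
        # then predict argument tags from the attended features.
        attended, _ = self.arg_attention(query=hidden, key=hidden + pred_vec, value=hidden)
        arg_logits = self.arg_classifier(attended)
        return pred_logits, arg_logits


if __name__ == "__main__":
    tok = BertTokenizerFast.from_pretrained("bert-base-cased")
    enc = tok(["Multi^2OIE extracts relational tuples from text."], return_tensors="pt")
    model = PredicateArgumentSketch()
    pred_logits, arg_logits = model(enc["input_ids"], enc["attention_mask"])
    print(pred_logits.shape, arg_logits.shape)
```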

Multilingual Extraction

<br>

Usage

Prerequisites

Environment Setup

Install

Using the 'conda' command:
# this makes a new conda environment
conda env create -f environment.yml
conda activate multi2oie
Using the 'pip' command:
pip install -r requirements.txt

NLTK setup

python -c "import nltk; nltk.download('stopwords')"

Datasets

Dataset Released

Do-it-yourself

The original data file (a bootstrapped sample from OpenIE4, also used in SpanOIE) can be downloaded from here. After downloading, put the data in './datasets' and run preprocess.py to convert it into the format required by Multi^2OIE.

cd utils
python preprocess.py \
    --mode 'train' \
    --data '../datasets/structured_data.json' \
    --save_path '../datasets/openie4_train.pkl' \
    --bert_config 'bert-base-cased' \
    --max_len 64

For multilingual training data, set 'bert_config' to 'bert-base-multilingual-cased'.
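
For example, a multilingual preprocessing run could look like the call below; the save path 'openie4_train_multilingual.pkl' is only a placeholder name chosen for this illustration.

cd utils
python preprocess.py \
    --mode 'train' \
    --data '../datasets/structured_data.json' \
    --save_path '../datasets/openie4_train_multilingual.pkl' \
    --bert_config 'bert-base-multilingual-cased' \
    --max_len 64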

Run the Code

Model Released

We used a TITAN RTX GPU for training; using a different GPU may lead to slightly different final performance.

For training:
python main.py [--FLAGS]
For testing:
python test.py [--FLAGS]
<br>

Model Configurations

# of Parameters

Hyper-parameters (and search bounds)

Expected Results

Development set

OIE2016

CaRB

Testing set

Re-OIE2016

CaRB

<br>

References