
REDS2: Relation Extraction with 2-hop Distant Supervision (DS)

Code and dataset for our paper "Leveraging 2-hop Distant Supervision from Table Entity Pairs for Relation Extraction" (EMNLP'19). Please cite this paper if you find this repo useful.

Dataset

Source Data

NYT10: originally released by Riedel et al. (2010). We use the processed version from OpenNRE.
WikiTable: released by Bhagavatula et al. (2015).

Data Format

We follow the data format of OpenNRE, with minor modifications to the training/testing data for our setting.

Training & Testing Data

[
    {
        'sentence': 'Bill Gates is the founder of Microsoft .',
        'head': {'word': 'Bill Gates', 'id': 'm.03_3d', ...(other information)},
        'tail': {'word': 'Microsoft', 'id': 'm.07dfk', ...(other information)},
        'relation': 'founder',
        'is_extend': 1
    },
    ...
]
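A minimal sketch of how records in this format can be grouped into entity-pair bags for multi-instance learning. The toy records below are hypothetical, only the fields shown above are used, and keying training bags by (head id, tail id, relation) is an assumed convention, not a description of this repo's loader.

```python
from collections import defaultdict

REQUIRED_KEYS = ("sentence", "head", "tail", "relation", "is_extend")

# Hypothetical toy records; the "...(other information)" fields are omitted.
records = [
    {"sentence": "Bill Gates is the founder of Microsoft .",
     "head": {"word": "Bill Gates", "id": "m.03_3d"},
     "tail": {"word": "Microsoft", "id": "m.07dfk"},
     "relation": "founder", "is_extend": 1},
    {"sentence": "Microsoft was founded by Bill Gates .",
     "head": {"word": "Bill Gates", "id": "m.03_3d"},
     "tail": {"word": "Microsoft", "id": "m.07dfk"},
     "relation": "founder", "is_extend": 0},
]

def group_into_bags(records):
    """Group records sharing (head id, tail id, relation) into one bag."""
    bags = defaultdict(list)
    for rec in records:
        # Fail early if a record lacks one of the fields shown in the format.
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            raise KeyError(f"record is missing fields: {missing}")
        key = (rec["head"]["id"], rec["tail"]["id"], rec["relation"])
        bags[key].append(rec)
    return dict(bags)

bags = group_into_bags(records)
for key, bag in bags.items():
    print(key, len(bag))  # prints: ('m.03_3d', 'm.07dfk', 'founder') 2
```

On real data, `records` would come from `json.load` on data/{DATASET_NAME_1}/train.json.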

Word Embedding Data

[
    {'word': 'the', 'vec': [0.418, 0.24968, ...]},
    {'word': ',', 'vec': [0.013441, 0.23682, ...]},
    ...
]
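The embedding file can be turned into a word-to-index map plus an embedding matrix. This is a sketch only: the extra `UNK` and `BLANK` rows and their zero initialization are assumptions for illustration, not necessarily what this repo does.

```python
# Toy entries in the word_vec.json format; real vectors are much longer.
word_vec = [
    {"word": "the", "vec": [0.418, 0.24968]},
    {"word": ",", "vec": [0.013441, 0.23682]},
]

def build_embedding_table(entries, unk="UNK", pad="BLANK"):
    """Return (word2id, matrix); rows for unknown and padding tokens are appended."""
    dim = len(entries[0]["vec"])
    word2id, matrix = {}, []
    for entry in entries:
        if len(entry["vec"]) != dim:
            raise ValueError(f"inconsistent vector size for {entry['word']!r}")
        word2id[entry["word"]] = len(matrix)
        matrix.append(list(entry["vec"]))
    # Hypothetical special tokens, initialized to zero vectors here.
    for token in (unk, pad):
        word2id[token] = len(matrix)
        matrix.append([0.0] * dim)
    return word2id, matrix

word2id, matrix = build_embedding_table(word_vec)
```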

Relation Mapping Data

{
    'NA': 0,
    'relation_1': 1,
    'relation_2': 2,
    ...
}
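A small sketch for validating rel2id.json and inverting it to decode predicted ids. Treating `NA` as id 0 follows the example above; falling back to `NA` for unseen relation names is an assumption borrowed from common distant-supervision practice, and the relation names are toy values.

```python
rel2id = {"NA": 0, "founder": 1, "relation_2": 2}  # toy mapping

def invert_rel2id(rel2id):
    """Validate the mapping and return id -> relation name."""
    if rel2id.get("NA") != 0:
        raise ValueError("expected 'NA' to map to id 0")
    if sorted(rel2id.values()) != list(range(len(rel2id))):
        raise ValueError("relation ids should be contiguous, starting at 0")
    return {idx: name for name, idx in rel2id.items()}

def relation_to_id(rel2id, name):
    """Assumption: relations missing from the mapping fall back to NA."""
    return rel2id.get(name, rel2id["NA"])

id2rel = invert_rel2id(rel2id)
```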

You can download the processed data used in the paper from Box.

Software

This codebase is built on pytorch-template and adapts parts of its implementation from OpenNRE.

Requirements

Python 3.6
PyTorch 1.1.0

You can also use the Docker image from DockerHub.

Installation and Quick Start

Clone the repository.
Install all the required packages, or use the Docker image above.
Prepare the data, configs, and trained models. The configs used in the paper are already included. You can get the trained models from Box. The final structure should look like this:

REDS2
|-- ... 
|-- data
|   |-- {DATASET_NAME_1}
|       |-- train.json
|       |-- test.json
|       |-- word_vec.json
|       |-- rel2id.json
|
|-- config
|   |-- config_name.conf
|
|-- saved
    |-- models
        |-- {MODEL_NAME_1}
            |-- {TRAINING_ID}
                |-- model.pth
                |-- config.json
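Before training or testing, a quick check of this layout can save a failed run. The helper names below are hypothetical and simply mirror the tree above.

```python
from pathlib import Path

def missing_data_files(root, dataset):
    """List the expected files under data/{dataset}/ that do not exist."""
    expected = ["train.json", "test.json", "word_vec.json", "rel2id.json"]
    data_dir = Path(root) / "data" / dataset
    return [str(data_dir / name) for name in expected
            if not (data_dir / name).is_file()]

def find_checkpoints(root):
    """Return saved/models/*/*/model.pth files that have a config.json beside them."""
    models_dir = Path(root) / "saved" / "models"
    if not models_dir.is_dir():
        return []
    return sorted(p for p in models_dir.glob("*/*/model.pth")
                  if (p.parent / "config.json").is_file())
```

For example, `missing_data_files("REDS2", "{DATASET_NAME_1}")` should return an empty list once the data is in place.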

Run the command below to train a BASE model from scratch:

python train.py -d {GPU_ID} -c {PATH_TO_CONFIG}

Run the command below to train REDS2 starting from a pretrained BASE model:

python train_finetune.py -d {GPU_ID} -c {PATH_TO_CONFIG} -p {PATH_TO_TRAINED_BASE_MODEL}

Run the command below for evaluation. To test BASE+MERGE, select a trained BASE model and pass -m 1 to test.py; this merges the 2-hop sentences in at test time.

python test.py -r {PATH_TO_MODEL}

Add New Models

All the models are defined in model/model.py. There are several sub-modules defined in model/embedding.py, model/encoder.py and model/selector.py. You can use them or add your own.
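To illustrate what a selector does, here is a plain-Python sketch of bag-level attention: each sentence vector in a bag is scored against a relation query vector, the scores are softmax-normalized, and the bag representation is the weighted sum. This simplified dot-product variant is for illustration only; the modules in model/selector.py are PyTorch code and may work differently.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_select(sentence_reprs, relation_query):
    """Weight each sentence by its dot product with the query; return (bag_repr, weights)."""
    scores = [sum(x * q for x, q in zip(rep, relation_query))
              for rep in sentence_reprs]
    weights = softmax(scores)
    dim = len(relation_query)
    bag_repr = [sum(w * rep[d] for w, rep in zip(weights, sentence_reprs))
                for d in range(dim)]
    return bag_repr, weights

bag_repr, weights = attention_select([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```

Here both sentences score equally against the query, so they receive equal weight.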

Configs

The config file is a JSON file that contains all the parameters used in an experiment. The configs used in the paper are included in the repository.
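Since the codebase builds on pytorch-template, configs presumably follow its convention of naming a class under `type` and its constructor arguments under `args`. The sketch below shows that lookup pattern with a hypothetical `ToyModel` and `hidden_size`; check the shipped configs for the real keys.

```python
import json
import types

def init_obj(config, name, module, *args, **kwargs):
    """Instantiate module.<config[name]['type']> with the config's 'args' plus kwargs."""
    spec = config[name]
    obj_kwargs = dict(spec.get("args", {}))
    # Refuse to silently override values that the config already sets.
    if set(kwargs) & set(obj_kwargs):
        raise ValueError("overwriting kwargs given in the config is not allowed")
    obj_kwargs.update(kwargs)
    return getattr(module, spec["type"])(*args, **obj_kwargs)

class ToyModel:  # hypothetical stand-in for a class in model/model.py
    def __init__(self, hidden_size, dropout=0.5):
        self.hidden_size = hidden_size
        self.dropout = dropout

config = json.loads('{"arch": {"type": "ToyModel", "args": {"hidden_size": 8}}}')
models = types.SimpleNamespace(ToyModel=ToyModel)
model = init_obj(config, "arch", models, dropout=0.1)
```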

Data Loader

The data loader is defined in data_loader/data_loaders.py. Use the method argument to choose between 1-hop sentence bags, merged 1-hop & 2-hop bags, or separated 1-hop & 2-hop bags.
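The three options can be pictured with the sketch below. It assumes `is_extend == 1` marks a sentence added through 2-hop DS, keys bags by (head id, tail id), and uses hypothetical method names (`one_hop`, `merge`, `separate`); the real argument values in data_loader/data_loaders.py may differ.

```python
from collections import defaultdict

def build_bags(records, method):
    """Build bags keyed by (head id, tail id) in one of three assumed modes."""
    one_hop, two_hop = defaultdict(list), defaultdict(list)
    for rec in records:
        key = (rec["head"]["id"], rec["tail"]["id"])
        # Assumption: is_extend == 1 means the sentence comes from 2-hop DS.
        (two_hop if rec["is_extend"] else one_hop)[key].append(rec["sentence"])
    keys = set(one_hop) | set(two_hop)
    if method == "one_hop":
        return dict(one_hop)
    if method == "merge":  # a single bag mixing both kinds of sentences
        return {k: one_hop.get(k, []) + two_hop.get(k, []) for k in keys}
    if method == "separate":  # a (1-hop bag, 2-hop bag) pair per entity pair
        return {k: (one_hop.get(k, []), two_hop.get(k, [])) for k in keys}
    raise ValueError(f"unknown method: {method!r}")

records = [
    {"sentence": "s1", "head": {"id": "h"}, "tail": {"id": "t"}, "is_extend": 0},
    {"sentence": "s2", "head": {"id": "h"}, "tail": {"id": "t"}, "is_extend": 1},
]
```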