Modeling Fine-Grained Entity Types with Box Embeddings

Yasumasa Onoe, Michael Boratko, Andrew McCallum, Greg Durrett. ACL 2021.

@inproceedings{onoe2021boxet,
 title={Modeling Fine-Grained Entity Types with Box Embeddings},
 author={Yasumasa Onoe, Michael Boratko, Andrew McCallum, Greg Durrett},
 booktitle={ACL},
 year={2021}
}

Getting Started

Dependencies

$ git clone https://github.com/yasumasaonoe/Box4Types.git

This code has been tested with Python 3.7 and the following dependencies:

If you're using a conda environment, please use the following commands:

$ conda create -n box4et python=3.7
$ conda activate box4et
$ pip install [package name]

File Descriptions

Datasets / Models

This code assumes 3 directories listed below. Paths to these directories are specified in box4et/constant.py.

Run the following script to download these directories:

$ bash download_data.sh

The data files are in JSON Lines format, with one JSON object per line. Here is an example from UFET, pretty-printed for readability:

{
    "ex_id": "dev_190", 
    "right_context": ["."], 
    "left_context": ["For", "this", "handpicked", "group", "of", "jewelry", "savvy", "Etsy", "artisans", ",", "their", "passion", "is", "The", "Hunger", "Games", ",", "the", "first", "of", "3", "best", "selling", "young", "adult", "books", "by"], 
    "right_context_text": ".", 
    "left_context_text": "For this handpicked group of jewelry savvy Etsy artisans , their passion is The Hunger Games , the first of 3 best selling young adult books by",
    "y_category": ["name", "person", "writer", "author"],
    "word": "Suzanne Collins", 
    "mention_as_list": ["Suzanne", "Collins"]
}

| Field | Description |
| --- | --- |
| ex_id | Unique example ID. |
| right_context | Tokenized right context of a mention. |
| left_context | Tokenized left context of a mention. |
| word | A mention. |
| right_context_text | Right context of a mention. |
| left_context_text | Left context of a mention. |
| y_category | The gold entity types derived from Wikipedia categories. |
| y_title | Wikipedia title of the gold Wiki entity. |
| mention_as_list | A tokenized mention. |
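
As a minimal sketch of how a file in this format could be consumed (this is not the repository's own data loader), the snippet below parses records one line at a time and rebuilds the sentence with the mention in brackets. The helper names `read_jsonl` and `mention_in_context` and the shortened inline sample are assumptions made for illustration. Only the field names come from the example above.

```python
import json

# Shortened version of the UFET example above (left context truncated).
sample = {
    "ex_id": "dev_190",
    "right_context": ["."],
    "left_context": ["best", "selling", "young", "adult", "books", "by"],
    "right_context_text": ".",
    "left_context_text": "best selling young adult books by",
    "y_category": ["name", "person", "writer", "author"],
    "word": "Suzanne Collins",
    "mention_as_list": ["Suzanne", "Collins"],
}

# A jsonlines file holds one serialized JSON object per line.
SAMPLE_LINES = [json.dumps(sample) + "\n"]


def read_jsonl(lines):
    """Yield one dict per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def mention_in_context(example):
    """Join left context, bracketed mention, and right context into one string."""
    parts = [
        example["left_context_text"],
        "[" + example["word"] + "]",
        example["right_context_text"],
    ]
    # Skip empty contexts so no stray spaces appear at either end.
    return " ".join(p for p in parts if p)


for ex in read_jsonl(SAMPLE_LINES):
    print(ex["ex_id"], mention_in_context(ex), ex["y_category"])
```

To read a real data file, pass an open file handle instead of `SAMPLE_LINES`, since iterating over a text file yields one line at a time.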

Entity Typing Training and Evaluation

Training

main.py is the primary script for training and evaluating models. See box4et/train_*.sh.

$ cd box4et
$ bash train_box.sh

Evaluation

To evaluate a trained model on another dataset, set --mode to test and point to the test data with --eval_data. Make sure to pass -load so that the trained model is loaded. See box4et/eval_*.sh.

$ cd box4et
$ bash eval_box.sh