Awesome

Differentiating Concepts and Instances for Knowledge Graph Embedding

Source code and datasets of EMNLP2018 paper: "Differentiating Concepts and Instances for Knowledge Graph Embedding". https://arxiv.org/abs/1811.04588

News

Thanks to Shengding Hu for providing the PyTorch version of the training code.

Dataset

We provide YAGO39K and M-YAGO39K dataset we used in this work in data/YAGO39K directory. The meanings of these files are explained as follows:

data/YAGO39K/Train

instance2id.txt, concept2id.txt, relation2id.txt: the id of every instance, concept and relation in this dataset.
triple2id.txt: normal triples represented as ids with format [head_instance_id tail_instance_id relation_id].
instanceOf2id.txt: instanceOf triples represented as ids with format [instance_id concept_id].
subClassOf2id.txt: subClassOf triples represented as ids with format [sub_concept_id concept_id].

data/YAGO39K/Valid

triple2id_positive.txt, instanceOf2id_positive.txt, subClassOf2id_positive.txt: positive triples from YAGO dataset. Their formats are the same as those in data/YAGO39K/Train.
triple2id_negative.txt, instanceOf2id_negative.txt, subClassOf2id_negative.txt: negative triples generated by us, which will be used in Triple Classification experiments.

data/YAGO39K/Test

the test dataset, which is similar to data/YAGO39K/Valid.

data/YAGO39K/M-Valid

the valid dataset for M-YAGO39K. The formats of all files are the same as those in data/YAGO39K/Valid.

data/YAGO39K/M-Test

the test dataset for M-YAGO39K. The formats of all files are the same as those in data/YAGO39K/Test.

Codes

The source codes of our work are put in the folder src/.

Compile

cd src
make

Train

cd src
./transC

You can add the following optional parameters when running transC:

-dim: the vector size.
-margin: the margin for normal triples.
-margin_ins: the margin for instanceOf triples.
-margin_sub: the margin for subClassOf triples.
-data: the dataset that you want to use. You can only choose YAGO39K in current vision.
-l1flag: it can be 1 or 0, 1 means using L1 loss and 0 means L2 loss.
-bern: it can be 1 or 0, 1 means using "bern" strategy and 0 means using "unif" strategy.

The Vectors of entities, concepts and relations will be stored in the folder vector/ after training.

Python Code

There is also python code for the training part, in py_version/ The python version can be run with

python -m py_version.transc

The outputs are stored in the same format and location with cpp version.

Experiment

Link Precidtion

For Link Prediction experiment, you need to

cd src
./test_link_prediction

You can add the following optional parameters when running test_link_prediction:

-dim: the vector size.
-data: the dataset that you want to use. You can only choose YAGO39K in current vision.
-threads: how many threads you want to use.

Normal Triple Classification

For normal Triple Classification experiment, you need to

cd src
./test_classification_normal

You can add the following optional parameters when running test_triple_classification_normal:

-dim: the vector size.
-data: the dataset that you want to use. You can only choose YAGO39K in current vision.

InstanceOf and SubClassOf Triple Classification

For instanceOf and subClassOf Triple Classification experiments, you need to

cd src
./test_classification_isA

You can add the following optional parameters when running test_triple_classification_isA:

-dim: the vector size.
-data: the dataset that you want to use. You can only choose YAGO39K in current vision.
-mix: it can be 1 or 0, 1 means doing this experiment in M-YAGO39K and 0 means doing this experiment in YAGO39K.

Cite

If you use the code, please cite this paper:

Xin Lv, Lei Hou, Juanzi Li, Zhiyuan Liu. Differentiating Concepts and Instances for Knowledge Graph Embedding. *The Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)*L.