JAPE

Source code and datasets for the ISWC 2017 research paper "Cross-lingual entity alignment via joint attribute-preserving embedding" (a.k.a. JAPE).

Code

The correspondence between the Python files and our JAPE variants is as follows:

To run SE, please use:
<code> python3 se_pos.py ../data/dbp15k/zh_en/ 0.3 </code>

To learn attribute embeddings, please use:
<code> python3 attr2vec.py ../data/dbp15k/zh_en/ ../data/dbp15k/zh_en/0_3/ ../data/dbp15k/zh_en/all_attrs_range ../data/dbp15k/en_all_attrs_range </code>

To calculate entity similarities, please use:
<code> python3 ent2vec_sparse.py ../data/dbp15k/zh_en/ 0.3 0.95 0.95 0.9 </code>
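We believe the numeric arguments to ent2vec_sparse.py are the training ratio followed by similarity thresholds/weights. As an illustration only (the function name and the role of the 0.95 values are our assumptions, not taken from the code), the core operation of keeping only high cosine similarities in a sparse structure can be sketched as:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sparse_similarities(embs1, embs2, threshold=0.95):
    """Pairwise cosine similarities between two embedding sets, keeping
    only entries >= threshold, stored sparsely as {(i, j): sim}.
    The 0.95 default mirrors the command-line arguments above
    (our assumption about what they control)."""
    out = {}
    for i, u in enumerate(embs1):
        for j, v in enumerate(embs2):
            s = cosine(u, v)
            if s >= threshold:
                out[(i, j)] = s
    return out
```

Thresholding keeps the similarity matrix sparse, which is what makes entity-similarity computation tractable on DBP15K-sized vocabularies.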

Dependencies

Datasets

In our experiments, we do not use all the triples in the datasets. For relationship triples, we select the portion whose head and tail entities are popular (i.e., frequently involved in triples). For attribute triples, we discard the attribute values, because they are too diverse and hard to match across languages; only the attribute predicates are kept.
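The selection above can be sketched as follows. This is a minimal illustration, not the actual preprocessing script: the `min_degree` popularity threshold and both function names are our own hypothetical choices.

```python
from collections import Counter

def filter_popular_triples(rel_triples, min_degree=5):
    """Keep relationship triples whose head and tail entities are 'popular',
    here approximated as appearing in at least `min_degree` triples.
    The threshold is a hypothetical stand-in for the paper's criterion."""
    degree = Counter()
    for h, r, t in rel_triples:
        degree[h] += 1
        degree[t] += 1
    return [(h, r, t) for h, r, t in rel_triples
            if degree[h] >= min_degree and degree[t] >= min_degree]

def strip_attr_values(attr_triples):
    """Discard attribute values, keeping only (entity, attribute) pairs."""
    return sorted({(e, a) for e, a, _ in attr_triples})
```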

The full datasets can be found on the website or on Google Drive.

Directory structure

Taking DBP15K (ZH-EN) as an example, the folder "zh_en" contains:

On top of this, we built five datasets (0_1, 0_2, 0_3, 0_4, 0_5) for embedding-based entity alignment models. "0_x" means that the dataset uses x0% of the entity links as training data and the rest for testing. The two entities of each entity link in the training data share the same id. In our main experiments, we used the dataset "0_3", which takes 30% of the entity links as training data.
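The train/test split described above can be sketched as below; the function name and the fixed random seed are our own hypothetical choices for illustration.

```python
import random

def split_links(links, train_ratio=0.3, seed=2017):
    """Split pre-aligned entity links into training and testing sets.
    train_ratio=0.3 reproduces the "0_3" setting (30% for training,
    70% for testing). The seed is only for reproducibility of the sketch."""
    links = list(links)
    random.Random(seed).shuffle(links)
    n_train = int(len(links) * train_ratio)
    return links[:n_train], links[n_train:]
```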

The folder "mtranse" contains the corresponding five datasets for MTransE. The difference is that the two entities of each entity link in the training data have different ids.

Dataset files

Taking the dataset "0_3" of DBP15K (ZH-EN) as an example, the folder "0_3" contains:

Running and parameters

Due to the instability of embedding-based methods, the results may fluctuate slightly (±1%) across repeated runs.
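When reporting results under such fluctuation, it helps to average over several runs. A stdlib-only sketch (the function name is ours, not from the codebase):

```python
import statistics

def summarize_runs(scores):
    """Aggregate a metric (e.g., Hits@1 in %) over repeated runs.
    Returns (mean, spread); given the instability of embedding-based
    methods, a spread within about 1 point is expected."""
    mean = statistics.mean(scores)
    spread = max(scores) - min(scores)
    return mean, spread
```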

If you have any difficulty or questions in running the code or reproducing the experiment results, please email zqsun.nju@gmail.com and whu@nju.edu.cn.

Citation

If you use this model or code, please cite it as follows:

@inproceedings{JAPE,
  author    = {Zequn Sun and Wei Hu and Chengkai Li},
  title     = {Cross-Lingual Entity Alignment via Joint Attribute-Preserving Embedding},
  booktitle = {ISWC},
  pages     = {628--644},
  year      = {2017}
}

Links

The following links point to some recent work that uses our datasets: