Language Models as Knowledge Embeddings

Source code for the paper *Language Models as Knowledge Embeddings*

Notice

[June 2023] We recently identified a data leakage issue in our code: during prediction, we inadvertently leaked degree information about the entities to be predicted, which gave the model an unintended shortcut and affected the experimental results to some extent. We have fixed the issue, re-run our experiments, and updated the paper accordingly. The revised results do not affect the majority of the paper's conclusions and contributions: the method still achieves state-of-the-art (SOTA) performance on the WN18RR, FB13, and WN11 datasets compared to previous works. On FB15k-237, however, performance declines to a certain extent and now underperforms SOTA structure-based methods. We sincerely apologize for this error.
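
To illustrate the kind of shortcut involved (a schematic sketch, not the actual buggy code): if any quantity visible at prediction time reflects how often a candidate entity occurs in the graph, the model can favor high-degree entities without using the query itself.

```python
from collections import Counter

def entity_degrees(triples):
    """Count how many triples each entity participates in."""
    deg = Counter()
    for h, r, t in triples:
        deg[h] += 1
        deg[t] += 1
    return deg

def leaky_score(model_score, candidate, deg):
    # Hypothetical: adding any degree-derived term to a candidate's score
    # lets the model rank popular entities first, regardless of the query.
    return model_score + 0.1 * deg[candidate]
```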

Updated Results:

WN18RR

| Methods | MR | MRR | Hits@1 | Hits@3 | Hits@10 |
|---------|----|-----|--------|--------|---------|
| TransE | 2300 | 0.243 | 0.043 | 0.441 | 0.532 |
| DistMult | 5110 | 0.430 | 0.390 | 0.440 | 0.490 |
| ComplEx | 5261 | 0.440 | 0.410 | 0.460 | 0.510 |
| RotatE | 3340 | 0.476 | 0.428 | 0.492 | 0.571 |
| TuckER | - | 0.470 | 0.443 | 0.482 | 0.526 |
| HAKE | - | 0.497 | 0.452 | 0.516 | 0.582 |
| CoKE | - | 0.484 | 0.450 | 0.496 | 0.553 |
| Pretrain-KGE_TransE | 1747 | 0.235 | - | - | 0.557 |
| KG-BERT | 97 | 0.216 | 0.041 | 0.302 | 0.524 |
| StAR_BERT-base | 99 | 0.364 | 0.222 | 0.436 | 0.647 |
| MEM-KGC_BERT-base_(w/o EP) | - | 0.533 | 0.473 | 0.570 | 0.636 |
| MEM-KGC_BERT-base_(w/ EP) | - | 0.557 | 0.475 | 0.604 | 0.704 |
| C-LMKE_BERT-base | 79 | 0.619 | 0.523 | 0.671 | 0.789 |

FB15k-237

| Methods | MR | MRR | Hits@1 | Hits@3 | Hits@10 |
|---------|----|-----|--------|--------|---------|
| TransE | 323 | 0.279 | 0.198 | 0.376 | 0.441 |
| DistMult | 254 | 0.241 | 0.155 | 0.263 | 0.419 |
| ComplEx | 339 | 0.247 | 0.158 | 0.275 | 0.428 |
| RotatE | 177 | 0.338 | 0.241 | 0.375 | 0.533 |
| TuckER | - | 0.358 | 0.266 | 0.394 | 0.544 |
| HAKE | - | 0.346 | 0.250 | 0.381 | 0.542 |
| CoKE | - | 0.364 | 0.272 | 0.400 | 0.549 |
| Pretrain-KGE_TransE | 162 | 0.332 | - | - | 0.529 |
| KG-BERT | 153 | - | - | - | 0.420 |
| StAR_BERT-base | 136 | 0.263 | 0.171 | 0.287 | 0.452 |
| MEM-KGC_BERT-base_(w/o EP) | - | 0.339 | 0.249 | 0.372 | 0.522 |
| MEM-KGC_BERT-base_(w/ EP) | - | 0.346 | 0.253 | 0.381 | 0.531 |
| C-LMKE_BERT-base | 141 | 0.306 | 0.218 | 0.331 | 0.484 |
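
For reference, the metrics above are computed from the rank of the correct entity among all candidates for each test triple (with other known true triples filtered out). A minimal sketch:

```python
def ranking_metrics(ranks):
    """Compute MR, MRR, and Hits@k from a list of filtered ranks (1-based)."""
    ranks = list(ranks)
    n = len(ranks)
    return {
        "MR": sum(ranks) / n,                            # mean rank
        "MRR": sum(1.0 / r for r in ranks) / n,          # mean reciprocal rank
        **{f"Hits@{k}": sum(r <= k for r in ranks) / n   # fraction ranked in top k
           for k in (1, 3, 10)},
    }

# e.g. ranking_metrics([1, 2, 10]) -> {'MR': 4.33..., 'MRR': 0.53..., ...}
```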

Requirements

Usage

Run `main.py` to train or test our models.

An example of training for triple classification:

```bash
python main.py --batch_size 16 --plm bert --data wn18rr --task TC
```

An example of training for link prediction:

```bash
python main.py --batch_size 16 --plm bert --contrastive --self_adversarial --data wn18rr --task LP
```
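
The `--self_adversarial` flag presumably enables self-adversarial negative sampling (as in RotatE), which weights each negative sample by a softmax over the negatives' scores so that harder negatives contribute more to the loss. A simplified sketch of that weighting, assuming score tensors of shape `(batch,)` and `(batch, num_negatives)` (the paper's exact loss may differ):

```python
import torch
import torch.nn.functional as F

def self_adversarial_loss(pos_scores, neg_scores, alpha=1.0):
    """Negative log-likelihood with self-adversarially weighted negatives.

    alpha is the sampling temperature; the weights are detached so that
    no gradient flows through the weighting itself.
    """
    weights = torch.softmax(alpha * neg_scores, dim=-1).detach()
    pos_term = F.logsigmoid(pos_scores)
    neg_term = (weights * F.logsigmoid(-neg_scores)).sum(dim=-1)
    return -(pos_term + neg_term).mean()
```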

The arguments are as follows:
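
A minimal sketch of how the flags used in the examples above might be declared; the actual `main.py` likely defines more options, and the help strings here are assumptions:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=16)
parser.add_argument("--plm", type=str, default="bert",
                    help="pre-trained language model used as the encoder")
parser.add_argument("--data", type=str, default="wn18rr",
                    help="dataset folder name under ./data")
parser.add_argument("--task", type=str, choices=["TC", "LP"],
                    help="TC = triple classification, LP = link prediction")
parser.add_argument("--contrastive", action="store_true",
                    help="use the contrastive variant (C-LMKE)")
parser.add_argument("--self_adversarial", action="store_true",
                    help="weight negatives by self-adversarial sampling")
args = parser.parse_args()
```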

Datasets

The datasets are placed in the `data` folder and include fb15k-237, WN18RR, FB13, and umls.
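
These benchmarks are conventionally stored as one tab-separated (head, relation, tail) triple per line in `train.txt`, `valid.txt`, and `test.txt`; a minimal loader under that assumption (the repo's actual file layout may differ):

```python
from pathlib import Path

def load_triples(dataset, split="train"):
    """Read tab-separated (head, relation, tail) triples from data/<dataset>/<split>.txt."""
    path = Path("data") / dataset / f"{split}.txt"
    with path.open(encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f]

train = load_triples("wn18rr", "train")
```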