


License: MIT

Code and Data for the paper "LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities"


<div align=center><img src="figs/overall_f.jpg" alt="Overview" width="700px" /></div>

The overview of our work. There are three main components: 1) Basic Evaluation: detailing our assessment of large models (text-davinci-003, ChatGPT, and GPT-4), in both zero-shot and one-shot settings, using performance data from fully supervised state-of-the-art models as benchmarks; 2) Virtual Knowledge Extraction: an examination of large models' virtual knowledge capabilities on the constructed VINE dataset; and 3) Automatic KG: the proposal of utilizing multiple agents to facilitate the construction and reasoning of KGs.

🌟 Evaluation

Data Preprocessing

The datasets that we used in our experiments are as follows:

The expected file structure is:

 |-- KG Construction
 |    |-- DuIE2.0
 |    |    |-- datas                    #dataset
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |    |-- duie_processor.py        #preprocess data
 |    |    |-- duie_prompts.py          #generate prompts
 |    |-- MAVEN
 |    |    |-- datas                    #dataset
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |    |-- maven_processor.py       #preprocess data
 |    |    |-- maven_prompts.py         #generate prompts
 |    |-- RE-TACRED
 |    |    |-- datas                    #dataset
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |    |-- retacred_processor.py    #preprocess data
 |    |    |-- retacred_prompts.py      #generate prompts
 |    |-- SciERC
 |    |    |-- datas                    #dataset
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |    |-- scierc_processor.py      #preprocess data
 |    |    |-- scierc_prompts.py        #generate prompts
 |-- KG Reasoning (Link Prediction)
 |    |-- FB15k-237
 |    |    |-- data                     #sample data
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |-- ATOMIC2020
 |    |    |-- data                     #sample data
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |    |-- system_eval              #eval for ATOMIC2020
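
Before running any of the scripts, it can help to confirm that a local checkout matches the layout above. The following is a minimal sketch and not part of the repository: the `EXPECTED` mapping is transcribed from the tree, and the helper name `missing_paths` is hypothetical.

```python
from pathlib import Path

# Expected layout, transcribed from the directory tree above.
# Assumption: the checker is run from (or pointed at) the directory
# that contains "KG Construction" and "KG Reasoning (Link Prediction)".
EXPECTED = {
    "KG Construction": {
        "DuIE2.0": ["datas", "prompts", "duie_processor.py", "duie_prompts.py"],
        "MAVEN": ["datas", "prompts", "maven_processor.py", "maven_prompts.py"],
        "RE-TACRED": ["datas", "prompts", "retacred_processor.py", "retacred_prompts.py"],
        "SciERC": ["datas", "prompts", "scierc_processor.py", "scierc_prompts.py"],
    },
    "KG Reasoning (Link Prediction)": {
        "FB15k-237": ["data", "prompts"],
        "ATOMIC2020": ["data", "prompts", "system_eval"],
    },
}


def missing_paths(root="."):
    """Return the expected files and directories that do not exist under root."""
    root = Path(root)
    missing = []
    for task, datasets in EXPECTED.items():
        for dataset, entries in datasets.items():
            for entry in entries:
                path = root / task / dataset / entry
                if not path.exists():
                    # Report paths relative to root, with forward slashes.
                    missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    for path in missing_paths():
        print("missing:", path)
```

An empty result means every entry listed in the tree is present; it says nothing about whether the dataset files inside `datas` or `data` are complete.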

How to Run

🕵️Virtual Knowledge Extraction

The VINE dataset we built is available here.

Run the following commands to generate prompts:

cd Virtual Knowledge Extraction
python VINE_processor.py
python VINE_prompts.py
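
For readers unfamiliar with the 0-shot and 1-shot settings used throughout this repository, the difference can be sketched as follows. This is an illustrative assumption about how such prompts are commonly assembled, not the actual logic of `VINE_prompts.py`; the function `build_prompt` and all strings passed to it are hypothetical placeholders.

```python
def build_prompt(instruction, query, demonstration=None):
    """Assemble a zero-shot prompt, or a one-shot prompt if a demonstration is given.

    demonstration is an optional (input, output) pair shown to the model
    before the actual query. Hypothetical helper, not from the repository.
    """
    parts = [instruction]
    if demonstration is not None:
        demo_input, demo_output = demonstration
        parts.append(f"Example:\nInput: {demo_input}\nOutput: {demo_output}")
    # The prompt ends with an open "Output:" slot for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Placeholder strings only; real instructions and samples come from the datasets.
    instruction = "<task instruction>"
    zero_shot = build_prompt(instruction, "<test sentence>")
    one_shot = build_prompt(
        instruction,
        "<test sentence>",
        demonstration=("<training sentence>", "<gold answer>"),
    )
    print(zero_shot)
    print("-----")
    print(one_shot)
```

The only difference between the two settings in this sketch is the single worked demonstration placed between the instruction and the query.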


Our AutoKG code is based on CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society and on a LangChain implementation of that paper. More details are available through this link.

Run the Autokg.py script.

cd AutoKG
python Autokg.py


If you use the code or data, please cite the following paper:

  title={LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities},
  author={Zhu, Yuqi and Wang, Xiaohan and Chen, Jing and Qiao, Shuofei and Ou, Yixin and Yao, Yunzhi and Deng, Shumin and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2305.13168},