Introduction

Source code for GraphText: Graph Reasoning in Text Space.

Steps to reproduce

Python Environment

pip install -r requirements.txt

Key Hyper-parameters

Given an ego-graph, GraphText extracts text information (attributes) and relation information to construct a tree.

The node text attributes, denoted as text_info, are a set of attributes derived from the (ego)graph. The valid items for this set are:

The relations, denoted as rel_info, are a set of attributes derived from the (ego)graph. The valid items for this set are:

Commands

Set Up the OpenAI API Key

Set the OpenAI API key as an environment variable before running ICL experiments. You can do this with export OPENAI_API_KEY="YourOwnAPIKey" or, for convenience, by editing configs/main.yaml:

env:
  vars:
    openai_api_key: ${oc.env:OPENAI_API_KEY,YourAPIKey} # Replace YourAPIKey with your API key
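
The `${oc.env:OPENAI_API_KEY,YourAPIKey}` entry reads the environment variable and falls back to the placeholder when the variable is unset. As a minimal sketch of that lookup in plain Python (the helper names `resolve_openai_api_key` and `key_is_configured` are hypothetical and not part of GraphText), a pre-flight check might look like this:

```python
import os

# Placeholder strings taken from the README; neither is a usable key.
PLACEHOLDERS = ("YourAPIKey", "YourOwnAPIKey")


def resolve_openai_api_key(environ=None):
    """Return OPENAI_API_KEY, or the config placeholder if it is unset."""
    environ = os.environ if environ is None else environ
    return environ.get("OPENAI_API_KEY", "YourAPIKey")


def key_is_configured(environ=None):
    """True when the resolved key is non-empty and not a placeholder."""
    key = resolve_openai_api_key(environ)
    return bool(key) and key not in PLACEHOLDERS


if not key_is_configured():
    print("OPENAI_API_KEY is not set; ICL runs would use the placeholder.")
```

Running a check like this before launching an experiment can surface a missing key earlier than a failed API call would.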

In-context Learning

Original Split

export OPENAI_API_KEY="YourOwnAPIKey"
cd src/scripts
python run_icl.py data=cora text_info=a2y_t.a3y_t rel_info=spd0.ppr.a2x_sim.a3x_sim 
python run_icl.py data=citeseer text_info=a3y_t.a0x_t rel_info=spd0.spd2.ppr.a2x_sim 
python run_icl.py data=texas text_info=a2y_t.a3y_t rel_info=spd2 
python run_icl.py data=wisconsin text_info=choice.a0x_t rel_info=a0x_sim.spd3
python run_icl.py data=cornell text_info=a1y_t.a4y_t rel_info=spd1.a3x_sim
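
In the commands above, text_info and rel_info each carry several items joined by dots (for example, rel_info=spd0.ppr.a2x_sim.a3x_sim). The following is a small sketch of how such values could be split into their items. It assumes the dot is purely a separator, and the helper names `split_items` and `parse_overrides` are hypothetical rather than GraphText functions:

```python
def split_items(value: str) -> list[str]:
    """Split a dot-separated value such as 'a2y_t.a3y_t' into its items."""
    return [item for item in value.split(".") if item]


def parse_overrides(args: list[str]) -> dict[str, list[str]]:
    """Collect text_info and rel_info from key=value style arguments."""
    parsed = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep and key in ("text_info", "rel_info"):
            parsed[key] = split_items(value)
    return parsed


# Arguments copied from the Cora command above.
cora_args = ["data=cora", "text_info=a2y_t.a3y_t", "rel_info=spd0.ppr.a2x_sim.a3x_sim"]
print(parse_overrides(cora_args))
```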

Few-Shot Node Classification

export OPENAI_API_KEY="YourOwnAPIKey"
cd src/scripts
python run_icl.py data=citeseer data.n_shots=1 text_info=a0x_t.a3y_t rel_info=spd0.spd3
python run_icl.py data=citeseer data.n_shots=3 text_info=a0x_t.a3y_t rel_info=spd0.spd3.a2x_sim.a3x_sim
python run_icl.py data=citeseer data.n_shots=5 text_info=a0x_t.a3y_t rel_info=spd0.spd3.ppr.a3x_sim
python run_icl.py data=citeseer data.n_shots=10 text_info=a0x_t.a3y_t rel_info=spd0.a0x_sim.a1x_sim
python run_icl.py data=citeseer data.n_shots=15 text_info=a0x_t.a3y_t rel_info=spd0.a0x_sim.a1x_sim
python run_icl.py data=citeseer data.n_shots=20 text_info=a0x_t.a3y_t rel_info=spd0.spd3.a2x_sim.a3x_sim

python run_icl.py data=texas data.n_shots=1 text_info=a2y_t rel_info=spd0.spd2
python run_icl.py data=texas data.n_shots=3 text_info=choice rel_info=spd3
python run_icl.py data=texas data.n_shots=5 text_info=a2y_t rel_info=spd0.spd2
python run_icl.py data=texas data.n_shots=10 text_info=choice rel_info=spd2
python run_icl.py data=texas data.n_shots=15 text_info=choice rel_info=spd2
python run_icl.py data=texas data.n_shots=20 text_info=choice rel_info=spd2
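
Since the few-shot runs differ only in n_shots, text_info, and rel_info, the command lines can also be generated from a small table. The sketch below rebuilds the Texas commands listed above; the `build_command` helper and the `TEXAS_FEW_SHOT` table are illustrative names, not part of the repository:

```python
import shlex

# Texas settings copied from the commands above: n_shots -> (text_info, rel_info)
TEXAS_FEW_SHOT = {
    1: ("a2y_t", "spd0.spd2"),
    3: ("choice", "spd3"),
    5: ("a2y_t", "spd0.spd2"),
    10: ("choice", "spd2"),
    15: ("choice", "spd2"),
    20: ("choice", "spd2"),
}


def build_command(data, n_shots, text_info, rel_info):
    """Return the argument list for one few-shot run_icl.py invocation."""
    return [
        "python", "run_icl.py",
        f"data={data}",
        f"data.n_shots={n_shots}",
        f"text_info={text_info}",
        f"rel_info={rel_info}",
    ]


commands = [build_command("texas", n, t, r) for n, (t, r) in TEXAS_FEW_SHOT.items()]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually launch the runs, each list could be passed to `subprocess.run(cmd, cwd="src/scripts", check=True)` with OPENAI_API_KEY already exported.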

Supervised Fine-tuning (SFT)

GraphText supports instruction fine-tuning of an LLM on graphs. An MLP maps the continuous features into the text space (as tokens). We recommend using BF16 for stable training.

cd src/scripts
python run_sft.py exp=sft lora.r=-1 data=citeseer_tag nb_padding=false add_label_name_output=false max_bsz_per_gpu=4 eq_batch_size=16 rel_info=spd0.a0x_sim.ppr text_info=x llm.base_model=llama2-7b node_dropout=0 subgraph_size=3 total_steps=1000

python run_sft.py exp=sft lora.r=-1 data=cora_tag nb_padding=false add_label_name_output=false max_bsz_per_gpu=4 eq_batch_size=16 rel_info=spd0.a1x_sim text_info=x llm.base_model=llama2-7b node_dropout=0 subgraph_size=3 total_steps=1000
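
The commands above set both max_bsz_per_gpu=4 and eq_batch_size=16. This README does not define how the two interact. One plausible reading, stated here purely as an assumption, is that eq_batch_size is the effective batch size reached through gradient accumulation across GPUs. Under that assumption, the number of accumulation steps could be estimated as follows (`accumulation_steps` is a hypothetical helper, not a GraphText function):

```python
import math


def accumulation_steps(eq_batch_size, max_bsz_per_gpu, n_gpus=1):
    """Estimate gradient accumulation steps needed to reach eq_batch_size.

    ASSUMPTION: eq_batch_size is the effective batch size and each step
    processes max_bsz_per_gpu samples on every GPU.
    """
    samples_per_step = max_bsz_per_gpu * n_gpus
    return math.ceil(eq_batch_size / samples_per_step)


# Values from the SFT commands above, on a single GPU.
print(accumulation_steps(eq_batch_size=16, max_bsz_per_gpu=4, n_gpus=1))
```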

Misc

Analyze the Results

We highly recommend using Wandb to track the metrics. All results, including the prompts and the generated text, are saved to a CSV file at "${out_dir}{split}-${alias}.csv".
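
A saved results file can be inspected with Python's standard csv module. The sketch below makes no assumption about the real column names: it only reports whichever header it finds and the number of rows. The `summarize_results` helper and the sample header are illustrative, not taken from GraphText:

```python
import csv


def summarize_results(lines):
    """Return (column names, number of data rows) for CSV text.

    `lines` can be an open file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    rows = list(reader)
    return list(reader.fieldnames or []), len(rows)


# Hypothetical sample; real column names depend on the GraphText output.
sample = ["prompt,generated_text", "some prompt,some answer"]
print(summarize_results(sample))
```

For a real results file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `summarize_results`.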

Other Useful Parameters

FAQ

GPT initialize failed

Error message: Error locating target 'llm.gpt.GPT', set env var HYDRA_FULL_ERROR=1 to see chained exception. Checklist:

Citation

If you find our work useful, please consider citing it:

@misc{zhao2023graphtext,
      title={GraphText: Graph Reasoning in Text Space}, 
      author={Jianan Zhao and Le Zhuo and Yikang Shen and Meng Qu and Kai Liu and Michael Bronstein and Zhaocheng Zhu and Jian Tang},
      year={2023},
      eprint={2310.01089},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}