Awesome

Learning to Pre-train Graph Neural Networks

This repository is the official implementation of AAAI-2021 paper Learning to Pre-train Graph Neural Networks

Requirements

To install requirements:

pip install -r requirements.txt

Dataset

All the necessary data files can be downloaded from the following links.

For Biology dataset, download from Google Drive and BaiduYun (Extraction code: j97n), unzip it, and put the under data/bio/.

The new compilation of bibliographic graphs, i.e., PreDBLP, download from Google Drive and BaiduYun (Extraction code: j97n), unzip it, and move the dblp.graph file to data/dblp/unsupervised/processed/ and the dblpfinetune.graph file to data/dblp/supervised/processed/, respectively.

Also, to avoid the "file incomplete" errors caused by compressed files, we also upload the uncompressed dblp dataset at BaiduYun (Extraction code: j97n).

Training

To pre-train L2P-GNN on Biology dataset w.r.t. GIN model, run this command:

python main.py --dataset DATASET  --gnn_type GNN_MODEL --model_file PRE_TRAINED_MODEL_NAME --device 1

The pre-trained models are saved into res/DATASET/ .

Evaluation

To fine-tune L2P-GNN on Biology dataset, run:

python eval_bio.py --dataset DATASET  --gnn_type GNN_MODEL --emb_trained_model_file EMB_TRAINED_FILE --pre_trained_model_file GNN_TRAINED_FILE --pool_trained_model_file POOL_TRAINED_FILE --result_file RESULT_FILE --device 1

The results w.r.t 10 random running seeds are saved into res/DATASET/finetune_seed(0-9)/

Results

To analysis results of downstream tasks, run:

python result_analysis.py  --dataset DATASET --times SEED_NUM

where SEED_NUM is the number of random seed ranging from 0 to 9, thus it is usually set to 10.

Reproducing results in the paper

Our results in the paper can be reproduced by directly running:

python eval_bio.py --dataset bio --gnn_type gin --emb_trained_model_file co_adaptation_5_300_gin_50_emb.pth --pre_trained_model_file co_adaptation_5_300_gin_50_gnn.pth --pool_trained_model_file co_adaptation_5_300_gin_50_pool.pth --result_file co_adaptation_5_300_gin_50 --device 0

and

python eval_dblp.py --dataset dblp --gnn_type gin --split random --emb_trained_model_file co_adaptation_5_300_s50q30_gin_20_emb.pth --pre_trained_model_file co_adaptation_5_300_s50q30_gin_20_gnn.pth --pool_trained_model_file co_adaptation_5_300_s50q30_gin_20_pool.pth --result_file co_adaptation_5_300_s50q30_gin_20  --device 0 --dropout_ratio 0.1