Heterformer
This repository contains the source code and datasets for Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks, published in KDD 2023.
Requirements
The code is written in Python 3.6. Before running it, install the required packages with the following command (using a virtual environment is recommended):
pip3 install -r requirements.txt
Overview
Heterformer is a Transformer architecture (language model) for representation learning on heterogeneous text-rich (text-attributed) networks. It jointly considers the text associated with nodes and the heterogeneous network structure.
<p align="center"> <img src="figure/Heterformer.png" width="600px"/> </p>
Data
- Download raw data from DBLP, Twitter and Goodreads.
- Data processing: Run the cells in data/$dataset/data_processing.ipynb for the first step of data processing.
- Network Sampling: Run the cells in data/$dataset/sampling.ipynb for ego-network sampling and train/val/test data generation.
- Pretrain data: Run the cells in data/$dataset/generate_pretrain_data.ipynb for textless node pretraining data generation.
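The three notebooks above can also be run non-interactively, in the same order. The following is a minimal sketch, assuming Jupyter with `nbconvert` is installed and that the notebooks need no manual input. The helper names `notebook_commands` and `run_notebooks` and the directory name `goodreads` are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path

# Notebook order follows the Data steps listed above.
NOTEBOOKS = ["data_processing.ipynb", "sampling.ipynb", "generate_pretrain_data.ipynb"]


def notebook_commands(dataset, data_root="data"):
    """Build one `jupyter nbconvert` command per notebook, in pipeline order."""
    commands = []
    for name in NOTEBOOKS:
        path = Path(data_root) / dataset / name
        commands.append(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", path.as_posix()]
        )
    return commands


def run_notebooks(dataset, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in notebook_commands(dataset):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing notebook.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_notebooks("goodreads")  # "goodreads" is an assumed directory name
```

By default the sketch only prints the commands. Pass `dry_run=False` once the raw data is in place.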
Train
- Pretrain textless node embeddings. The Goodreads dataset is used as an example.
cd pretrain/
bash run.sh
- Prepare the textless node embedding file for Heterformer training.
Run the cells in pretrain/transfer_embed.ipynb
- Heterformer training.
cd ..
python main.py --data_path data/$dataset --model_type Heterformer --pretrain_embed True --pretrain_dir data/$dataset/pretrain_embed
Test
python main.py --data_path data/$dataset --model_type Heterformer --mode test --load_ckpt_name $load_ckpt_dir
Inference
python main.py --data_path data/$dataset --model_type Heterformer --mode infer --load 1 --load_ckpt_name $load_ckpt_dir
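The training, test, and inference commands above differ only in a few flags. The following is a small sketch of a helper that assembles them for a given dataset. It uses only the flags shown above; the function name `heterformer_command` and the example values `goodreads` and `my_ckpt` are hypothetical, and the argument semantics of `main.py` are not verified here.

```python
def heterformer_command(dataset, mode="train", ckpt=None):
    """Return the main.py argument list for one of the three documented modes."""
    cmd = ["python", "main.py", "--data_path", f"data/{dataset}", "--model_type", "Heterformer"]
    if mode == "train":
        # Training reuses the pretrained textless node embeddings.
        cmd += ["--pretrain_embed", "True", "--pretrain_dir", f"data/{dataset}/pretrain_embed"]
    elif mode in ("test", "infer"):
        if ckpt is None:
            raise ValueError("test and infer need a checkpoint name")
        cmd += ["--mode", mode]
        if mode == "infer":
            # The inference command above additionally passes --load 1.
            cmd += ["--load", "1"]
        cmd += ["--load_ckpt_name", ckpt]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return cmd


if __name__ == "__main__":
    # "goodreads" and "my_ckpt" are placeholder values, not names from the repository.
    for mode in ("train", "test", "infer"):
        print(" ".join(heterformer_command("goodreads", mode, ckpt="my_ckpt")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.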
Downstream
Transductive Text-rich node classification
cd downstream/
python classification.py --mode transductive --dataset $dataset --method Heterformer
Inductive Text-rich node classification
python classification.py --mode inductive --dataset $dataset --method Heterformer
Textless node classification
python author_classification.py --dataset $dataset --method Heterformer
Node Clustering
python clustering.py --mode transductive --dataset $dataset --method Heterformer
Retrieval
python retrieval.py --method Heterformer
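To run all downstream evaluations in one pass, the commands above can be collected in a single driver. This is a sketch under stated assumptions: it is launched from the repository root, every listed script applies to the chosen dataset (which may not hold for all datasets), and the names `downstream_commands`, `run_downstream`, and `goodreads` are hypothetical.

```python
import subprocess


def downstream_commands(dataset, method="Heterformer"):
    """List the downstream evaluation commands in the order given above."""
    return [
        ["python", "classification.py", "--mode", "transductive", "--dataset", dataset, "--method", method],
        ["python", "classification.py", "--mode", "inductive", "--dataset", dataset, "--method", method],
        ["python", "author_classification.py", "--dataset", dataset, "--method", method],
        ["python", "clustering.py", "--mode", "transductive", "--dataset", dataset, "--method", method],
        ["python", "retrieval.py", "--method", method],
    ]


def run_downstream(dataset, dry_run=True):
    """Print each command; execute it inside downstream/ only when dry_run is False."""
    for cmd in downstream_commands(dataset):
        print(" ".join(cmd))
        if not dry_run:
            # cwd mirrors the `cd downstream/` step above.
            subprocess.run(cmd, cwd="downstream", check=True)


if __name__ == "__main__":
    run_downstream("goodreads")  # "goodreads" is an assumed dataset name
```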
Citations
Please cite the following paper if you find the code helpful for your research.
@inproceedings{jin2023heterformer,
title={Heterformer: Transformer-based deep node representation learning on heterogeneous text-rich networks},
author={Jin, Bowen and Zhang, Yu and Zhu, Qi and Han, Jiawei},
booktitle={Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages={1020--1031},
year={2023}
}