Pretraining Language Models with Text-Attributed Heterogeneous Graphs

Data Preparation

Please download the datasets from DatasetsForTHLM and put them into ./Data
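Before pretraining, it can help to confirm that the download landed where the scripts expect it. The snippet below is a minimal sketch, not part of the repository: the helper `missing_datasets` is hypothetical, and it assumes each dataset appears under ./Data as a folder or file named after the dataset (Patents, GoodReads, OAG_Venue). Adjust the names if the downloaded archive is laid out differently.

```python
from pathlib import Path

# Dataset names as used in this README; the one-entry-per-dataset layout
# under ./Data is an assumption, not something the repository guarantees.
DATASETS = ["Patents", "GoodReads", "OAG_Venue"]


def missing_datasets(data_root="./Data", names=DATASETS):
    """Return the dataset names that have no matching entry under data_root."""
    root = Path(data_root)
    if not root.is_dir():
        # Nothing has been downloaded yet, so every dataset is missing.
        return list(names)
    entries = list(root.iterdir())
    # Accept either a folder/file called exactly like the dataset, or a file
    # whose stem matches (for example an archive such as Patents.zip).
    present = {p.name for p in entries} | {p.stem for p in entries}
    return [name for name in names if name not in present]


if __name__ == "__main__":
    missing = missing_datasets()
    print("Missing:", missing if missing else "none")
```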

Model Pretraining

Example of pretraining THLM on the Patents dataset:

python main.py --dataset_name Patents
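To pretrain on several datasets in a row, the command above can be wrapped in a small driver. This is a sketch under assumptions: only `--dataset_name Patents` is shown in this README, so treating `GoodReads` and `OAG_Venue` as accepted values of `--dataset_name` is a guess based on the dataset names mentioned elsewhere, and the helpers `pretrain_command` and `pretrain_all` are hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# "Patents" comes from the README example; the other two values are assumed
# to be valid for --dataset_name because the README names these datasets.
DATASETS = ["Patents", "GoodReads", "OAG_Venue"]


def pretrain_command(dataset_name):
    """Build the pretraining command for one dataset."""
    return [sys.executable, "main.py", "--dataset_name", dataset_name]


def pretrain_all(datasets=DATASETS, dry_run=True):
    """Print (and, unless dry_run is set, run) the command for each dataset."""
    commands = [pretrain_command(name) for name in datasets]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop as soon as one run fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `pretrain_all(dry_run=False)` from the repository root would launch the runs one after another.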

Get node embeddings

The scripts for obtaining node embeddings for Patents, GoodReads and OAG_Venue are in ./Downstream/preprocess_data

Example of obtaining node embeddings for Patents:

python Patent_features.py

Model Evaluation

Pre-trained Language Models

We also provide the language models pre-trained on each of these three datasets on HuggingFace.
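A released checkpoint could be loaded along the following lines. This is a sketch, not the repository's own code: it assumes the checkpoints are BERT-style encoders that load through the `transformers` `AutoModel` and `AutoTokenizer` classes, the helper `embed_texts` is hypothetical, and the model identifier is deliberately left as a parameter because this README does not state it. Taking the first-token hidden state as the text embedding is one common choice and may differ from what the scripts in ./Downstream/preprocess_data do.

```python
def embed_texts(texts, model_id):
    """Encode a list of strings with a HuggingFace checkpoint.

    model_id is the HuggingFace repository name of the checkpoint; it is
    not given in this README, so the caller has to supply it.
    """
    # Imported here so the module can be imported without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # Assumption: use the first-token ([CLS]) hidden state as the embedding.
    return outputs.last_hidden_state[:, 0]
```

The result is a tensor with one row per input text and one column per hidden dimension of the encoder.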

Environments: