<p align="center"> <img src="logo.png" width="75%" class="center" alt="logo"/> </p>

PyGCL is a PyTorch-based open-source Graph Contrastive Learning (GCL) library, which features modularized GCL components from published papers, standardized evaluation, and experiment management.

What is Graph Contrastive Learning?

Graph Contrastive Learning (GCL) establishes a new paradigm for learning graph representations without human annotations. A typical GCL algorithm first constructs multiple graph views via stochastic augmentation of the input and then learns representations by contrasting positive samples against negative ones.

👉 For a general introduction to GCL, please refer to our paper and blog. This repo also tracks newly published GCL papers.

Install

Prerequisites

PyGCL needs the following packages to be installed beforehand: PyTorch, PyTorch Geometric, and DGL.

Installation via PyPI

To install PyGCL with pip, simply run:

```
pip install PyGCL
```

Then, you can `import GCL` in your current environment.

A note regarding DGL

Currently, the DGL team maintains two packages: `dgl` for CPU support and `dgl-cu***` for CUDA support. Since pip treats them as different packages, it is hard for PyGCL to check the version requirement for `dgl`. We have therefore removed this dependency check for `dgl` from our setup configuration and require users to install a proper version themselves.

Package Overview

PyGCL implements four main components of graph contrastive learning algorithms: graph augmentation, contrasting architectures and modes, contrastive objectives, and negative sampling strategies.

We also implement utilities for training models, evaluating model performance, and managing experiments.

Implementations and Examples

For a quick start, please check out the examples folder. We currently implement several representative methods, including DGI, InfoGraph, GRACE, BGRL, and GBT.

Building Your Own GCL Algorithms

Besides trying the above examples for node and graph classification tasks, you can also build your own graph contrastive learning algorithm straightforwardly.

Graph Augmentation

In GCL.augmentors, PyGCL provides the Augmentor base class, which offers a universal interface for graph augmentation functions. Specifically, PyGCL implements the following augmentation functions:

| Augmentation | Class name |
| --- | --- |
| Edge Adding (EA) | `EdgeAdding` |
| Edge Removing (ER) | `EdgeRemoving` |
| Feature Masking (FM) | `FeatureMasking` |
| Feature Dropout (FD) | `FeatureDropout` |
| Edge Attribute Masking (EAR) | `EdgeAttrMasking` |
| Personalized PageRank (PPR) | `PPRDiffusion` |
| Markov Diffusion Kernel (MDK) | `MarkovDiffusion` |
| Node Dropping (ND) | `NodeDropping` |
| Node Shuffling (NS) | `NodeShuffling` |
| Subgraphs induced by Random Walks (RWS) | `RWSampling` |
| Ego-net Sampling (ES) | `Identity` |

Calling an augmentation function on a graph, given as a tuple of node features, edge index, and edge attributes `(x, edge_index, edge_attrs)`, produces the corresponding augmented graph.
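As a minimal sketch (the random graph and tensor shapes here are illustrative assumptions), applying an edge-removing augmentation looks like this:

```python
import torch
import GCL.augmentors as A

# An illustrative random graph: 100 nodes with 32-dim features and 400 edges.
x = torch.randn(100, 32)
edge_index = torch.randint(0, 100, (2, 400))

aug = A.EdgeRemoving(pe=0.3)  # remove each edge with probability 0.3
# The augmentor consumes and returns the (x, edge_index, edge_attrs) triple.
x_aug, edge_index_aug, edge_attrs_aug = aug(x, edge_index)
```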

Composite Augmentations

PyGCL supports composing arbitrary numbers of augmentations together. To compose a list of augmentation instances augmentors, you need to use the Compose class:

```python
import GCL.augmentors as A

aug = A.Compose([A.EdgeRemoving(pe=0.3), A.FeatureMasking(pf=0.3)])
```

You can also use the RandomChoice class to randomly draw a few augmentations each time:

```python
import GCL.augmentors as A

aug = A.RandomChoice([A.RWSampling(num_seeds=1000, walk_length=10),
                      A.NodeDropping(pn=0.1),
                      A.FeatureMasking(pf=0.1),
                      A.EdgeRemoving(pe=0.1)],
                     num_choices=1)
```

Customizing Your Own Augmentation

You can write your own augmentation by inheriting from the base `Augmentor` class and implementing its `augment` function, as sketched below.
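A minimal sketch, assuming `Graph` and `Augmentor` can be imported from `GCL.augmentors.augmentor` and that `Graph` exposes an `unfold()` method returning the `(x, edge_index, edge_weights)` triple; the `FeatureNoise` class itself is hypothetical:

```python
import torch
from GCL.augmentors.augmentor import Graph, Augmentor  # import path assumed

class FeatureNoise(Augmentor):
    """Hypothetical augmentation: perturb node features with Gaussian noise."""
    def __init__(self, std: float = 0.1):
        super().__init__()
        self.std = std

    def augment(self, g: Graph) -> Graph:
        # Unpack the graph, perturb node features, and repack; edges stay untouched.
        x, edge_index, edge_weights = g.unfold()
        x = x + torch.randn_like(x) * self.std
        return Graph(x=x, edge_index=edge_index, edge_weights=edge_weights)
```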

Contrasting Architectures and Modes

Existing GCL architectures can be grouped into two categories: negative-sample-based methods and negative-sample-free ones.

| Contrastive architectures | Supported contrastive modes | Need negative samples | Class name | Examples |
| --- | --- | --- | --- | --- |
| Single-branch contrasting | G2L only | ✓ | `SingleBranchContrast` | DGI, InfoGraph |
| Dual-branch contrasting | L2L, G2G, and G2L | ✓ | `DualBranchContrast` | GRACE |
| Bootstrapped contrasting | L2L, G2G, and G2L | ✗ | `BootstrapContrast` | BGRL |
| Within-embedding contrasting | L2L and G2G | ✗ | `WithinEmbedContrast` | GBT |

Moreover, you can use `add_extra_mask` if you want to add positives or remove negatives. This function combines the existing positive mask with `extra_pos_mask` via a bitwise OR (adding positives) and the existing negative mask with `extra_neg_mask` via a bitwise AND (removing negatives). It is helpful, for example, when you have supervision signals from labels and want to train the model in a semi-supervised manner.
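For instance, a hedged sketch of building such masks from class labels (the label tensor and shapes are illustrative assumptions; the resulting masks are then passed alongside the embeddings):

```python
import torch

# Illustrative labels for 100 samples drawn from 7 classes.
labels = torch.randint(0, 7, (100,))

# Same-label pairs become extra positives (self-pairs excluded).
extra_pos_mask = labels.unsqueeze(0) == labels.unsqueeze(1)
extra_pos_mask.fill_diagonal_(False)

# Keep only different-label pairs as negatives.
extra_neg_mask = ~extra_pos_mask
extra_neg_mask.fill_diagonal_(False)
```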

Internally, PyGCL calls `Sampler` classes in `GCL.models` that receive embeddings and produce positive/negative masks. PyGCL implements three contrasting modes: (a) Local-Local (L2L), (b) Global-Global (G2G), and (c) Global-Local (G2L). L2L and G2G contrast embeddings at the same scale, while G2L performs cross-scale contrasting. To implement your own GCL model, you may also use the provided sampler classes:

| Contrastive modes | Class name |
| --- | --- |
| Same-scale contrasting (L2L and G2G) | `SameScaleSampler` |
| Cross-scale contrasting (G2L) | `CrossScaleSampler` |
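For example, a minimal sketch of a dual-branch, L2L contrast model with the InfoNCE objective (the `tau` value, the `intraview_negs` flag, and the stand-in embeddings are illustrative assumptions):

```python
import torch
import GCL.losses as L
from GCL.models import DualBranchContrast

# Dual-branch contrasting in L2L mode; intra-view negatives enlarge the negative pool.
contrast_model = DualBranchContrast(loss=L.InfoNCE(tau=0.2), mode='L2L', intraview_negs=True)

# Stand-ins for node embeddings of two augmented views (normally from your encoder).
h1, h2 = torch.randn(100, 64), torch.randn(100, 64)
loss = contrast_model(h1=h1, h2=h2)
```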

Contrastive Objectives

In GCL.losses, PyGCL implements the following contrastive objectives:

| Contrastive objectives | Class name |
| --- | --- |
| InfoNCE loss | `InfoNCE` |
| Jensen-Shannon Divergence (JSD) loss | `JSD` |
| Triplet Margin (TM) loss | `Triplet` |
| Bootstrapping Latent (BL) loss | `BootstrapLatent` |
| Barlow Twins (BT) loss | `BarlowTwins` |
| VICReg loss | `VICReg` |

All these objectives can contrast arbitrary positive and negative pairs, except for the Barlow Twins and VICReg losses, which perform contrastive learning within embeddings. Moreover, for the InfoNCE and Triplet losses, we further provide SP variants that compute the objective given only one positive pair per sample, which speeds up computation and avoids excessive memory consumption.
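As a sketch, swapping in an SP variant could look like the following; the class name `InfoNCESP` is an assumption, so check `GCL.losses` for the exact identifier:

```python
import GCL.losses as L
from GCL.models import DualBranchContrast

# Single-positive InfoNCE: one positive pair per sample (class name assumed).
contrast_model = DualBranchContrast(loss=L.InfoNCESP(tau=0.2), mode='L2L')
```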

Negative Sampling Strategies

PyGCL further implements several negative sampling strategies:

| Negative sampling strategies | Class name |
| --- | --- |
| Subsampling | `GCL.models.SubSampler` |
| Hard negative mixing | `GCL.models.HardMixing` |
| Conditional negative sampling | `GCL.models.Ring` |
| Debiased contrastive objective | `GCL.losses.DebiasedInfoNCE`, `GCL.losses.DebiasedJSD` |
| Hardness-biased negative sampling | `GCL.losses.HardnessInfoNCE`, `GCL.losses.HardnessJSD` |

The former three serve as an additional sampling step, similar to the existing `Sampler` classes, and can be used in conjunction with any objective. The last two are drop-in variants of the InfoNCE and JSD losses.
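For instance, a hedged sketch of swapping in the debiased objective; the `tau_plus` argument (the class-prior hyperparameter from debiased contrastive estimation) is an assumption:

```python
import GCL.losses as L
from GCL.models import DualBranchContrast

# Debiased InfoNCE corrects for false negatives among sampled pairs;
# the tau_plus keyword is an assumed hyperparameter name.
contrast_model = DualBranchContrast(loss=L.DebiasedInfoNCE(tau=0.2, tau_plus=0.1), mode='L2L')
```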

Utilities

PyGCL provides a variety of evaluator functions to evaluate the embedding quality:

| Evaluator | Class name |
| --- | --- |
| Logistic regression | `LREvaluator` |
| Support vector machine | `SVMEvaluator` |
| Random forest | `RFEvaluator` |

To use these evaluators, you first need to generate a dataset split with `get_split` (random split) or `from_predefined_split` (according to preset splits).
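A minimal sketch of the evaluation flow, assuming `get_split` and `LREvaluator` are importable from `GCL.eval`, and with random stand-ins for the embeddings `z` and labels `y` normally produced by your trained encoder:

```python
import torch
from GCL.eval import get_split, LREvaluator

# Stand-ins for learned embeddings and ground-truth labels.
z = torch.randn(100, 64)
y = torch.randint(0, 7, (100,))

# 10% train / 10% validation / 80% test random split.
split = get_split(num_samples=z.size(0), train_ratio=0.1, test_ratio=0.8)
result = LREvaluator()(z, y, split)
```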

Contribution

Feel free to open an issue if you find anything unexpected, or create a pull request to add your own work! We are motivated to continuously make PyGCL even better.

Citation

Please cite our paper if you use this code in your own work:

```bibtex
@article{Zhu:2021tu,
  author      = {Zhu, Yanqiao and Xu, Yichen and Liu, Qiang and Wu, Shu},
  title       = {{An Empirical Study of Graph Contrastive Learning}},
  journal     = {arXiv.org},
  year        = {2021},
  eprint      = {2109.01116v1},
  eprinttype  = {arxiv},
  eprintclass = {cs.LG},
  month       = sep,
}
```