Deep AUC Maximization on Graph Property Prediction

This repo contains our code submission for the OGB challenge. We focus on ogbg-molhiv, a binary classification task for predicting a target molecular property, e.g., whether a molecule inhibits HIV virus replication. The evaluation metric is AUROC. To the best of our knowledge, this is the first solution that directly optimizes the AUC score on this task. Our AUC-Margin loss improves the DeepGCN baseline to 0.8159 and achieves SOTA performance of 0.8352 when jointly trained with Neural FingerPrints. Our approaches are implemented in LibAUC, a machine learning library for AUC optimization.
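
For context, the AUC-margin (AUCM) loss is a min-max surrogate for the AUROC. Paraphrasing the objective from the cited paper (see the paper for the exact formulation), with score function h_w, margin m, and dual variable alpha:

    \min_{\mathbf{w},a,b}\ \max_{\alpha \ge 0}\quad \mathbb{E}\big[(h_{\mathbf{w}}(x)-a)^2 \,\big|\, y=1\big] + \mathbb{E}\big[(h_{\mathbf{w}}(x)-b)^2 \,\big|\, y=-1\big] + 2\alpha\big(m + \mathbb{E}[h_{\mathbf{w}}(x)\,|\,y=-1] - \mathbb{E}[h_{\mathbf{w}}(x)\,|\,y=1]\big) - \alpha^2

The margin m maps to the --margin flag in the training commands below, and the PESG optimizer performs stochastic descent on (w, a, b) and ascent on alpha.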

Results on ogbg-molhiv

Our method ranks 1st as of 10/11/2021 on the leaderboard! We present our results on the ogbg-molhiv dataset alongside some strong baselines below:

| Method               | Test AUROC     | Validation AUROC | Parameters | Hardware          |
|----------------------|----------------|------------------|------------|-------------------|
| DeepGCN              | 0.7858±0.0117 | 0.8427±0.0063   | 531,976    | Tesla V100 (32GB) |
| DeeperGCN+FLAG       | 0.7942±0.0120 | 0.8425±0.0061   | 531,976    | Tesla V100 (32GB) |
| Neural FingerPrints  | 0.8232±0.0047 | 0.8331±0.0054   | 2,425,102  | Tesla V100 (32GB) |
| Graphormer           | 0.8051±0.0053 | 0.8310±0.0089   | 47,183,040 | Tesla V100 (16GB) |
| DeepAUC (Ours)       | 0.8159±0.0059 | 0.8054±0.0080   | 1,019,407  | Tesla V100 (32GB) |
| DeepAUC+FPs (Ours)   | 0.8352±0.0054 | 0.8238±0.0061   | 1,019,407  | Tesla V100 (32GB) |

Requirements

  1. Install base packages:
    Python>=3.7
    Pytorch>=1.9.0
    tensorflow>=2.0.0
    pytorch_geometric>=1.6.0
    ogb>=1.3.2 
    dgl>=0.5.3 
    numpy==1.20.3
    pandas==1.2.5
    scikit-learn==0.24.2
    deep_gcns_torch
    
  2. Install LibAUC (provides the AUC-Margin loss and PESG optimizer):
    pip install LibAUC
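
    To sanity-check the installation (a quick check, assuming the v1.1-era module layout of LibAUC):
    python -c "from libauc.losses import AUCMLoss; from libauc.optimizers import PESG; print('LibAUC is ready')"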
    

Training

The training process has two steps: (1) we train a DeepGCN model from scratch using our AUC-margin loss; (2) we jointly fine-tune the pretrained model from (1) with the FingerPrints model.

Training from scratch using AUC-margin loss:

python main.py --use_gpu --conv_encode_edge --num_layers 14 --block res+ --gcn_aggr softmax --t 1.0 --learn_t --dropout 0.2 \
            --dataset ogbg-molhiv \
            --loss auroc \
            --optimizer pesg \
            --batch_size 512 \
            --lr 0.1 \
            --gamma 500 \
            --margin 1.0 \
            --weight_decay 1e-5 \
            --random_seed 0 \
            --epochs 300
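
Under the hood, --loss auroc and --optimizer pesg wire up LibAUC's AUCMLoss and PESG. Below is a minimal sketch of that wiring, based on the LibAUC examples around v1.1 (argument names may differ in other releases; model, train_loader, and imratio are placeholders, not the repo's exact code):

    import torch
    from libauc.losses import AUCMLoss
    from libauc.optimizers import PESG

    # imratio: fraction of positive samples in the training set (placeholder)
    loss_fn = AUCMLoss(imratio=imratio)
    optimizer = PESG(model,
                     a=loss_fn.a, b=loss_fn.b, alpha=loss_fn.alpha,  # min-max variables
                     imratio=imratio,
                     lr=0.1,             # matches --lr 0.1
                     gamma=500,          # matches --gamma 500
                     margin=1.0,         # matches --margin 1.0
                     weight_decay=1e-5)  # matches --weight_decay 1e-5

    for epoch in range(300):
        for graphs, labels in train_loader:
            scores = torch.sigmoid(model(graphs))  # AUCM expects scores in [0, 1]
            loss = loss_fn(scores, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()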

Jointly training with the FingerPrints model:

python extract_fingerprint.py
python random_forest.py
python finetune.py --use_gpu --conv_encode_edge --num_layers 14 --block res+ --gcn_aggr softmax --t 1.0 --learn_t --dropout 0.2 \
            --dataset ogbg-molhiv \
            --loss auroc \
            --optimizer pesg \
            --batch_size 512 \
            --lr 0.01 \
            --gamma 300 \
            --margin 1.0 \
            --weight_decay 1e-5 \
            --random_seed 0 \
            --epochs 100
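
Here, extract_fingerprint.py and random_forest.py prepare the FingerPrints signal used during fine-tuning. As a rough illustration of that stage (not the repo's exact code; train_smiles, y_train, and valid_smiles are placeholders, and the fingerprint settings are assumptions):

    import numpy as np
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem
    from sklearn.ensemble import RandomForestClassifier

    def morgan_fp(smiles, n_bits=2048, radius=2):
        # Encode a molecule as a binary Morgan fingerprint vector.
        mol = Chem.MolFromSmiles(smiles)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
        arr = np.zeros((n_bits,), dtype=np.int8)
        DataStructs.ConvertToNumpyArray(fp, arr)
        return arr

    X_train = np.stack([morgan_fp(s) for s in train_smiles])
    rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
    rf.fit(X_train, y_train)
    # Per-molecule probabilities that can be combined with the GNN scores downstream.
    rf_scores = rf.predict_proba(np.stack([morgan_fp(s) for s in valid_smiles]))[:, 1]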

Results

Step (1) improves the original DeepGCN baseline to 0.8159, a ~3% improvement. Step (2) achieves a new SOTA performance of 0.8352, a ~1% improvement over the previous best baseline. For each stage, we train the model 10 times using different random seeds (0 to 9).
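
The reported numbers are the mean ± standard deviation over those 10 runs, e.g. (a trivial aggregation sketch; the per-seed result files are hypothetical):

    import numpy as np
    # Hypothetical layout: one test-AUROC value logged per seed by main.py/finetune.py.
    aucs = np.array([float(open(f"results/test_auc_seed{s}.txt").read()) for s in range(10)])
    print(f"Test AUROC: {aucs.mean():.4f} +/- {aucs.std():.4f}")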

Citation

If you have any questions, please open a new issue in this repo or contact Zhuoning Yuan [yzhuoning@gmail.com]. If you find this work useful, please cite the following paper for our method and library:

@inproceedings{yuan2021robust,
  title={Large-scale Robust Deep AUC Maximization: A New Surrogate Loss and Empirical Studies on Medical Image Classification},
  author={Yuan, Zhuoning and Yan, Yan and Sonka, Milan and Yang, Tianbao},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}
