Unified 2D and 3D Pre-Training of Molecular Representations

This repository contains the code for Unified 2D and 3D Pre-Training of Molecular Representations, which was published at KDD 2022.

Dataset

We use the PCQM4Mv2 dataset for pre-training. It contains about 3.38M molecules.
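If you want to inspect the pre-training data, one option is to load PCQM4Mv2 through the Open Graph Benchmark's `ogb` package. This is a minimal sketch that assumes a recent `ogb` release is installed. It is not necessarily how this repository preprocesses the data. `load_pcqm4mv2_smiles` is a hypothetical helper name.

```python
def load_pcqm4mv2_smiles(root="dataset/"):
    """Load PCQM4Mv2 as (SMILES, HOMO-LUMO gap) pairs plus its official split.

    Assumption: a recent `ogb` release is installed. The first call
    downloads the raw data into `root`.
    """
    # Imported lazily so this helper can be defined without ogb installed.
    from ogb.lsc import PCQM4Mv2Dataset

    # only_smiles=True returns raw SMILES strings instead of converted graphs.
    dataset = PCQM4Mv2Dataset(root=root, only_smiles=True)
    # Dict of index arrays keyed by 'train', 'valid', 'test-dev', 'test-challenge'.
    split_idx = dataset.get_idx_split()
    return dataset, split_idx
```

Usage: `dataset, split_idx = load_pcqm4mv2_smiles()` followed by `smiles, gap = dataset[0]` returns the first molecule's SMILES string and its HOMO-LUMO gap target.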

Requirements and Installation

You can build a Docker image from the provided Dockerfile. To install the code locally for development, run:

pip install -e .

Pre-training

bash run_training.sh --num-layers 12 --batch-size 128 --enable-tb \
    --node-attn --use-bn --pred-pos-residual --mask-prob 0.25 \
    -c 0,1,2,3 --dist
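The `--mask-prob 0.25` flag suggests that a fraction of atoms is masked during pre-training. The sketch below shows one common way to do this. It assumes that each atom is masked independently with probability `mask_prob` and that atom types are given as atomic numbers. `mask_atoms` and `MASK_TOKEN` are hypothetical names, and the repository's exact masking scheme may differ.

```python
import numpy as np

# Hypothetical id for the [MASK] atom type; 0 is never a real atomic number.
MASK_TOKEN = 0


def mask_atoms(atom_types, mask_prob=0.25, rng=None):
    """Randomly mask atom types for masked-atom prediction.

    Assumption: each atom is masked independently with probability
    `mask_prob` (mirroring --mask-prob 0.25). This is a sketch, not
    necessarily the repository's exact scheme.

    Returns the corrupted atom types, the boolean mask, and the targets
    (original types at masked positions) used by the prediction loss.
    """
    rng = np.random.default_rng() if rng is None else rng
    atom_types = np.asarray(atom_types)
    # Draw one uniform number per atom; mask the atom when it falls below mask_prob.
    mask = rng.random(atom_types.shape[0]) < mask_prob
    corrupted = atom_types.copy()
    corrupted[mask] = MASK_TOKEN
    # The model is trained to recover the original types at masked positions.
    targets = atom_types[mask]
    return corrupted, mask, targets
```

For example, `mask_atoms([6, 6, 8, 7, 6], rng=np.random.default_rng(0))` masks each of the five atoms with probability 0.25. Independent masking keeps the expected masked fraction equal to `mask_prob`, but a small molecule may occasionally have no atom masked at all.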

Finetuning

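To fine-tune on ogbg-molpcba, replace the placeholder /yourpretrainedmodel with the path to your pre-trained model:

bash run_finetune.sh --num-layers 12 --batch-size 128 \
    --dropout 0.3 --dataset ogbg-molpcba \
    --pooler-dropout 0.1 --epochs 50 --seed 42 \
    -m /yourpretrainedmodel \
    --lr 0.0005 --weight-decay 0.01 --grad-norm 1 --prefix molpcba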
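ogbg-molpcba is a multi-task benchmark whose standard metric is Average Precision (AP), averaged over tasks. Missing labels are stored as NaN. OGB's official evaluator (`ogb.graphproppred.Evaluator`) computes this with scikit-learn. The following is a NumPy-only sketch of the same protocol, useful for checking predictions without extra dependencies. The helper names `average_precision` and `molpcba_mean_ap` are hypothetical, and this is not this repository's evaluation code.

```python
import numpy as np


def average_precision(y_true, y_score):
    """Step-wise AP, sum over thresholds of (R_n - R_{n-1}) * P_n.

    Matches scikit-learn's definition; tied scores share one threshold.
    Assumes y_true contains at least one positive.
    """
    order = np.argsort(-y_score, kind="mergesort")
    y_true, y_score = y_true[order], y_score[order]
    # The last index of each group of equal scores marks a threshold.
    distinct = np.where(np.diff(y_score))[0]
    threshold_idxs = np.r_[distinct, y_true.size - 1]
    tps = np.cumsum(y_true)[threshold_idxs]   # true positives at each threshold
    fps = 1 + threshold_idxs - tps            # false positives at each threshold
    precision = tps / (tps + fps)
    recall = tps / tps[-1]
    recall_prev = np.r_[0.0, recall[:-1]]
    return float(np.sum((recall - recall_prev) * precision))


def molpcba_mean_ap(y_true, y_pred):
    """Mean AP over tasks, following the ogbg-molpcba protocol.

    NaN labels are ignored, and tasks that lack either a positive or a
    negative label are skipped.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ap_list = []
    for task in range(y_true.shape[1]):
        labeled = ~np.isnan(y_true[:, task])
        t = y_true[labeled, task]
        if (t == 1).any() and (t == 0).any():
            ap_list.append(average_precision(t, y_pred[labeled, task]))
    if not ap_list:
        raise ValueError("No task has both positive and negative labels.")
    return float(np.mean(ap_list))
```

`y_true` and `y_pred` are arrays of shape (num_molecules, num_tasks). `y_pred` can hold raw logits, because AP depends only on the ranking of scores.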