MolR

This repository is the PyTorch implementation of MolR (paper):

Chemical-Reaction-Aware Molecule Representation Learning
Hongwei Wang, Weijiang Li, Xiaomeng Jin, Kyunghyun Cho, Heng Ji, Jiawei Han, Martin Burke
The 10th International Conference on Learning Representations (ICLR 2022)

MolR uses graph neural networks (GNNs) as the molecule encoder, and preserves the equivalence of molecules w.r.t. chemical reactions in the embedding space. Specifically, MolR forces the sum of the reactant embeddings and the sum of the product embeddings to be equal for each chemical reaction, which is shown to keep the embedding space well-organized and improve the generalization ability of the model.
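The constraint described above can be illustrated with a minimal sketch. It is not the repository's training code: the helper names `sum_embeddings` and `reaction_distance` and the toy 3-dimensional vectors are assumptions made for illustration, and plain Python lists stand in for the GNN outputs. The sketch only shows what is being measured, namely the distance between the summed reactant embeddings and the summed product embeddings, which training is meant to drive toward zero for true reactions.

```python
import math


def sum_embeddings(embeddings):
    """Element-wise sum of a list of equally sized embedding vectors."""
    return [sum(values) for values in zip(*embeddings)]


def reaction_distance(reactant_embeddings, product_embeddings):
    """Euclidean distance between the summed reactant embeddings and the
    summed product embeddings of one reaction (hypothetical helper)."""
    r = sum_embeddings(reactant_embeddings)
    p = sum_embeddings(product_embeddings)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, p)))


# Made-up embeddings: two reactants and one product.
reactants = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]
products = [[1.5, 1.0, 1.0]]

# The sums match exactly, so the reaction is "balanced" in embedding space.
balanced = reaction_distance(reactants, products)

# Shifting the product embedding breaks the equivalence.
perturbed = reaction_distance(reactants, [[1.5, 1.0, 4.0]])

print(balanced, perturbed)
```

In an actual model the same quantity would presumably be computed on encoder outputs and combined with a term that pushes non-matching reactant and product sets apart; that part is omitted here.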
MolR achieves substantial gains over state-of-the-art baselines. Below are the Hit@1 results on USPTO-479k and the real reaction dataset for the chemical reaction prediction task:

| Method | USPTO-479k | real reaction |
| --- | --- | --- |
| Mol2vec | 0.614 | 0.313 |
| MolBERT | 0.623 | 0.313 |
| MolR-TAG | 0.882 | 0.625 |
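For readers unfamiliar with the metric, the following sketch shows one common way to compute Hit@k for embedding-based retrieval. It assumes that each reaction's reactant embedding is used as a query and that candidate products are ranked by Euclidean distance; the function name `hit_at_k` and the toy 2-dimensional vectors are hypothetical and do not come from the repository.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hit_at_k(queries, candidates, true_indices, k=1):
    """Fraction of queries whose true candidate is among the k nearest
    candidates by Euclidean distance (hypothetical helper)."""
    hits = 0
    for query, true_idx in zip(queries, true_indices):
        # Rank candidate indices from nearest to farthest.
        ranked = sorted(
            range(len(candidates)),
            key=lambda j: euclidean(query, candidates[j]),
        )
        if true_idx in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Made-up product embeddings and reactant (query) embeddings.
candidates = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
queries = [[0.1, -0.1], [0.9, 1.2], [1.2, 0.8]]
true_indices = [0, 1, 2]  # the third query lands nearest the wrong product

score = hit_at_k(queries, candidates, true_indices, k=1)
print(score)
```

Under this reading, a Hit@1 of 0.882 would mean that the correct product is the single nearest candidate for about 88% of test reactions.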

Below are the AUC results on the BBBP, HIV, and BACE datasets for the molecule property prediction task:

| Method | BBBP | HIV | BACE |
| --- | --- | --- | --- |
| Mol2vec | 0.872 | 0.769 | 0.862 |
| MolBERT | 0.762 | 0.783 | 0.866 |
| MolR-GCN | 0.890 | 0.802 | 0.882 |

Below are the RMSE results on the QM9 dataset for the graph-edit-distance prediction task:

| Method | QM9 |
| --- | --- |
| Mol2vec | 0.995 |
| MolBERT | 0.937 |
| MolR-SAGE | 0.817 |

Below are the visualized reactions of alcohol oxidation and aldehyde oxidation using PCA: <img src="https://github.com/hwwang55/MolR/blob/master/reaction.png" alt="drawing" width="400"/>

Below is the visualized molecule embedding space on BBBP dataset using t-SNE: <img src="https://github.com/hwwang55/MolR/blob/master/space.png" alt="drawing" width="700"/>

For more results, please refer to our paper.

Files in the folder

Running the code

Required packages

The code has been tested under Python 3.7 and CUDA 11.0, with the following packages installed (along with their dependencies):