# MOFormer
<strong>MOFormer: Self-Supervised Transformer Model for Metal-Organic Framework Property Prediction</strong> <br> <em>Journal of the American Chemical Society (JACS)</em> [Paper] [arXiv] [PDF] <br> Zhonglin Cao*, Rishikesh Magar*, Yuyang Wang, Amir Barati Farimani (*equal contribution) <br> Carnegie Mellon University <br>
<img src="figs/pipeline.png" width="600">

This is the official implementation of "MOFormer: Self-Supervised Transformer Model for Metal-Organic Framework Property Prediction". In this work, we propose a structure-agnostic deep learning method based on the Transformer model, named <strong><em>MOFormer</em></strong>, for property prediction of MOFs. <strong><em>MOFormer</em></strong> takes a text string representation of a MOF (MOFid) as input, thus circumventing the need to obtain the 3D structure of a hypothetical MOF and accelerating the screening process. Furthermore, we introduce a self-supervised learning framework that pretrains <strong><em>MOFormer</em></strong> by maximizing the cross-correlation between its structure-agnostic representations and the structure-based representations of the crystal graph convolutional neural network (CGCNN) on more than 400k publicly available MOFs. Benchmarks show that pretraining improves the prediction accuracy of both models on various downstream prediction tasks. If you find our work useful in your research, please cite:
```
@article{doi:10.1021/jacs.2c11420,
    author = {Cao, Zhonglin and Magar, Rishikesh and Wang, Yuyang and Barati Farimani, Amir},
    title = {MOFormer: Self-Supervised Transformer Model for Metal–Organic Framework Property Prediction},
    journal = {Journal of the American Chemical Society},
    volume = {145},
    number = {5},
    pages = {2958-2967},
    year = {2023},
    doi = {10.1021/jacs.2c11420},
    URL = {https://doi.org/10.1021/jacs.2c11420}
}
```
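The self-supervised objective pairs each MOF's Transformer embedding with its CGCNN embedding and drives their cross-correlation matrix toward the identity, in the spirit of Barlow Twins (see `pretrain_SSL.py` for the actual implementation). The sketch below is illustrative only; the function name, normalization details, and `lambda_offdiag` value are our assumptions, not taken from the repo.

```python
import torch

def cross_correlation_loss(z_a, z_b, lambda_offdiag=5e-3):
    """Barlow Twins-style loss between two (batch, dim) embedding views,
    e.g. MOFormer (z_a) and CGCNN (z_b) representations of the same MOFs."""
    n, d = z_a.shape
    # standardize each embedding dimension across the batch
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    c = (z_a.T @ z_b) / n  # (dim, dim) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()              # pull C_ii toward 1
    off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()  # push C_ij (i != j) toward 0
    return on_diag + lambda_offdiag * off_diag
```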
## Getting Started
### Installation
Set up the conda environment and clone the GitHub repo:

```bash
# create a new environment
$ conda create -n moformer python=3.9
$ conda activate moformer
$ conda install pytorch==1.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
$ conda install --channel conda-forge pymatgen
$ pip install transformers
$ conda install -c conda-forge tensorboard

# clone the source code of MOFormer
$ git clone https://github.com/zcao0420/MOFormer
$ cd MOFormer
```
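Optionally, a quick way to confirm that the key dependencies resolved correctly (run inside the activated environment):

```python
# sanity-check the environment
import torch
import pymatgen
import transformers

print(torch.__version__)          # expect 1.10.1
print(torch.cuda.is_available())  # True if the CUDA 11.3 build sees a GPU
print(transformers.__version__)
```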
### Dataset
All the data used in this work can be found in the `benchmark_datasets` folder. If you use any data from this work, please cite the corresponding reference included in the Acknowledgement.
### Checkpoints
Pre-trained models can be found in the `ckpt` folder.
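A minimal way to inspect a checkpoint before fine-tuning (the file name below is a placeholder; substitute the actual file shipped in `ckpt/`, and note we assume it is a standard PyTorch checkpoint):

```python
import torch

# "pretrained.pth" is a hypothetical name; use the real file in ckpt/
state = torch.load('ckpt/pretrained.pth', map_location='cpu')
# a state dict (or a dict wrapping one) maps parameter names to tensors
print(type(state))
if isinstance(state, dict):
    print(list(state.keys())[:10])
```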
## Run the Model
### Pre-training
To pre-train the model via self-supervised learning (SSL) from scratch, run `python pretrain_SSL.py`. The config file for pretraining takes as input the directory of the CIF files along with a file called `id_prop.npy`. The `id_prop.npy` file maps each CIF ID to its corresponding MOFid string representation. We have added a folder named `cif_toy` with 100 CIF files for MOFs and the corresponding `id_prop.npy` for the data in the `cif_toy` folder. If you intend to run pretraining on the `cif_toy` folder, please make sure you update `config_multiview.yaml` with the proper location of the root directory. The full pretraining dataset is available on Figshare.
```bash
$ python pretrain_SSL.py
```
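To get a feel for the expected data layout, you can peek at the toy annotation file (a minimal sketch; we assume `id_prop.npy` stores (CIF ID, MOFid) pairs as described above, so check the copy shipped in `cif_toy/` for the exact layout):

```python
import numpy as np

# allow_pickle is needed if the array stores Python strings/objects
data = np.load('cif_toy/id_prop.npy', allow_pickle=True)
print(data.shape)
print(data[0])  # e.g. a CIF identifier and its MOFid string
```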
### Fine-tuning
To fine-tune the pre-trained Transformer, run `finetune_transformer.py`, where the configurations are defined in `config_ft_transformer.yaml`.
```bash
$ python finetune_transformer.py
```
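To see which options are exposed before editing them, you can load and print the config (requires PyYAML, which `transformers` installs as a dependency):

```python
import yaml

# print the fine-tuning options defined in the config file
with open('config_ft_transformer.yaml') as f:
    config = yaml.safe_load(f)
print(config)
```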
Similarly, to fine-tune the pre-trained CGCNN, run `finetune_cgcnn.py`, where the configurations are defined in `config_ft_cgcnn.yaml`.
```bash
$ python finetune_cgcnn.py
```
We also provide a Jupyter notebook, `demo.ipynb`, for fine-tuning/supervised training.
## Acknowledgement
- CGCNN: Paper and Code
- Barlow Twins: Paper and Code
- Crystal Twins: Paper and Code
- MOFid: Paper and Code
- Boyd & Woo Dataset: Paper
- QMOF: Paper 1 and Paper 2
- hMOF: Paper
## Questions about the code
The graduate students who worked on the paper have graduated from CMU, but we monitor the GitHub repository regularly. Please feel free to raise GitHub issues for questions or concerns about the code; this makes it easier for us to handle requests.