<h1 align="center">Exploring Chemical Space with<br>Score-based Out-of-distribution Generation</h1>

This is the official code repository for the paper Exploring Chemical Space with Score-based Out-of-distribution Generation (ICML 2023), in which we propose Molecular Out-Of-distribution Diffusion (MOOD).

<p align="center"> <img width="750" src="assets/concept.png"/> </p>

Contribution

Dependencies

Run the following commands to install the dependencies:

conda create -n mood python=3.8
conda activate mood
conda install -c pytorch pytorch==1.12.0 cudatoolkit=11.3
conda install -c conda-forge rdkit=2020.09 openbabel
pip install tqdm pyyaml pandas easydict networkx==2.6.3 numpy==1.20.3
chmod u+x scorer/qvina02
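
After installation, it can help to confirm that the pinned versions were actually resolved. The snippet below is a minimal sketch and not part of this repository: `check_pins` is a hypothetical helper, and it assumes the `pytorch` conda package registers itself under the distribution name `torch`. RDKit and Open Babel are left out because their conda builds may not expose distribution metadata.

```python
from importlib import metadata

# Pins copied from the install commands above. "torch" is the distribution
# name the pytorch conda package is assumed to register (an assumption).
PINNED = {"torch": "1.12.0", "networkx": "2.6.3", "numpy": "1.20.3"}


def check_pins(pins):
    """Return {name: (expected, installed_or_None, ok)} for each pinned package."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu113" when comparing versions.
        ok = installed is not None and installed.split("+")[0] == expected
        report[name] = (expected, installed, ok)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

Run it inside the activated `mood` environment; a `MISMATCH` line points at a package to reinstall.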

Running Experiments

1. Preparation

MOOD uses GDSS as its backbone diffusion model. In our paper, we used the pretrained GDSS checkpoint gdss_zinc250k_v2.pth, which is provided in the folder checkpoints/ZINC250k.

Run the following command to preprocess the ZINC250k dataset:

python data/preprocess.py
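
Before moving on, you may want to verify that the pretrained checkpoints are where the code expects them. The following is a minimal sketch, assuming it is run from the repository root; `expected_checkpoints` and `missing_checkpoints` are hypothetical helpers, and the file names are the ones listed in this README (the GDSS checkpoint above and the property predictors described in the next section).

```python
from pathlib import Path

# Folder and file names as given in this README.
CKPT_DIR = Path("checkpoints/ZINC250k")
TARGETS = ["parp1", "fa7", "5ht1b", "braf", "jak2"]


def expected_checkpoints(ckpt_dir=CKPT_DIR, targets=TARGETS):
    """List the GDSS checkpoint followed by one property predictor per target."""
    names = ["gdss_zinc250k_v2.pth"] + [f"prop_{t}.pth" for t in targets]
    return [Path(ckpt_dir) / name for name in names]


def missing_checkpoints(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not p.is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints(expected_checkpoints())
    if missing:
        print("Missing checkpoints:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All checkpoints found.")
```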

2. Training a Property Prediction Network $P_\phi$

We provide pretrained property predictor networks ($P_\text{obj}=\hat{\text{DS}} \times \text{QED} \times \hat{\text{SA}}$) for the target proteins parp1, fa7, 5ht1b, braf, and jak2 (prop_parp1.pth, prop_fa7.pth, prop_5ht1b.pth, prop_braf.pth, and prop_jak2.pth, respectively) in the folder checkpoints/ZINC250k.
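
To make the objective concrete, here is a small sketch of how $P_\text{obj}$ could be evaluated for a single molecule. This README does not spell out the normalizations, so the ones used below ($\hat{\text{DS}} = -\text{DS}/20$ and $\hat{\text{SA}} = (10 - \text{SA})/9$, both clipped to $[0, 1]$) are assumptions for illustration; check the paper and the repository code for the definitions actually used. The function names are hypothetical.

```python
def clip01(x):
    """Clamp a value to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def objective(ds, qed, sa):
    """Product objective DS_hat * QED * SA_hat.

    ds  : docking score in kcal/mol (more negative means stronger binding)
    qed : drug-likeness in [0, 1]
    sa  : synthetic accessibility in [1, 10] (lower means easier to make)
    The two normalizations below are assumed, not taken from this README.
    """
    ds_hat = clip01(-ds / 20.0)          # assumed normalization of DS
    sa_hat = clip01((10.0 - sa) / 9.0)   # assumed normalization of SA
    return ds_hat * qed * sa_hat


if __name__ == "__main__":
    # A docking score of -10 with QED 0.8 and the best possible SA of 1.
    print(objective(ds=-10.0, qed=0.8, sa=1.0))
```

Because the objective is a product, a molecule scores well only when all three terms are high at once; a near-zero value in any one of them pulls the whole objective toward zero.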

To train your own property predictor, run the following command:

CUDA_VISIBLE_DEVICES=${gpu_id} python main.py --type train --config prop_train

You can modify hyperparameters in config/prop_train.yaml.

3. Generation and Evaluation

To generate molecules, run the following command:

CUDA_VISIBLE_DEVICES=${gpu_id} python main.py --type sample --config sample

You can modify hyperparameters in config/sample.yaml.
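
The training and sampling commands share the same shape, so they are easy to script. The sketch below is a hypothetical wrapper, not part of this repository: it only reproduces the two invocations shown in this README (`--type train --config prop_train` and `--type sample --config sample`) and sets `CUDA_VISIBLE_DEVICES` for the child process.

```python
import os
import subprocess
import sys


def build_run(run_type, config, gpu_id):
    """Return (argv, env) mirroring
    CUDA_VISIBLE_DEVICES=<gpu_id> python main.py --type <run_type> --config <config>
    """
    # Only the two run types shown in this README are accepted here.
    if run_type not in ("train", "sample"):
        raise ValueError(f"unsupported run type: {run_type!r}")
    argv = [sys.executable, "main.py", "--type", run_type, "--config", config]
    # Copy the current environment and restrict the visible GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return argv, env


def run(run_type, config, gpu_id=0):
    """Launch main.py from the repository root and raise if it fails."""
    argv, env = build_run(run_type, config, gpu_id)
    return subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    argv, env = build_run("sample", "sample", gpu_id=0)
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], *argv)
```

Calling `run("train", "prop_train")` followed by `run("sample", "sample")` from the repository root would chain the two steps on GPU 0.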

Citation

If you find this repository and our paper useful, please cite our work.

@inproceedings{lee2023MOOD,
  author    = {Seul Lee and Jaehyeong Jo and Sung Ju Hwang},
  title     = {Exploring Chemical Space with Score-based Out-of-distribution Generation},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  year      = {2023}
}