<h1 align="center"><b>MoleRec</b></h1> <p align="center"> <a href="https://dl.acm.org/doi/10.1145/3543507.3583872"><img alt="Publication" src="https://img.shields.io/static/v1?label=Pub&message=TheWebConf%2723&color=purple"></a> <a href="https://github.com/yangnianzu0515/MoleRec/pulls"><img src="https://img.shields.io/badge/PRs-Welcome-yellow" alt="PRs"></a> <a href="https://github.com/yangnianzu0515/MoleRec/blob/master/LICENSE"><img alt="License" src="https://img.shields.io/github/license/yangnianzu0515/MoleRec?color=green"></a> <a href="https://github.com/yangnianzu0515/MoleRec/stargazers"><img src="https://img.shields.io/github/stars/yangnianzu0515/MoleRec?color=red&label=Star" alt="Stars"></a> <!-- <a href="https://yangnianzu0515.github.io/"><img src="https://img.shields.io/badge/Nianzu-Yang-blue" alt="MyWebsite"></a> --> </p>

Official implementation for our paper:

MoleRec: Combinatorial Drug Recommendation with Substructure-Aware Molecular Representation Learning

Nianzu Yang, Kaipeng Zeng, Qitian Wu, Junchi Yan* (* denotes correspondence)

Proceedings of the ACM Web Conference 2023 (TheWebConf (a.k.a. WWW) 2023)

News 🎉

MoleRec has been incorporated into the PyHealth package as a benchmark method for the combinatorial drug recommendation task! <a href="https://github.com/sunlabuiuc/pyhealth/stargazers"><img src="https://img.shields.io/github/stars/sunlabuiuc/pyhealth?color=blue&label=Star" alt="Stars"></a>

Folder Specification

Remark: the data/ folder contains only part of the data. See the Data Generation section for details.

Dependency

MoleRec.yml lists all the dependencies of MoleRec. To quickly set up an environment for our model, use the following command:

conda env create -f MoleRec.yml

Data Generation

Using the MIMIC-III dataset requires certification, so we are not permitted to redistribute the raw data here. To access MIMIC-III, first obtain the certification and then download the dataset from https://physionet.org/content/mimiciii/.

After downloading the MIMIC-III dataset, put the three CSV files PRESCRIPTIONS.csv, DIAGNOSES_ICD.csv and PROCEDURES_ICD.csv from the raw data into the data/ folder. Then generate the remaining files needed for training and evaluation (in addition to those already provided in the data/ folder) with the following command:

cd data
python processing.py
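Before running processing.py, it can help to confirm that the three raw tables are actually in place. The snippet below is a minimal sketch; the helper missing_mimic_files is hypothetical and not part of the MoleRec codebase, and it only checks file names, not file contents.

```python
from pathlib import Path

# The three raw MIMIC-III tables this README asks you to copy into data/.
REQUIRED_CSVS = ("PRESCRIPTIONS.csv", "DIAGNOSES_ICD.csv", "PROCEDURES_ICD.csv")


def missing_mimic_files(data_dir="."):
    """Return the required CSV names that are not present in data_dir.

    Hypothetical helper for illustration; it checks existence only.
    """
    folder = Path(data_dir)
    return [name for name in REQUIRED_CSVS if not (folder / name).is_file()]


# Example: run from the repository root, before `cd data`.
missing = missing_mimic_files("data")
if missing:
    print("Missing files in data/:", ", ".join(missing))
else:
    print("All required MIMIC-III CSV files found.")
```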

For an explanation of each output file, please refer to the SafeDrug repository. Note that in our paper we follow the same data processing procedure as SafeDrug after commit c7218d0.
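To sanity-check the generated data, the sketch below summarizes a records structure. It assumes the SafeDrug-style layout of records_final.pkl (a list of patients, each a list of visits, each visit a [diagnoses, procedures, medications] triple of code indices); both the file name and the layout are assumptions to verify against the SafeDrug repository. The real file can be loaded with dill, as SafeDrug does; the example here uses toy data instead.

```python
def summarize_records(records):
    """Count patients, visits, and distinct codes in a SafeDrug-style records list.

    Assumed layout: records[patient][visit] == [diag_codes, proc_codes, med_codes].
    """
    n_visits = 0
    diag, proc, med = set(), set(), set()
    for patient in records:
        for visit in patient:
            n_visits += 1
            diag.update(visit[0])
            proc.update(visit[1])
            med.update(visit[2])
    return {
        "patients": len(records),
        "visits": n_visits,
        "diagnoses": len(diag),
        "procedures": len(proc),
        "medications": len(med),
    }


# Toy data in the assumed layout: two patients, three visits in total.
toy_records = [
    [[[0, 1], [5], [2, 3]], [[1], [5, 6], [3]]],
    [[[2], [7], [4]]],
]
print(summarize_records(toy_records))
# {'patients': 2, 'visits': 3, 'diagnoses': 3, 'procedures': 3, 'medications': 3}
```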

If you want to re-generate ddi_matrix_H.pkl and substructure_smiles.pkl, use the following command:

cd data
python ddi_mask_H.py

Note that the BRICS decomposition method generates substructures in a random order. Since ddi_matrix_H.pkl and substructure_smiles.pkl depend on this order, you need to re-train the model if you re-generate these two files. For convenience, we have already provided the files we generated in the data/ folder, and they can be used for training and evaluation directly.
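Why does the order matter? Each substructure gets a column index, so the same fragment can land in a different column from one run to the next. One possible source of this variation (not confirmed for ddi_mask_H.py) is collecting fragment SMILES in Python sets, whose iteration order depends on string hashing and is randomized per process unless PYTHONHASHSEED is fixed. The sketch below assumes H is a binary drug-by-substructure matrix and shows how sorting the fragment vocabulary makes the indexing reproducible; the function names are hypothetical, and it is not a drop-in replacement for the provided script.

```python
def build_substructure_vocab(drug_fragments):
    """Map each substructure SMILES to a fixed column index.

    drug_fragments: one iterable of fragment SMILES per drug, in drug-index order.
    Sorting makes the index independent of set iteration order.
    """
    all_fragments = set()
    for frags in drug_fragments:
        all_fragments.update(frags)
    return {smiles: idx for idx, smiles in enumerate(sorted(all_fragments))}


def build_mask_h(drug_fragments, vocab):
    """Assumed binary mask: H[i][j] = 1 if drug i contains substructure j."""
    h = [[0] * len(vocab) for _ in drug_fragments]
    for i, frags in enumerate(drug_fragments):
        for smiles in frags:
            h[i][vocab[smiles]] = 1
    return h


# Toy fragments for two drugs (illustrative SMILES, not real BRICS output).
drugs = [{"CCO", "c1ccccc1"}, {"CCO", "CN"}]
vocab = build_substructure_vocab(drugs)
print(vocab)                       # {'CCO': 0, 'CN': 1, 'c1ccccc1': 2}
print(build_mask_h(drugs, vocab))  # [[1, 0, 1], [1, 1, 0]]
```

Even with a deterministic order, a re-generated vocabulary may not match the provided files, so the advice above still applies: re-train after re-generating.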

Run the Code

We provide two versions of our model, which learn the substructure representations using an embedding table and GNNs, respectively. To train or evaluate the model, first change your working directory via:

cd src

Embedding Table Version

To train the model, use the following command:

python main.py --device ${device} --embedding --lr ${learning rate} --dp ${dropout rate} --dim ${dim} --target_ddi ${expected ddi} --coef ${coefficient of annealing weight} --epochs ${epochs}
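The --target_ddi and --coef options control how strongly training penalizes drug-drug interactions. The sketch below shows one SafeDrug-style way such an annealing weight could work: while the DDI rate is at or below the target, only the recommendation loss is used; above it, weight shifts toward the DDI loss. The exponential form and the names annealing_weight and combined_loss are assumptions for illustration, not necessarily what main.py computes.

```python
import math


def annealing_weight(current_ddi, target_ddi, coef):
    """Weight on the recommendation loss; (1 - weight) goes to the DDI loss.

    Assumed form: 1 while current_ddi <= target_ddi, otherwise
    exp(coef * (1 - current_ddi / target_ddi)), which lies in (0, 1).
    """
    if current_ddi <= target_ddi:
        return 1.0
    return math.exp(coef * (1.0 - current_ddi / target_ddi))


def combined_loss(rec_loss, ddi_loss, current_ddi, target_ddi, coef):
    """Blend the two loss terms using the assumed annealing weight."""
    beta = annealing_weight(current_ddi, target_ddi, coef)
    return beta * rec_loss + (1.0 - beta) * ddi_loss


# Hypothetical values: DDI rate above target, so the DDI loss gets some weight.
print(annealing_weight(0.09, 0.06, 2.5))
```

A larger coef makes the weight drop faster once the DDI rate exceeds the target.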

To evaluate a well-trained model, use the following command:

python main.py --Test --embedding --resume_path ${model_path}

We provide our well-trained model in the best_models/ folder. To evaluate it, use the following command:

python main.py --Test --embedding --resume_path ../best_models/embedding_table/MoleRec.model
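Evaluation in this line of work (GAMENet, SafeDrug) typically reports set-overlap metrics such as Jaccard together with a DDI rate. The sketch below gives per-prescription versions of both, assuming a square 0/1 DDI adjacency matrix indexed by medication id and counting a pair as interacting if either direction is marked. The function names are hypothetical, and a dataset-level DDI rate would typically pool pair counts over all test visits rather than average per-visit rates.

```python
from itertools import combinations


def jaccard(pred, truth):
    """Jaccard similarity between predicted and ground-truth medication sets."""
    pred, truth = set(pred), set(truth)
    union = pred | truth
    return len(pred & truth) / len(union) if union else 0.0


def ddi_rate(med_set, ddi_adj):
    """Fraction of unordered drug pairs in med_set marked as interacting in ddi_adj."""
    pairs = list(combinations(sorted(set(med_set)), 2))
    if not pairs:
        return 0.0
    hits = sum(1 for i, j in pairs if ddi_adj[i][j] or ddi_adj[j][i])
    return hits / len(pairs)


# Toy example: drugs 0 and 1 interact; nothing else does.
ddi_adj = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
print(jaccard([0, 1, 2], [1, 2, 3]))  # 0.5
print(ddi_rate([2, 0, 1], ddi_adj))   # 1 of 3 pairs interacts
```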

GNNs Version

This version learns the substructure representation using GNNs, which is more powerful but has more parameters. You can use the following command to train the model:

python main.py --device ${device} --lr ${learning rate} --dp ${dropout rate} --dim ${dim} --target_ddi ${expected ddi} --coef ${coefficient of annealing weight} --epochs ${epochs}

To evaluate a well-trained model, use the following command:

python main.py --Test --resume_path ${model_path}

We also provide a well-trained model weight for this version, which can be evaluated by:

python main.py --Test --resume_path ../best_models/GNN/MoleRec.model

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{yang2023molerec,
  title={MoleRec: Combinatorial Drug Recommendation with Substructure-Aware Molecular Representation Learning},
  author={Yang, Nianzu and Zeng, Kaipeng and Wu, Qitian and Yan, Junchi},
  booktitle={Proceedings of the ACM Web Conference 2023},
  pages={4075--4085},
  year={2023}
}

If you have any questions, feel free to contact us at yangnianzu@sjtu.edu.cn or zengkaipeng@sjtu.edu.cn.

Acknowledgement

We sincerely thank the GAMENet and SafeDrug repositories for their well-implemented pipelines, upon which our codebase is built.