DrugGEN: Target Specific De Novo Design of Drug Candidate Molecules with Graph Transformer-based Generative Adversarial Networks

<p align="center"> <a href="https://arxiv.org/abs/2302.07868"><img src="https://img.shields.io/badge/arXiv-preprint-B31B1B?style-for-the-badge&logo=arXiv"/></a> <a href="https://pytorch.org/"><img src="https://img.shields.io/badge/PyTorch-Implementation-EE4C2C?style-for-the-badge&logo=PyTorch"/></a> <a href="https://pytorch-geometric.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/PyG-Implementation-3C2179?style-for-the-badge&logo=PyG"/></a> <a href=https://github.com/asarigun/DrugGEN/blob/main/LICENSE><img src="https://img.shields.io/badge/See%20-License%20-blue"/></a> <!--<a href="Give a link here" alt="license"><img src="https://colab.research.google.com/assets/colab-badge.svg"/></a>--> </p> <!--PUT HERE SOME QUALITATIVE RESULTS IN THE ASSETS FOLDER--> <!--YOU CAN PUT ALSO IN THE GIF OR PNG FORMAT --> <!--<p float="center"> <img src="assets/sample1.png" width="49%" /> <img src="assets/sample2.png" width="49%" /> </p>--> <!--PUT THE ANIMATED GIF VERSION OF THE DRUGGEN MODEL (Figure 1)--> <p float="center"> <img src="assets/druggen_figure1_mod.gif" width="98%" /> </p> <!-- ## Abstract > Discovering novel drug candidate molecules is one of the most fundamental and critical steps in drug development. Generative deep learning models, which create synthetic data given a probability distribution, have been developed with the purpose of picking completely new samples from a partially known space. Generative models offer high potential for designing de novo molecules; however, in order for them to be useful in real-life drug development pipelines, these models should be able to design target-specific molecules, which is the next step in this field. In this study, we propose a novel generative system, DrugGEN, for de novo design of drug candidate molecules that interact with selected target proteins. 
The proposed system represents compounds and protein structures as graphs and processes them via serially connected two generative adversarial networks comprising graph transformers. DrugGEN is implemented with five independent models, each with a unique sample generation routine. The system is trained using a large dataset of compounds from ChEMBL and target-specific bioactive molecules, to design effective and specific inhibitory molecules against the AKT1 protein, effective targeting of which has critical importance for developing treatments against various types of cancer. DrugGEN has a competitive or better performance against other methods on fundamental benchmarks. To assess the target-specific generation performance, we conducted further in silico analysis with molecular docking and deep learning-based bioactivity prediction. Their results indicate that de novo molecules have high potential for interacting with the AKT1 protein structure in the level of its native ligand. DrugGEN can be used to design completely novel and effective target-specific drug candidate molecules for any druggable protein, given the target features and a dataset of experimental bioactivities. -->

Check out our paper below for more details:

DrugGEN: Target Specific De Novo Design of Drug Candidate Molecules with Graph Transformer-based Generative Adversarial Networks,
Atabey Ünlü, Elif Çevrim, Ahmet Sarıgün, Heval Ataş, Altay Koyaş, Hayriye Çelikbilek, Deniz Cansen Kahraman, Abdurrahman Olğaç, Ahmet S. Rifaioğlu, Tunca Doğan
arXiv, 2023

Features

<!--PUT HERE 1-2 SENTECE FOR METHOD WHICH SHOULD BE SHORT Pleaser refer to our [arXiv report](link here) for further details. -->

This implementation:

<!-- - supports both CPU and GPU inference (though GPU is way faster), --> <!-- ADD HERE SOME FEATURES FOR DRUGGEN & SUMMARIES & BULLET POINTS --> <!-- ADD THE ANIMATED GIF VERSION OF THE GAN1 AND GAN2 -->
| First Generator | Second Generator |
|---|---|
| FirstGAN | SecondGAN |

Preliminary results (generated molecules)

| ChEMBL-25 | ChEMBL-45 |
|---|---|
| ChEMBL_25 | ChEMBL_45 |

Overview

We provide the implementation of DrugGEN, along with scripts built on the PyTorch Geometric framework to train the models and generate molecules. The repository is organised as follows:

data contains:

experiments contains:

Python scripts:

Datasets

Three data types (compound, protein, and bioactivity) were retrieved from various data sources to train our deep generative models. The GAN1 module requires only compound data, while GAN2 requires all three data types.

<!-- To enhance the size of the bioactivity dataset, we also obtained two alternative versions by incorporating ligand interactions of protein members in non-specific serine/threonine kinase (STK) and kinase families. -->

More details on the construction of datasets can be found in our paper referenced above.
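The training command later in this README passes compound data as `.smi` files (`--raw_file`, `--drug_raw_file`). The snippet below is a minimal sketch of how such a file can be inspected before training. It assumes the common SMILES file convention of one molecule per line, with the SMILES string as the first whitespace-separated field and an optional name after it. The function name `read_smiles` is hypothetical and is not part of the DrugGEN code base.

```python
from pathlib import Path


def read_smiles(path):
    """Return the SMILES strings found in a .smi file.

    Assumption: one molecule per line, SMILES in the first
    whitespace-separated field, optional title after it.
    Blank lines are skipped.
    """
    smiles = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        fields = line.split()
        if fields:
            smiles.append(fields[0])
    return smiles


if __name__ == "__main__":
    # Path taken from the training command in this README.
    smi_path = Path("DrugGEN/data/chembl_train.smi")
    if smi_path.exists():
        print(f"{len(read_smiles(smi_path))} molecules in {smi_path}")
    else:
        print(f"{smi_path} not found; download the raw files first")
```

This only counts entries; it does not validate the SMILES strings or apply the `--max_atom` limit.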

Updates

Getting Started

DrugGEN has been implemented and tested on Ubuntu 18.04 with Python >= 3.9. It supports both GPU and CPU inference.
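A quick way to check an existing interpreter against the requirement above is sketched here. The package import names (`torch`, `torch_geometric`) are assumptions based on the PyTorch and PyG badges at the top of this README; the authoritative dependency list is in `requirements.txt` and `dependencies.yml`. The helper names are hypothetical.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 9)  # minimum version stated in this README
# Assumed import names; see requirements.txt for the real dependency list.
ASSUMED_PACKAGES = ("torch", "torch_geometric")


def python_ok(version=sys.version_info):
    """Return True if the given version meets the documented minimum."""
    return tuple(version[:2]) >= REQUIRED_PYTHON


def missing_packages(names=ASSUMED_PACKAGES):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing packages:", missing_packages() or "none")
```

The check only looks for the packages; it does not import them, so it runs quickly even where PyTorch is installed.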

<!--If you don't have a suitable device, try running our Colab demo. -->

Clone the repo:

git clone https://github.com/asarigun/DrugGEN.git
<!-- Install the requirements using `virtualenv` or `conda`: ```bash # pip source install/install_pip.sh # conda source install/install_conda.sh ``` ## Running the Demo You could try Google Colab if you don't already have a suitable environment for running this project. It enables cost-free project execution in the cloud. You can use the provided notebook to try out our Colab demo: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](Give a link here)-->

Training

Setting up environment

You can set up the environment using either conda or pip.

With conda:

# set up the environment (installs the requirements):

conda env create -f DrugGEN/dependencies.yml

# activate the environment:

conda activate druggen

With pip, using a virtual environment:

python -m venv DrugGEN/.venv
source DrugGEN/.venv/bin/activate
pip install -r DrugGEN/requirements.txt

Starting the training


# Download the raw files

cd DrugGEN/data

bash dataset_download.sh

# Return to the directory that contains DrugGEN (the paths below are relative to it)

cd ../..

# DrugGEN can be trained with a one-liner

python DrugGEN/main.py --submodel="CrossLoss" --mode="train" --raw_file="DrugGEN/data/chembl_train.smi" --dataset_file="chembl45_train.pt" --drug_raw_file="DrugGEN/data/akt_train.smi" --drug_dataset_file="drugs_train.pt" --max_atom=45
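When several runs are needed, the training command can be assembled from Python instead of being edited by hand. The following is a minimal sketch, not part of the repository: it reuses exactly the options shown in the training command, and the helper names (`build_command`, `run_training`) are hypothetical. By default it only prints the command (`dry_run=True`).

```python
import shlex
import subprocess
import sys

# Options copied from the training command in this README.
TRAIN_ARGS = {
    "submodel": "CrossLoss",
    "mode": "train",
    "raw_file": "DrugGEN/data/chembl_train.smi",
    "dataset_file": "chembl45_train.pt",
    "drug_raw_file": "DrugGEN/data/akt_train.smi",
    "drug_dataset_file": "drugs_train.pt",
    "max_atom": 45,
}


def build_command(args, script="DrugGEN/main.py"):
    """Turn a dict of options into an argument list for subprocess."""
    return [sys.executable, script] + [f"--{key}={value}" for key, value in args.items()]


def run_training(overrides=None, dry_run=True):
    """Print (and optionally run) the training command with overrides applied."""
    args = {**TRAIN_ARGS, **(overrides or {})}
    cmd = build_command(args)
    print(shlex.join(cmd))
    if not dry_run:
        # Raises CalledProcessError if main.py exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Illustrative override; --batch_size is listed among the training arguments.
    run_training({"batch_size": 64})
```

Passing the arguments as a list avoids shell quoting issues; run the script from the directory that contains `DrugGEN`, as with the command above.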

The full list of arguments is defined in main.py. Each argument is explained below.

Model arguments:
  --submodel SUBMODEL       Choose the submodel for training
  --act ACT                 Activation function for the model
  --z_dim Z_DIM             Prior noise for the first GAN
  --max_atom MAX_ATOM       Maximum number of atoms per molecule (must be specified)
  --lambda_gp LAMBDA_GP     Gradient penalty lambda multiplier for the first GAN
  --dim DIM                 Dimension of the Transformer models for both GANs
  --depth DEPTH             Depth of the Transformer model from the first GAN
  --heads HEADS             Number of heads for the MultiHeadAttention module from the first GAN
  --dec_depth DEC_DEPTH     Depth of the Transformer model from the second GAN
  --dec_heads DEC_HEADS     Number of heads for the MultiHeadAttention module from the second GAN
  --mlp_ratio MLP_RATIO     MLP ratio for the Transformers
  --dis_select DIS_SELECT   Select the discriminator for the first and second GAN
  --init_type INIT_TYPE     Initialization type for the model
  --dropout DROPOUT         Dropout rate for the encoder
  --dec_dropout DEC_DROPOUT Dropout rate for the decoder
Training arguments:
  --batch_size BATCH_SIZE   Batch size for the training
  --epoch EPOCH             Epoch number for Training
  --warm_up_steps           Warm up steps for the first GAN
  --g_lr G_LR               Learning rate for G
  --g2_lr G2_LR             Learning rate for G2
  --d_lr D_LR               Learning rate for D
  --d2_lr D2_LR             Learning rate for D2      
  --n_critic N_CRITIC       Number of D updates per each G update
  --beta1 BETA1             Beta1 for Adam optimizer
  --beta2 BETA2             Beta2 for Adam optimizer 
  --clipping_value          Clipping value for the gradient clipping process
  --resume_iters            Resume training from this step for fine tuning if desired
Dataset arguments:      
  --features FEATURES       Additional node features (Boolean) (Please check new_dataloader.py Line 102)
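For readers who want to wrap or extend the command line, the grouping above maps naturally onto `argparse` argument groups. The sketch below declares a small subset of the documented options. It is an illustration only, not the actual `main.py`: the argument types are assumptions, and no default values are set because this README does not state them.

```python
import argparse


def build_parser():
    """Declare a subset of the documented options (types are assumed)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a DrugGEN-style command line (not the real main.py)"
    )

    model = parser.add_argument_group("Model arguments")
    model.add_argument("--submodel", type=str, help="Choose the submodel for training")
    model.add_argument("--max_atom", type=int, required=True,
                       help="Maximum number of atoms per molecule (must be specified)")
    model.add_argument("--z_dim", type=int, help="Prior noise for the first GAN")
    model.add_argument("--lambda_gp", type=float,
                       help="Gradient penalty lambda multiplier for the first GAN")

    training = parser.add_argument_group("Training arguments")
    training.add_argument("--batch_size", type=int, help="Batch size for the training")
    training.add_argument("--g_lr", type=float, help="Learning rate for G")
    training.add_argument("--d_lr", type=float, help="Learning rate for D")
    training.add_argument("--n_critic", type=int,
                          help="Number of D updates per each G update")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--submodel=CrossLoss", "--max_atom=45"])
    print(args)
```

Options that are not passed come back as `None` in this sketch, which makes it easy to see which values were set explicitly.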

Molecule Generation Using Trained DrugGEN Models in the Inference Mode


python DrugGEN/main.py --submodel="{Chosen model name}" --mode="inference" --inference_model="DrugGEN/experiments/models/{Chosen model name}"
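To see which trained models are available before running inference, the models folder can be listed from Python. This is a minimal sketch under one assumption: each entry in `DrugGEN/experiments/models` (file or folder) corresponds to one trained model whose name is also the `--submodel` value, as the placeholder in the command above suggests. The helper names are hypothetical.

```python
import sys
from pathlib import Path


def available_models(models_dir="DrugGEN/experiments/models"):
    """Return the sorted names of non-hidden entries in the models directory.

    Assumption: each entry corresponds to one trained model.
    Returns an empty list if the directory does not exist.
    """
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if not p.name.startswith("."))


def inference_command(model_name, repo_dir="DrugGEN"):
    """Build the documented inference command for one model name."""
    return [
        sys.executable,
        f"{repo_dir}/main.py",
        f"--submodel={model_name}",
        "--mode=inference",
        f"--inference_model={repo_dir}/experiments/models/{model_name}",
    ]


if __name__ == "__main__":
    for name in available_models():
        print(" ".join(inference_command(name)))
```

The script only prints the commands; each printed line can be run as is from the directory that contains `DrugGEN`.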

De Novo Generated Molecules and Their AKT1 Inhibitor Subset

(Figure: structures)

Citation

@article{unlu2023target,
  title={Target Specific De Novo Design of Drug Candidate Molecules with Graph Transformer-based Generative Adversarial Networks},
  author={{\"U}nl{\"u}, Atabey and {\c{C}}evrim, Elif and Sar{\i}g{\"u}n, Ahmet and {\c{C}}elikbilek, Hayriye and G{\"u}venilir, Heval Ata{\c{s}} and Koya{\c{s}}, Altay and Kahraman, Deniz Cansen and Ol{\u{g}}a{\c{c}}, Abdurrahman and Rifaio{\u{g}}lu, Ahmet and Do{\u{g}}an, Tunca},
  journal={arXiv preprint arXiv:2302.07868},
  year={2023}
}
<!--ADD BIBTEX AFTER THE PUBLISHING-->

License

This code is available for non-commercial scientific research purposes as will be defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.

<!--ADD LICENSE TERMS AND LICENSE FILE AND GIVE A LINK HERE-->

References

In each file, we indicate whether a function or script is adapted from another source. Here are some excellent sources from which we benefited:

<!--ADD THE REFERENCES THAT WE USED DURING THE IMPLEMENTATION-->

Check out the latest project repository on <a href="https://github.com/HUBioDataLab/DrugGEN"><img src="https://img.shields.io/badge/Github-UpdatedRepo-181717?style-for-the-badge&logo=GitHub"/></a>!