Novel drug-like molecules generation

This repository implements the pipelines and models proposed in the paper "Towards Efficient Generation, Correction and Properties Control of Unique Drug-Like Structures". It also contains scripts to reproduce the results reported in the paper.

About

The Novel drug-like molecules generation project enables fast generation and correction of novel chemical structures from a small reference dataset, as well as prediction of their properties (predictor architectures were trained and tested for log solubility and BBBP, blood-brain barrier permeability). Efficient design and screening of novel molecules is a major challenge in drug and materials design. The project focuses on a multi-stage pipeline in which several deep neural network (DNN) models are combined to map discrete molecular representations into a continuous vector space, from which new molecular structures with desired properties are then generated. An attention-based sequence-to-sequence model is added to “spellcheck” and correct generated structures, while oversampling in the continuous space allows generating candidate structures with a desired distribution of properties and molecular descriptors, even for small reference datasets. With the focus on drug design, such a pipeline allows generating novel structures with control of SAS (Synthetic Accessibility Score) and a series of ADME metrics that assess drug-likeness.
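As a rough illustration of oversampling in the continuous space, the sketch below perturbs each reference latent vector with small Gaussian noise to obtain several nearby candidates. This is only a minimal sketch under stated assumptions: the function name oversample_latent, the Gaussian noise model, and the parameters copies and sigma are hypothetical and are not taken from the repository or the paper. In the actual pipeline the latent vectors would come from the trained autoencoder, and the sampling scheme may differ.

```python
import random
from typing import List, Sequence


def oversample_latent(
    latents: Sequence[Sequence[float]],
    copies: int = 5,
    sigma: float = 0.1,
    seed: int = 0,
) -> List[List[float]]:
    """Return `copies` noisy variants of every latent vector.

    Hypothetical helper: Gaussian perturbation is an assumption made for
    illustration, not the sampling scheme confirmed by the paper.
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible
    samples: List[List[float]] = []
    for z in latents:
        for _ in range(copies):
            # Add independent zero-mean noise to each latent coordinate.
            samples.append([x + rng.gauss(0.0, sigma) for x in z])
    return samples


# Toy 3-dimensional latent vectors standing in for encoder outputs.
reference_latents = [[0.2, -1.0, 0.5], [1.3, 0.0, -0.7]]
candidates = oversample_latent(reference_latents, copies=4, sigma=0.05)
print(len(candidates))  # 2 reference vectors x 4 copies = 8
```

Each candidate vector would then be passed to the decoder, and the decoded structures to the correction model and property predictors.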

Installation instructions

Install all necessary packages by running pip3 install -r requirements.txt

Usage

To train models

specify dataset paths and hyperparameters in the corresponding config file (e.g., ae/train_config.yaml)
run python3 train.py [path to config file]

To generate new structures

specify the path to the reference dataset of SMILES in pipeline/pipeline_config.yaml
run python3 generate_new_structures.py

To analyze the novel structures

use novel_smiles_analysis.ipynb
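The notebook's contents are not reproduced here. As an illustration of the kind of check commonly applied to generated SMILES, the sketch below computes uniqueness (share of distinct strings among the generated ones) and novelty (share of distinct strings absent from the reference set). The function novelty_stats is hypothetical, and comparing raw strings is a simplifying assumption: a real analysis would typically canonicalize SMILES with a cheminformatics toolkit first, since one molecule can be written as several different strings.

```python
from typing import Dict, Iterable


def novelty_stats(generated: Iterable[str], reference: Iterable[str]) -> Dict[str, float]:
    """Compute simple uniqueness and novelty ratios for generated SMILES.

    Hypothetical helper: strings are compared as-is, without canonicalization.
    """
    gen = [s.strip() for s in generated if s.strip()]  # drop empty entries
    ref = {s.strip() for s in reference}
    unique = set(gen)
    novel = unique - ref  # distinct generated strings not seen in the reference
    return {
        "n_generated": len(gen),
        "uniqueness": len(unique) / len(gen) if gen else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


stats = novelty_stats(
    generated=["CCO", "CCO", "c1ccccc1", "CCN"],
    reference=["CCO", "CC"],
)
print(stats)
```

In this toy example three of the four generated strings are distinct, and two of those three do not occur in the reference set.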

Data

All models were trained on publicly available data only. The autoencoders were trained on data from the eMolecules database (https://www.emolecules.com/info/plus/download-database). The solubility dataset was compiled from sets provided by Huuskonen, Hou et al., Delaney, and Mitchell (see the paper for more details).

Contributing

Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Added some feature')
Push to the branch (git push origin my-new-feature)
Create a new Pull Request

Authors

Maksym Druchok mdruc@softserveinc.com
Dzvenymyra Yarish dyari@softserveinc.com
Oleksandr Gurbych ogurb@softserveinc.com
Mykola Maksymenko mmaks@softserveinc.com