ChemGE: Population-based De Novo Molecule Generation, Using Grammatical Evolution
Molecule design using grammatical evolution. The paper is available at https://arxiv.org/abs/1804.02134
Update: The ChemGE paper has been published in Chemistry Letters. https://doi.org/10.1246/cl.180665
The advantages of ChemGE are:
- Faster SMILES generation
- Inherent parallelism
- Novelty and diversity in designed molecules
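For readers unfamiliar with grammatical evolution, the sketch below illustrates the general genotype-to-phenotype idea: each integer gene selects a production rule (gene modulo the number of rules) for the leftmost non-terminal. It is a toy illustration, not the code from this repository. The grammar `TOY_GRAMMAR` and the function `decode` are hypothetical names, the grammar is far smaller than a real SMILES grammar, and the sketch returns `None` when the genes run out instead of wrapping around.

```python
# Hypothetical toy grammar: a chain of C, N, and O atoms (not a full SMILES grammar).
TOY_GRAMMAR = {
    "smiles": [["atom"], ["atom", "smiles"]],
    "atom": [["C"], ["N"], ["O"]],
}


def decode(chromosome, grammar=TOY_GRAMMAR, start="smiles", max_steps=50):
    """Map a list of integer genes to a string by expanding the leftmost symbol."""
    symbols = [start]   # symbols still waiting to be processed
    output = []         # terminals emitted so far
    gene_index = 0
    for _ in range(max_steps):
        if not symbols:
            return "".join(output)
        symbol = symbols.pop(0)
        if symbol not in grammar:
            output.append(symbol)  # terminal symbol: emit it unchanged
            continue
        if gene_index >= len(chromosome):
            return None  # ran out of genes before the derivation finished
        rules = grammar[symbol]
        rule = rules[chromosome[gene_index] % len(rules)]
        gene_index += 1
        symbols = list(rule) + symbols
    return None if symbols else "".join(output)


print(decode([1, 0, 0, 2]))  # prints CO
```

Because a chromosome is just a list of integers, mutation can be as simple as replacing a few genes with random values, and each individual can be decoded and scored independently of the others.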
In this repository, we provide the code used in our experiment.
- Benchmark on druglikeness score (optimize-J.py)
- Design of high-scoring molecules for thymidine kinase (optimize-rdock.py)
- Scalability of ChemGE in a parallel environment (optimize-rdock-qsub.py)
Requirements
- Python
- RDKit
- rDock
How to set up on Ubuntu 16.04
Compiling rDock may fail with newer compilers, so we recommend using Ubuntu 16.04.
Install Python libraries
git clone https://github.com/pyenv/pyenv.git $HOME/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> $HOME/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> $HOME/.bashrc
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> $HOME/.bashrc
source $HOME/.bashrc
pyenv install anaconda3-5.0.1
pyenv global anaconda3-5.0.1
git clone https://github.com/pyenv/pyenv-virtualenv.git $(pyenv root)/plugins/pyenv-virtualenv
exec "$SHELL"
conda create -c rdkit -n my-rdkit-env rdkit
source activate my-rdkit-env
pip install nltk networkx
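After the environment is created, a quick check that the Python libraries installed above are importable can save time later. The following is a minimal sketch; `missing_modules` is a hypothetical helper, not part of ChemGE, and it only checks that the modules can be found, not that their versions are compatible.

```python
import importlib.util

# Libraries installed by the conda and pip commands above.
REQUIRED_MODULES = ("rdkit", "nltk", "networkx")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside my-rdkit-env; any name listed as missing needs to be installed before running the optimization scripts.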
Install rDock
sudo apt-get update
sudo apt-get install build-essential libcppunit-dev libpopt-dev
cd ~
wget https://sourceforge.net/projects/rdock/files/rDock_2013.1_src.tar.gz
tar xf rDock_2013.1_src.tar.gz
cd rDock_2013.1_src/build/
make linux-g++-64
echo 'export RBT_ROOT="$HOME/rDock_2013.1_src"' >> $HOME/.bashrc
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$RBT_ROOT/lib"' >> $HOME/.bashrc
echo 'export PATH="$PATH:$RBT_ROOT/bin"' >> $HOME/.bashrc
source $HOME/.bashrc
Download ChemGE
git clone https://github.com/tsudalab/ChemGE.git
cd ChemGE
Usage
Run the following commands inside the my-rdkit-env environment (activate it with source activate my-rdkit-env).
Optimize J
python -u optimize-J.py > log-file &
Optimize docking score (multi-thread)
python -u optimize-rdock.py > log-file &
If all scores are 10000000000.0, the installation of rDock may have failed.
Please check the installation directory.
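When every score is 10000000000.0, a first step is to confirm that the environment variables set during the rDock installation are visible to Python. The sketch below is a hypothetical helper, not part of ChemGE. It assumes the docking script relies on `RBT_ROOT` and on the `rbdock` and `rbcavity` executables being on `PATH`, as configured in the setup section.

```python
import os
import shutil


def check_rdock(env=None):
    """Return a list of human-readable problems; an empty list means none were found."""
    env = os.environ if env is None else env
    problems = []
    root = env.get("RBT_ROOT")
    if not root:
        problems.append("RBT_ROOT is not set")
    elif not os.path.isdir(root):
        problems.append("RBT_ROOT does not point to a directory: " + root)
    # Assumption: these rDock executables are expected to be found on PATH.
    for tool in ("rbdock", "rbcavity"):
        if shutil.which(tool, path=env.get("PATH", "")) is None:
            problems.append(tool + " not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_rdock():
        print(problem)
```

An empty result does not prove that docking works. It only rules out the most common configuration mistakes.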
Optimize docking score (qsub)
You need to set up a parallel environment with qsub to run this program. Using CfnCluster is recommended.
python -u optimize-rdock-qsub.py > log-file &
Explanation of files
- results/: log files of our experiment
- cavity.as, cavity.prm, receptor.mol2: Information of thymidine kinase, which is required to run rDock. Detailed explanation is below.
- 250k_rndm_zinc_drugs_clean.smi: ZINC dataset, which is from mkusner/grammarVAE
- fpscores.pkl.gz: Used to calculate J score, which is from mkusner/grammarVAE
- Python files: Used in experiment. A part of zinc_grammar.py and cfg_util.py is from mkusner/grammarVAE.
How to generate files required to execute rDock
The cavity.prm in this repository is for docking simulation with KITH. If you want to run ChemGE for a different protein, you need to generate the files on your own.
The steps below assume that rDock is installed as described above.
- Download receptor.pdb and crystal_ligand.mol2 from DUD-E (e.g. KITH (DUD-E))
- Execute the following commands (Open Babel is required to run these scripts)
$ ChemGE/util/pdb2mol ./receptor.pdb # pdb -> mol2
$ ChemGE/util/mol2sd crystal_ligand.mol2 # mol2 -> sd(f)
$ ChemGE/util/gen_prm crystal_ligand.sd receptor.mol2 > cavity.prm
$ $RBT_ROOT/build/exe/rbcavity -r cavity.prm -W
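If the files need to be regenerated for several proteins, the four commands above can be wrapped in a small Python script. This is a sketch under assumptions: `cavity_commands` and `run_all` are hypothetical helpers, not part of ChemGE, and the file names simply mirror the commands shown above.

```python
import os
import subprocess


def cavity_commands(chemge_dir, rbt_root,
                    receptor="receptor.pdb", ligand="crystal_ligand.mol2"):
    """Build (argv, stdout_file) pairs mirroring the shell commands above."""
    util = os.path.join(chemge_dir, "util")
    ligand_sd = os.path.splitext(ligand)[0] + ".sd"  # name used by the gen_prm step
    return [
        ([os.path.join(util, "pdb2mol"), receptor], None),   # pdb -> mol2
        ([os.path.join(util, "mol2sd"), ligand], None),      # mol2 -> sd
        ([os.path.join(util, "gen_prm"), ligand_sd, "receptor.mol2"], "cavity.prm"),
        ([os.path.join(rbt_root, "build", "exe", "rbcavity"),
          "-r", "cavity.prm", "-W"], None),
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            # Equivalent of the shell redirection "> cavity.prm".
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

For example, `run_all(cavity_commands("ChemGE", os.environ["RBT_ROOT"]))` would run the steps from the directory that contains receptor.pdb and crystal_ligand.mol2.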
License
This project is licensed under the terms of the MIT license.