LigDream: Shape-Based Compound Generation
THIS PROJECT IS NO LONGER ACTIVE. IT IS MADE AVAILABLE WITHOUT ANY SUPPORT.
Citing
If you use content from this repository, please consider citing the following work:
@article{skalic2019shape,
title={Shape-Based Generative Modeling for de-novo Drug Design},
author={Skalic, Miha and Jim{\'e}nez Luna, Jos{\'e} and Sabbadin, Davide and De Fabritiis, Gianni},
journal={Journal of Chemical Information and Modeling},
doi={10.1021/acs.jcim.8b00706},
publisher={ACS Publications}
}
Requirements
Model training is written in pytorch==0.3.1 and uses keras==2.2.2 for data loaders. RDKit==2017.09.2.0 and HTMD==1.13.9 are needed for molecule manipulation.
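Because the pinned versions are old, it can help to confirm what is actually installed before running anything. The following is a minimal sketch, not part of the repository: it assumes the packages are importable as torch, keras, rdkit and htmd and that each exposes a __version__ attribute, and the helper names are hypothetical. A version counts as matching when it equals the pinned string or when one is a dotted prefix of the other (for example 2017.09.2 against 2017.09.2.0).

```python
import importlib

# Versions pinned in the requirements above. The import names are
# assumptions: pytorch is imported as "torch", the others in lowercase.
PINNED = {
    "torch": "0.3.1",
    "keras": "2.2.2",
    "rdkit": "2017.09.2.0",
    "htmd": "1.13.9",
}


def installed_version(module_name):
    """Return the module's __version__ as a string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # package missing, or its import failed for another reason
        return None
    version = getattr(module, "__version__", None)
    return None if version is None else str(version)


def versions_match(pinned, found):
    """True if both strings are equal or one is a dotted prefix of the other."""
    return (
        found == pinned
        or found.startswith(pinned + ".")
        or pinned.startswith(found + ".")
    )


def check(pinned, found):
    """Map each module name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in pinned.items():
        have = found.get(name)
        if have is None:
            report[name] = "missing"
        elif versions_match(wanted, have):
            report[name] = "ok"
        else:
            report[name] = "mismatch (found {})".format(have)
    return report


if __name__ == "__main__":
    found = {name: installed_version(name) for name in PINNED}
    for name, status in check(PINNED, found).items():
        print("{}=={}: {}".format(name, PINNED[name], status))
```

A "missing" result may also mean that the package is installed but does not expose __version__, so treat the report as a hint rather than a verdict.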
Add the repository to your PYTHONPATH:
export PYTHONPATH=/path/to/ligdream/repo/:$PYTHONPATH
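Inside a Python session or notebook, where exporting an environment variable may be inconvenient, a similar effect can be had by extending sys.path. This is a minimal sketch; the helper name add_repo_to_path is hypothetical and the path is the same placeholder used in the export line.

```python
import os
import sys


def add_repo_to_path(repo_dir):
    """Put repo_dir at the front of sys.path unless it is already listed."""
    repo_dir = os.path.abspath(os.path.expanduser(repo_dir))
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)
    return repo_dir


# Placeholder path: replace it with the location of your checkout.
add_repo_to_path("/path/to/ligdream/repo/")
```

Unlike the export line, this only affects the current interpreter process.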
Before starting
Training requires a smi file. We used a subset of the Zinc15 dataset containing only the drug-like compounds. The same cleaned dataset can be retrieved with the getDataset.sh script, which downloads the smi file required for training:
bash getDataset.sh
The traindataset folder will then contain the zinc15_druglike_clean_canonical_max60.smi file that is required for the training step (see next section).
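After the download, a quick look at the file can confirm that it is readable. The sketch below is not part of the repository and assumes the usual smi layout: one SMILES string per line, optionally followed by whitespace and a name. The file name hints at a maximum length of 60, but this README does not define that limit, so the sketch only reports the longest string instead of enforcing anything. The function names are hypothetical.

```python
import itertools


def read_smiles(path, limit=None):
    """Return the first whitespace-separated token of each non-empty line.

    limit caps the number of lines read (not the number of entries returned).
    """
    smiles = []
    with open(path) as handle:
        for line in itertools.islice(handle, limit):
            fields = line.split()
            if fields:
                smiles.append(fields[0])
    return smiles


def summarize(smiles):
    """Return the number of entries and the length of the longest string."""
    longest = max((len(s) for s in smiles), default=0)
    return {"count": len(smiles), "longest": longest}


# Example, using the path named above (relative to the repository root):
# path = "traindataset/zinc15_druglike_clean_canonical_max60.smi"
# print(summarize(read_smiles(path, limit=100000)))
```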
The generation stage requires the model files. You can use the ones produced by the training step, or download the ones we have already generated with the following script:
bash getWeights.sh
The modelweights folder will then contain the three models:
- decoder-210000.pkl
- encoder-210000.pkl
- vae-210000.pkl
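Before opening the generation notebook, it may be worth checking that all three files are in place. The following is a minimal sketch that only tests for file presence; it assumes the modelweights folder sits in the current working directory, and the helper name missing_weights is hypothetical.

```python
import os

# File names as listed above.
EXPECTED_WEIGHTS = (
    "decoder-210000.pkl",
    "encoder-210000.pkl",
    "vae-210000.pkl",
)


def missing_weights(folder="modelweights"):
    """Return the expected file names that are not present in folder."""
    return [
        name
        for name in EXPECTED_WEIGHTS
        if not os.path.isfile(os.path.join(folder, name))
    ]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing model files: " + ", ".join(missing))
    else:
        print("All three model files found.")
```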
Training
Note that training runs on a GPU and it will take several days to complete.
First construct a set of training molecules:
$ python prepare_data.py -i "./path/to/my/smiles.smi" -o "./path/to/my/smiles.npy"
Second, train the model:
$ python train.py -i "./path/to/my/smiles.npy" -o "./path/to/models"
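The two steps can also be chained from one script so that training starts only if data preparation succeeds. This is a sketch under stated assumptions: it uses only the -i and -o flags shown above, expects to be run from the repository root where prepare_data.py and train.py live, and the function names build_commands and run_pipeline are hypothetical.

```python
import subprocess
import sys


def build_commands(smi_path, npy_path, models_dir):
    """Return the two command lines from this section as argument lists."""
    python = sys.executable or "python"
    return [
        [python, "prepare_data.py", "-i", smi_path, "-o", npy_path],
        [python, "train.py", "-i", npy_path, "-o", models_dir],
    ]


def run_pipeline(smi_path, npy_path, models_dir):
    """Run both steps in order; stop with an error if either one fails."""
    for command in build_commands(smi_path, npy_path, models_dir):
        print("Running: " + " ".join(command))
        subprocess.run(command, check=True)


# Example with the placeholder paths used above:
# run_pipeline("./path/to/my/smiles.smi",
#              "./path/to/my/smiles.npy",
#              "./path/to/models")
```

With check=True, a non-zero exit status from prepare_data.py raises subprocess.CalledProcessError, so the multi-day training run is never started on incomplete data.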
Generation
Web-based compound generation is available at https://playmolecule.org/LigDream/.
For an example of local novel compound generation, see the notebook generate.ipynb.
License
The code is released under the GNU Affero General Public License.