Mothra

Requirements

  1. Python 3.9 (errors occur with newer versions of Python). NOTICE: init.sh uses pyenv; if pyenv is not installed, install python3.9-venv instead.
  2. Keras (version 2.0.5). Newer versions of Keras cause errors; if you have installed one, downgrade with pip install keras==2.0.5.
  3. (*Optional but highly recommended) CUDA (version 11.7), cuDNN (version 8 for CUDA 11.x)
  4. tensorflow-gpu (version 1.15.2; versions >= 2.0 cause errors)
  5. rdkit
  6. rDock
  7. AutoDock Vina. Make sure Vina is on the system PATH.
  8. Open Babel. Make sure Open Babel is on the system PATH.
  9. eToxPred (optional, for toxicity prediction). Download https://github.com/pulimeng/eToxPred/raw/master/etoxpred_best_model.tar.gz and untar it into ligand_design/.
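The eToxPred step can also be scripted. The sketch below uses only the Python standard library; fetch_and_untar is a hypothetical helper, not part of Mothra. It assumes the file is a gzipped tarball (as the .tar.gz extension suggests) that should be unpacked directly into ligand_design/.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL from the Requirements list; fetch_and_untar is a hypothetical helper, not part of Mothra.
ETOXPRED_URL = "https://github.com/pulimeng/eToxPred/raw/master/etoxpred_best_model.tar.gz"


def fetch_and_untar(url: str, dest: str = "ligand_design") -> Path:
    """Download a .tar.gz archive into dest, extract it there, and return the archive path."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python build supports it
        # (3.12+, and recent patch releases of 3.8 to 3.11).
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return archive
```

Calling fetch_and_untar(ETOXPRED_URL) from the repository root would place the archive and its contents in ligand_design/.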

To install Keras, RDKit, and other dependencies with pip in a virtual environment, we provide requirements.txt and init.sh in the inits/ directory. After installing Python, run bash inits/init.sh.
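After running init.sh, a quick sanity check can catch the version problems listed under Requirements. This is a minimal sketch, assuming the pip distribution names keras and tensorflow-gpu and the executable names vina and obabel; check_environment is a hypothetical helper, not a script shipped with Mothra, and it does not check RDKit or rDock.

```python
import shutil
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, List, Mapping

# Version pins copied from the Requirements list.
PINNED = {"keras": "2.0.5", "tensorflow-gpu": "1.15.2"}
# Assumed executable names for AutoDock Vina and Open Babel.
TOOLS = ("vina", "obabel")


def check_environment(pinned: Mapping[str, str] = PINNED, tools: Iterable[str] = TOOLS) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] != (3, 9):
        problems.append(
            f"Python 3.9 expected, found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for package, wanted in pinned.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if found != wanted:
            problems.append(f"{package}=={wanted} expected, found {found}")
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems
```

Running print(check_environment()) inside the virtual environment lists anything that still needs fixing.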

How to Use

Train the RNN model

  1. Run python train_RNN/train_RNN.py to train the RNN model. A pretrained model is provided in model/model.h5.

Molecule generation

  1. Run python ligand_design/mcts_ligand.py data_dir

Although MOMCTS-MolGen supports an extensible objective set, the default objectives are docking score, QED score, and logP, together with a filter on SA score.

To define your own objective set, change the simulation functions in add_node_type.py and the reward functions in mcts_ligand.py. (These may be merged into a single function in future work.)

If your objective set does not contain exactly 3 objectives, remember to change 'default_reward' in mcts_ligand.py accordingly.
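Since output.txt tracks changes of the Pareto front, ligands are presumably compared by Pareto dominance across the objectives rather than by one combined score. The sketch below shows only the textbook dominance test and is not Mothra's code. It assumes every objective has been oriented so that larger is better (for example, a Vina docking score negated so that stronger binding gives a larger value), and the candidate tuples are made-up illustrative values. The length check mirrors the note above: every reward vector, including 'default_reward', needs one entry per objective.

```python
from typing import List, Sequence, Tuple

N_OBJECTIVES = 3  # default setting: docking score, QED, logP
Score = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one.

    Assumes every objective is to be maximised.
    """
    if len(a) != len(b):
        raise ValueError("score vectors must have the same number of objectives")
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Score]) -> List[Score]:
    """Keep the scores that no other score dominates, preserving input order."""
    for s in scores:
        if len(s) != N_OBJECTIVES:
            raise ValueError(f"expected {N_OBJECTIVES} objectives, got {len(s)}")
    return [s for s in scores if not any(dominates(other, s) for other in scores)]


# Illustrative values only: (negated docking score, QED, third objective).
candidates = [(8.1, 0.62, 0.50), (7.5, 0.70, 0.40), (7.0, 0.60, 0.30)]
print(pareto_front(candidates))  # [(8.1, 0.62, 0.5), (7.5, 0.7, 0.4)]
```

The third candidate drops out because the first one beats it on all three objectives, while the first two trade off against each other and both stay on the front.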

Outputs of the ligand design process are stored in data/present/ and include:

output.txt             ## changes of the Pareto front
ligands.txt            ## ligands that pass the SA score filter
scores.txt             ## raw scores of ligands
hverror_output.txt     ## errors from the hypervolume calculation
error_output.txt       ## errors from Vina and Open Babel
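The hypervolume mentioned above is the volume of objective space dominated by the Pareto front and bounded by a reference point. The README does not say which algorithm or reference point Mothra uses, so the following is only a textbook sketch: exact hypervolume by slicing along the last objective (the HSO approach), assuming maximisation and a reference point that every counted score must strictly exceed. Its cost grows quickly with the number of points and objectives, which is acceptable for small fronts such as the three-objective default.

```python
from typing import Sequence


def hypervolume(points: Sequence[Sequence[float]], ref: Sequence[float]) -> float:
    """Exact hypervolume dominated by points with respect to ref (maximisation).

    Slices along the last objective and recurses on the remaining ones.
    Points that do not strictly exceed ref on every objective contribute nothing.
    """
    pts = [tuple(p) for p in points if all(x > r for x, r in zip(p, ref))]
    if not pts:
        return 0.0
    if len(ref) == 1:
        return float(max(p[0] for p in pts) - ref[0])
    # Distinct values of the last objective, from the top slab down.
    levels = sorted({p[-1] for p in pts}, reverse=True)
    volume = 0.0
    for i, top in enumerate(levels):
        bottom = levels[i + 1] if i + 1 < len(levels) else ref[-1]
        # Every point reaching at least `top` covers this whole slab.
        active = [p[:-1] for p in pts if p[-1] >= top]
        volume += hypervolume(active, ref[:-1]) * (top - bottom)
    return volume


# Two non-dominated points in 2-D: areas 1*2 + 2*1 minus their overlap 1*1.
print(hypervolume([(1, 2), (2, 1)], ref=(0, 0)))  # 3.0
```

A dominated point adds nothing to the hypervolume, which is why the measure is a convenient way to track whether the front is actually improving.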

Directory structure

.
├─data : pretraining dataset
├─data_template : template directory for ligand generation
│  ├─input : target protein(s) for docking with Vina, plus generation settings
│  ├─output : generated ligands
│  ├─present : valid generated ligands and their scores
│  └─workspace : working space for docking each ligand
├─ligand_design : source code for ligand generation
├─model : the RNN generative model
└─train_RNN : code for training the RNN generative model
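The tree suggests that each generation run works in its own copy of data_template, whose path is then passed as data_dir to mcts_ligand.py. That workflow is an assumption rather than something stated here; the helper below (prepare_run_dir, hypothetical, with a hypothetical default name data_run) copies the template and makes sure the four subdirectories shown above exist, since git does not track empty directories and they may be missing from a fresh checkout.

```python
import shutil
from pathlib import Path

# Subdirectories listed under data_template in the directory structure.
SUBDIRS = ("input", "output", "present", "workspace")


def prepare_run_dir(template: str = "data_template", run_dir: str = "data_run") -> Path:
    """Copy the template into a fresh run directory and return its path.

    Raises FileExistsError if run_dir already exists, so earlier results are not overwritten.
    """
    dst = Path(run_dir)
    shutil.copytree(template, dst)
    for sub in SUBDIRS:
        # Recreate any empty subdirectory that the copy did not bring along.
        (dst / sub).mkdir(exist_ok=True)
    return dst
```

Under this assumption, a run would be prepare_run_dir(run_dir="data_run") followed by python ligand_design/mcts_ligand.py data_run.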

License

This package is distributed under the GPL License.