Introduction

This repository contains the code needed to reproduce the experiments reported in https://www.biorxiv.org/content/10.1101/2022.07.15.500218v1.

The work builds on previously published work from https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0235-x.

We have provided example notebooks for creating the input files necessary to reproduce our results from Double-Model RIOP (DrIOP):

./notebooks

For all other RIOP experiments, please refer to https://github.com/m-mokaya/RIOP.

Usage

  1. Templates for the inputs are provided in the reinvent/data/examples/templates folder. More examples will follow.
  2. There are templates for 6 running modes. After activating the environment, each running mode can be executed with "python input.py some_running_mode.json". Templates have to be edited before use; for a standard run, only the file and folder paths need to be modified. Most running modes produce logs that can be monitored with tensorboard (see below).
    • The logging folder is defined by setting a valid path in the "logging_path" field of the JSON file. This is required for all running modes.
  3. Running modes:
    • Sampling: sampling.json can be used to start sampling. It requires a generative model as input and produces a file containing SMILES. We provide a generative model, "reinvent/data/augmented.prior". Alternatively, focused Agents generated by transfer learning or reinforcement learning can be sampled as well.
    • Transfer Learning (TL): transfer_learning.json is the relevant template. It can be used to focus the general prior towards a narrow chemical space by training on a representative sample of user-provided SMILES. It requires as input a list of SMILES (example format in "reinvent/data/smiles.smi") and the generative model "reinvent/data/augmented.prior". The result is a set of generative Agent checkpoints, one produced after each epoch of training, and a final focused Agent. Inspect the tensorboard logs to estimate which Agent has the level of focusing that you prefer.
    • Reinforcement Learning (RL): Use reinforcement_learning.json as a template. The input requires paths for both the Agent and Prior generative models (in the "reinforcement_learning" section of the JSON file). Both can be the model provided by us, "reinvent/data/augmented.prior", or alternatively the user can provide a focused Agent generated by TL. The output is a focused generative model and a "scaffold_memory.csv" file, which contains the best-scoring SMILES found during the RL run. The output folder is defined by setting a value for "resultdir". The scoring function object "scoring_function" can have either "name": "custom_product" or "name": "custom_sum". The scoring function has a list of parameters, "parameters": [], which may contain any number of component objects. The current template example offers 5 components: a QED score, Matching Substructure (MS), Custom Alerts (CA), and 2 Predictive Property (PP) components. The PP components require setting the path to either a classification ("reinvent/data/drd2.pkl") or regression ("reinvent/data/Aurora_model.pkl") model.
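As a rough illustration, the fields described above can be assembled into an input JSON programmatically. This is only a sketch built from the field names mentioned in this section ("logging_path", "reinforcement_learning", "scoring_function", "resultdir"); the surrounding structure is an assumption, so consult the provided reinforcement_learning.json template for the authoritative layout.

```python
import json

# Minimal RL input sketch; field names are taken from the text above,
# everything else (nesting, section names) is illustrative only.
config = {
    "logging": {
        "logging_path": "/path/to/log/dir",  # required for all running modes
    },
    "parameters": {
        "reinforcement_learning": {
            "prior": "reinvent/data/augmented.prior",
            "agent": "reinvent/data/augmented.prior",  # or a TL-focused Agent
        },
        "scoring_function": {
            "name": "custom_product",  # or "custom_sum"
            "parameters": [],          # list of component objects, e.g. QED, MS, CA, PP
        },
        "resultdir": "/path/to/output/dir",  # where scaffold_memory.csv is written
    },
}

with open("my_reinforcement_learning.json", "w") as fh:
    json.dump(config, fh, indent=4)
```

The resulting file would then be passed to the tool as "python input.py my_reinforcement_learning.json", after editing the placeholder paths.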

Available components

The scoring function is built up from components, which together define the "compass" the Agents use to navigate chemical space and suggest chemical compounds. Currently, the following components are available:


To use tensorboard for logging:

  1. To launch tensorboard, you need a graphical environment. Run: tensorboard --logdir "path to your log output directory" --port=8008. This will give you an address to copy into a browser, where you can access the graphical summaries from tensorboard.

  2. Further command-line parameters can be used to change the number of scalars, histograms, images, distributions and graphs shown, e.g.: --samples_per_plugin=scalars=700,images=20

Installation

  1. Install Anaconda / Miniconda

  2. Clone the repository

  3. Open a terminal, go to the repository and create the environment: conda env create -f reinvent_shared.yml

  4. (Optional) To set environment variables (currently not needed), for example a license, first run the following on the command line:

    cd $CONDA_PREFIX
    mkdir -p ./etc/conda/activate.d
    mkdir -p ./etc/conda/deactivate.d
    touch ./etc/conda/activate.d/env_vars.sh
    touch ./etc/conda/deactivate.d/env_vars.sh
    

    then edit ./etc/conda/activate.d/env_vars.sh as follows:

    #!/bin/sh
    export SOME_LICENSE='/path/to/your/license/file'
    

    and finally, edit ./etc/conda/deactivate.d/env_vars.sh:

    #!/bin/sh
    unset SOME_LICENSE
    
  5. Activate environment: conda activate reinvent_shared.v2.1

  6. (Optional) In the project directory, in ./configs/, create the file config.json by copying example.config.json and editing it as required. In the current version this is only relevant for the unit tests.

  7. Use the tool.