Human-in-the-loop Active Learning for Goal-Oriented Molecule Generation

We present an interactive workflow to fine-tune predictive machine learning models of target molecular properties based on expert feedback, and foster human-machine collaboration for goal-oriented molecular design and optimization.

Overview of the human-in-the-loop active learning workflow to fine-tune molecular property predictors for goal-oriented molecule generation.

In this study, we simulate the process of producing novel drug candidates with machine learning (REINVENT) and then validating them in the lab. The workflow is based on REINVENT 3.2 for molecule generation. REINVENT 4 has since been released, and we plan to migrate to it soon.

The goal of this study is to generate successful top-scoring molecules (i.e., promising with respect to a target molecular property) according to both the machine learning predictive model used in the scoring function and a lab simulator that validates the generated molecules at the end of the REINVENT process. The two should be well aligned, so that assay trials do not rely on suboptimal molecules and their success rate increases.

Since simulators are expensive to query at each iteration of fine-tuning the predictive model (i.e., active learning), we mitigate this by allowing "weaker" yet more accessible oracles (i.e., human experts) to be queried for iterative fine-tuning of the predictive model (i.e., human-in-the-loop active learning). The lab simulator is then only used at the end of the REINVENT process for final validation.

Human experts evaluate the relevance of top-scoring molecules identified by the predictive machine learning model by accepting or refuting some of them. Our results show significant improvements in the outcome of the REINVENT process: the predicted success scores of the final generated molecules are better aligned with those of the lab simulator, and other metrics such as drug-likeness and synthetic accessibility also improve.
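
The loop described above can be illustrated with a deliberately tiny toy, which is not the code used in this repository. It assumes that a "molecule" is a single number in [0, 1], that the predictor is a one-dimensional threshold, and that the generator simply proposes evenly spaced molecules the predictor currently accepts. All function names and the cutoff value are hypothetical. The point is only to show the order of operations: propose top-scoring molecules, ask a cheap (possibly noisy) expert to accept or refute them, refit the predictor, and consult the expensive simulator only at the end.

```python
import random

TRUE_CUTOFF = 0.7  # hypothetical ground truth, known only to the simulator


def simulator(x):
    """Toy stand-in for the lab simulator: 1 if the 'molecule' is active."""
    return 1 if x > TRUE_CUTOFF else 0


def simulated_expert(x, rng, error_rate):
    """Weaker oracle: agrees with the simulator except for occasional mistakes."""
    label = simulator(x)
    return 1 - label if rng.random() < error_rate else label


def fit_threshold(data):
    """Toy predictor: the midpoint threshold that classifies most labels correctly."""
    xs = sorted({x for x, _ in data})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def n_correct(t):
        return sum((x >= t) == (y == 1) for x, y in data)

    return max(candidates, key=n_correct)


def propose_top_scoring(threshold, n):
    """Stand-in for the generator: n evenly spaced molecules the predictor accepts."""
    return [threshold + (1.0 - threshold) * i / (n - 1) for i in range(n)]


def alignment(threshold, n=1000):
    """Fraction of predictor-accepted molecules that the simulator also accepts."""
    return sum(simulator(x) for x in propose_top_scoring(threshold, n)) / n


def run_hitl_al(rounds=3, queries_per_round=10, expert_error=0.1, seed=0):
    """Return the predictor threshold before feedback and after each round."""
    rng = random.Random(seed)
    # Sparse initial training set, so the first predictor is too lenient.
    data = [(0.1, 0), (0.3, 0), (0.9, 1), (0.95, 1)]
    thresholds = [fit_threshold(data)]
    for _ in range(rounds):
        queries = propose_top_scoring(thresholds[-1], queries_per_round)
        # The expert accepts (1) or refutes (0) each queried molecule.
        data += [(x, simulated_expert(x, rng, expert_error)) for x in queries]
        thresholds.append(fit_threshold(data))
    return thresholds


if __name__ == "__main__":
    history = run_hitl_al()
    # The simulator is consulted only here, for final validation.
    print("thresholds per round:", [round(t, 3) for t in history])
    print("alignment before:", alignment(history[0]))
    print("alignment after:", alignment(history[-1]))
```

In this toy, expert refutations of false positives pull the threshold towards the simulator's cutoff, which is the sense in which feedback "aligns" the predictor with the simulator.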

System Requirements

Installation

  1. Since this workflow is based on REINVENT 3.2, you need a working installation of REINVENT 3.2. Follow the installation instructions here.

  2. Create a virtual environment with python>=3.9,<3.11 and activate it, then install the package with

     pip install hitl-al-gomg
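
The version constraint in step 2 can be checked from inside the activated environment before installing. The snippet below is a small convenience sketch, not part of the package; the function name is hypothetical, and the bounds are copied from the requirement python>=3.9,<3.11 stated above.

```python
import sys

# Bounds copied from the install step: python>=3.9,<3.11.
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 11)


def python_version_supported(version_info=None):
    """Return True if the (major, minor) version lies in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    if python_version_supported():
        print("Python version is within the supported range.")
    else:
        print("Unsupported Python version:", sys.version.split()[0])
```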
    

Usage

Below are example commands to train a target property predictor and then run the active learning workflow with a simulated expert to fine-tune it. Replace the provided paths with your own before running the commands. In this example, the target property is DRD2 bioactivity.

For training a predictor of DRD2 bioactivity:

    python -m hitl_al_gomg.models.train --path_to_train_data data/train/drd2_train --path_to_test_data data/test/drd2_test --path_to_predictor data/predictors/drd2 --path_to_simulator data/simulators/drd2 --train True --demo True
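
If you train predictors for several targets, it can be convenient to assemble this command programmatically. The helper below is a hypothetical sketch, not part of the package: it only rebuilds the argument list shown above, with the flag names copied verbatim from that command. Because the way the command line parses `--train` and `--demo` is not documented here, the values are passed as the literal strings `True` or `False`, mirroring the example.

```python
import shlex
import sys


def build_train_command(train_data, test_data, predictor_dir, simulator_dir,
                        train=True, demo=True, python_executable=sys.executable):
    """Assemble the argument list for `python -m hitl_al_gomg.models.train`."""
    return [
        python_executable, "-m", "hitl_al_gomg.models.train",
        "--path_to_train_data", str(train_data),
        "--path_to_test_data", str(test_data),
        "--path_to_predictor", str(predictor_dir),
        "--path_to_simulator", str(simulator_dir),
        # Booleans are passed as the strings "True"/"False", as in the example.
        "--train", str(train),
        "--demo", str(demo),
    ]


cmd = build_train_command(
    train_data="data/train/drd2_train",
    test_data="data/test/drd2_test",
    predictor_dir="data/predictors/drd2",
    simulator_dir="data/simulators/drd2",
)
# Print the command for inspection; to launch it, one could call
# subprocess.run(cmd, check=True) in an environment where the package is installed.
print(shlex.join(cmd))
```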

For running the HITL-AL workflow using a simulated expert:

Once you have a pre-trained predictor for your target property, you can use it to run REINVENT to produce novel molecules that satisfy this property.

For calculating simulator scores and metrics from MOSES:

Once the HITL-AL run is completed, you can generate a pickled dictionary that contains simulator/oracle scores and metrics to evaluate your generated molecules at the end of each round and track the progress of your predictor fine-tuning.
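
To inspect such a pickled dictionary, a generic loader like the one below may help. It is a sketch under stated assumptions: the file name is hypothetical, and since the exact keys and value types of the dictionary are not described here, the code only lists whatever top-level keys it finds. As with any pickle file, load only files you produced yourself, because unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_results(path):
    """Load a pickled dictionary from disk and check that it is a dict."""
    with Path(path).open("rb") as handle:
        results = pickle.load(handle)
    if not isinstance(results, dict):
        raise TypeError(f"expected a dict, got {type(results).__name__}")
    return results


def summarize(results):
    """Return one line per top-level key with the value's type and length."""
    lines = []
    for key, value in results.items():
        size = len(value) if hasattr(value, "__len__") else "n/a"
        lines.append(f"{key}: {type(value).__name__} (len={size})")
    return lines


if __name__ == "__main__":
    # Hypothetical file name; replace it with the path of your own output file.
    results_path = Path("simulator_scores_and_metrics.pkl")
    if results_path.exists():
        for line in summarize(load_results(results_path)):
            print(line)
```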

For running the HITL-AL workflow using the Metis graphical interface:

To run the workflow with real expert feedback through a graphical interface, you first need to install Metis in two quick steps:

  1. Clone the Metis repository using git clone --branch nahal_experiment https://github.com/JanoschMenke/metis.git then navigate to its location.
  2. On a remote machine that is accessible through SSH and runs SLURM, install REINVENT 3.2 as described above.

To run the HITL-AL workflow described in our paper, you can download the following zipped folder and upload it to your remote machine. This folder contains the models used for REINVENT: the prior REINVENT agent random.prior.new; the REINVENT agent Agent_Initial.ckpt, obtained after 1200 epochs of optimization with the initial target property predictor Model_Initial.pkl; and the hERG bioactivity oracle herg.pkl, which we use in the multi-objective use-case experiments.

You should change the following file contents according to your remote SSH login details and your paths to predictive models and data sets.

To start the interface and the human workflow, run

    cd metis && python metis.py

Your evaluations through Metis will be stored in the results folder.

Data

Notebooks

In notebooks/, we provide Jupyter notebooks with code to reproduce the paper's result figures for both simulation and real human experiments.

Acknowledgements

For any inquiries, please contact yasmine.nahal@aalto.fi