Home

Awesome

Retrosynthetic Accessibility (RA) score

alt text

Known Issues

Installation

Follow the steps in the defined order to avoid conflicts.

  1. Create an environment (note: the python version must be == 3.7, 3.8):
conda create --name myenv python=3.7
conda activate myenv

or use an existing environment with python >= 3.7

  1. Install rdkit 2020.03 (if already installed skip this step)
conda install -c rdkit rdkit -y
  1. Install RAscore
pip install git+https://github.com/reymond-group/RAscore.git@master

The models.zip file will have to be downloaded and after unzipping the path to the relevant model will need to be specified on model instantiation.

or

Clone and install the repository using (models should be included):

git clone https://github.com/reymond-group/RAscore.git
pip install --editable .

If the models are not automatically included check that the models.zip file exists and unzip it into the desired location.

Known Installation Issues

The following versions must be used in order to use the pretrained models:

These requirements arise becuase of the pickling method used to save the model and compatibility issues arising between different versions.

Usage

Importing in Python

Depending on if you would like to use the XGB based or Tensorflow based models you can import different modules.

To walk through the example in a jupyter notebook refer to rascore_usage.ipynb

from RAscore import RAscore_NN #For tensorflow and keras based models
from RAscore import RAscore_XGB #For XGB based models

nn_scorer = RAscore_NN.RAScorerNN() 
xgb_scorer = RAscore_XGB.RAScorerXGB()

#Imatinib mesylate
imatinib_mesylate = 'CC1=C(C=C(C=C1)NC(=O)C2=CC=C(C=C2)CN3CCN(CC3)C)NC4=NC=CC(=N4)C5=CN=CC=C5.CS(=O)(=O)O'
nn_scorer.predict(imatinib_mesylate)
0.99522984

xgb_scorer.predict(imatinib_mesylate)
0.99259007

#Omeprazole
omeprazole = 'CC1=CN=C(C(=C1OC)C)CS(=O)C2=NC3=C(N2)C=C(C=C3)OC'
nn_scorer.predict(omeprazole)
0.99999106

xgb_scorer.predict(omeprazole)
0.9556329

#Morphine - Illustrates problem synthesis planning tools face with complex ring systems
morphine = 'CN1CC[C@]23c4c5ccc(O)c4O[C@H]2[C@@H](O)C=C[C@H]3[C@H]1C5'
nn_scorer.predict(morphine)
8.316945e-07

xgb_scorer.predict(morphine)
0.0028359715

Command Line Interface

A command line interface is provided which allows batch processing and enables the flexibility of specifying models.\

Usage: RAscore [OPTIONS]

Example: A set of smiles `test.smi` are provided

`RAscore -f test.smi -c SMILES -o test.csv`

Default Model: XGBoost using ChEMBL and ECFP4 counts with features

Options:
  -f, --file_path TEXT      Absolute path to input file containing one SMILES
                            on each line. The column should be labelled
                            "SMILES" or if another header is used, specify it
                            as an option

  -c, --column_header TEXT  The name given to the singular column in the file
                            which contains the SMILES. The column must be
                            named.

  -o, --output_path TEXT    Output file path
  -m, --model_path TEXT     Absolute path to the model to use, if .h5 file
                            neural network in tensorflow/keras, if .pkl then
                            XGBoost

  --help                    Show this message and exit.

Further RAscore models are contained in the RAscore/models/models.zip folder if you wish to specify a different model than the default:

Retraining

If you want to retrain models, or train your own models using the hyperparameter optimisation framework found in the 'model_building' folder, then the following should be installed in the environemnt aswell:
pip install -e .[retraining]

The SYBA, SCscore and SAscore should also be downloaded for descriptor calculations and training scripts modified to reflect the locations of the models:

Please refer to the model_building folder for further information about retraining.

Performance on Test Set

Computation of Average Linkage

alt text

Citation

The models have been published in Chemical Science

Thakkar, A.; Chadimová, V.; Bjerrum, E. J.; Engkvist, O.; Reymond, J.-L. Retrosynthetic Accessibility Score (RAscore) – Rapid Machine Learned Synthesizability Classification from AI Driven Retrosynthetic Planning. Chem. Sci. 2021. https://doi.org/10.1039/d0sc05401a