MolScore: A scoring, evaluation and benchmarking framework for de novo drug design
Overview
MolScore contains code to score de novo compounds produced by generative models via the subpackage molscore, and to facilitate downstream evaluation via the subpackage moleval. An objective is defined via a JSON file, which can be shared to propose new multi-parameter objectives for drug design. MolScore can be used in several ways:
- To implement a multi-parameter objective for prospective drug design.
- To benchmark objectives/generative models/optimization using benchmark mode (MolScoreBenchmark).
- To implement a sequence of objectives using curriculum mode (MolScoreCurriculum).
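As a rough illustration of what sharing an objective as a JSON file looks like, the sketch below writes a small multi-parameter objective to disk and reads it back. The key names (`task`, `scoring_functions`, `aggregation`) and the file name are hypothetical placeholders chosen for illustration; they are not MolScore's actual configuration schema, so refer to the configuration files shipped with the repository for real objectives.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical objective layout for illustration only; NOT MolScore's real schema.
objective = {
    "task": "example_objective",
    "scoring_functions": [
        {"name": "QED", "weight": 1.0},
        {"name": "SAscore", "weight": 0.5},
    ],
    "aggregation": "weighted_sum",
}

# Write the objective to a JSON file in a temporary directory.
config_path = Path(tempfile.mkdtemp()) / "example_objective.json"
config_path.write_text(json.dumps(objective, indent=2))

# Anyone holding the file can reload exactly the same objective definition.
loaded = json.loads(config_path.read_text())
print(loaded["task"], [sf["name"] for sf in loaded["scoring_functions"]])
```

Because the whole objective lives in one plain-text file, it can be versioned, diffed and exchanged without sharing any code.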
Generative models with MolScore already integrated can be found here.
Contributions and/or ideas for added functionality are welcomed!
Installation
MolScore can be installed by cloning this repository and setting up an environment using your favourite environment manager (I recommend mamba).
git clone https://github.com/MorganCThomas/MolScore.git
cd MolScore
mamba env create -f environment.yml
mamba activate molscore
pip install ./
Note: You can use pip install -e ./ to install in develop mode.
Alternatively, MolScore is available via the Python Package Index.
pip install molscore --upgrade
Installation time: Installation of molscore in the environment should complete in less than 5 minutes (tested using mamba).
Functionality
Scoring functionality present in molscore. Some scoring functions require external software and the necessary licenses.
Type | Method |
---|---|
Docking | Glide, Smina, OpenEye, GOLD, PLANTS, rDock, Vina, Gnina |
Ligand preparation | RDKit->Epik, Moka->Corina, Ligprep, Gypsum-DL |
3D Similarity | ROCS, Open3DAlign |
2D Similarity | Fingerprint similarity (any RDKit fingerprint and similarity measure), substructure match/filter, Applicability domain |
Predictive models | Scikit-learn (classification/regression), PIDGINv5<sup>a</sup>, ChemProp, ADMET-AI |
Synthesizability | RAscore, AiZynthFinder, SAscore, ReactionFilters (Scaffold decoration) |
Descriptors | RDKit, Maximum consecutive rotatable bonds, Penalized LogP, LinkerDescriptors (Fragment linking), MolSkill |
Transformation methods | Linear, linear threshold, step threshold, Gaussian |
Aggregation methods | Arithmetic mean, geometric mean, weighted sum, product, weighted product, auto-weighted sum/product, pareto front |
Diversity filters | Unique, Occurrence, memory assisted + ScaffoldSimilarityECFP |
<sup>a</sup> PIDGINv5 is a suite of pre-trained RF classifiers on ~2,300 ChEMBL31 targets
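To make the roles of transformation and aggregation methods more concrete, here is a minimal, self-contained sketch of a few of them. It is a conceptual illustration, not MolScore's implementation: the function names, parameter names and example values are assumptions, and the weighted sum shown here is normalised by the total weight, which may differ from how MolScore handles weights.

```python
import math

# Transformations: map a raw property value onto a 0-1 desirability score.
def linear(x, low, high):
    """Rescale x linearly from [low, high] to [0, 1], clipping outside the range."""
    return min(1.0, max(0.0, (x - low) / (high - low)))

def step_threshold(x, threshold):
    """Return 1 if x meets the threshold, otherwise 0."""
    return 1.0 if x >= threshold else 0.0

def gaussian(x, mean, std):
    """Reward values close to a target mean; equals 1 exactly at the mean."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2)

# Aggregations: combine several transformed scores into a single value.
def arithmetic_mean(scores):
    return sum(scores) / len(scores)

def geometric_mean(scores):
    return math.prod(scores) ** (1.0 / len(scores))

def weighted_sum(scores, weights):
    """Weighted sum normalised by the total weight (an assumption of this sketch)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical raw values for a single molecule.
scores = [
    linear(0.5, low=0.0, high=1.0),      # e.g. a QED-like value
    gaussian(3.0, mean=3.0, std=1.0),    # e.g. LogP sitting on its target
    step_threshold(7.2, threshold=6.0),  # e.g. predicted activity above a cutoff
]
print(arithmetic_mean(scores), geometric_mean(scores), weighted_sum(scores, [2, 1, 1]))
```

Note the difference in behaviour: a single score of zero drives the geometric mean and product to zero, whereas the arithmetic mean and weighted sum let strong parameters compensate for weak ones.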
Performance metrics present in moleval, many of which are from GuacaMol or MOSES.
Type | Metric |
---|---|
Intrinsic property | Validity, Uniqueness, Scaffold uniqueness, Internal diversity (1 & 2), Sphere exclusion diversity<sup>b</sup>, Solow Polasky diversity<sup>g</sup>, Scaffold diversity, Functional group diversity<sup>c</sup>, Ring system diversity<sup>c</sup>, Filters (MCF & PAINS), Purchasability<sup>d</sup> |
Extrinsic property<sup>a</sup> | Novelty, FCD, Analogue similarity<sup>e</sup>, Analogue coverage<sup>b</sup>, Functional group similarity, Ring system similarity, Single nearest neighbour similarity, Fragment similarity, Scaffold similarity, Outlier bits (Silliness)<sup>f</sup>, Wasserstein distance (LogP, SA Score, NP score, QED, Weight) |
<sup>a</sup> In reference to a specified external dataset
<sup>b</sup> As in our previous work here
<sup>c</sup> Adaption based on Zhang et al.
<sup>d</sup> Using molbloom
<sup>e</sup> Similar to Blaschke et al.
<sup>f</sup> Based on SillyWalks by Pat Walters
<sup>g</sup> Based on Liu et al.
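For intuition, two of the simpler metrics above can be sketched in a few lines. This is a simplified illustration rather than moleval's code: it assumes the SMILES strings have already been validated and canonicalised (which in practice requires a toolkit such as RDKit), and the function names and example molecules are made up for the example.

```python
def uniqueness(smiles):
    """Fraction of generated molecules that are distinct."""
    if not smiles:
        return 0.0
    return len(set(smiles)) / len(smiles)

def novelty(smiles, reference):
    """Fraction of distinct generated molecules absent from the reference set."""
    unique = set(smiles)
    if not unique:
        return 0.0
    return len(unique - set(reference)) / len(unique)

# Hypothetical, already-canonical SMILES.
generated = ["CCO", "CCO", "c1ccccc1", "CCN"]
reference = ["CCO", "CCC"]

print(uniqueness(generated))          # 3 distinct out of 4 generated
print(novelty(generated, reference))  # 2 of the 3 distinct molecules are new
```

Uniqueness is intrinsic (it needs only the generated set), while novelty is extrinsic because it is defined relative to a reference dataset.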
Usage
For further details, we refer you to the tutorials. Here is a snapshot of using MolScore with the GUIs available.
Here is a GIF demonstrating writing a config file with the help of the GUI, running MolScore in a mock example (scoring randomly sampled SMILES), and monitoring the output with another GUI.
Once molscore has been implemented into a generative model, the objective needs to be defined! Writing a JSON file by hand is a pain, so a Streamlit app is provided to help. Simply call molscore_config from the command line (a simple wrapper to streamlit run molscore/gui/config.py).
Once the configuration file is saved, simply point to this file path and run de novo molecule optimization. If running with the monitor app, you'll be able to investigate molecules as they're being generated. Simply call molscore_monitor from the command line (a wrapper to streamlit run molscore/gui/monitor.py).
Citation & Publications
If you use this software, please cite it as below.
@article{thomas2024molscore,
title={MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design},
author={Thomas, Morgan and O’Boyle, Noel M and Bender, Andreas and De Graaf, Chris},
journal={Journal of Cheminformatics},
volume={16},
year={2024},
publisher={BMC}
}
This software was also utilised in the following publications:
- Thomas M, Smith RT, O'Boyle NM, et al. Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study. J Cheminform 13, 39 (2021). https://doi.org/10.1186/s13321-021-00516-0
- Thomas M, O'Boyle NM, Bender A, de Graaf C. Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation. J Cheminform 14, 68 (2022). https://doi.org/10.1186/s13321-022-00646-z
- Handa K, Thomas M, Kageyama M, Iijima T, Bender A. On the difficulty of validating molecular generative models realistically: a case study on public and proprietary data. J Cheminform 15, 112 (2023). https://doi.org/10.1186/s13321-023-00781-1
- Thomas M, Ahmad M, Tresadern G, de Fabritiis G. PromptSMILES: Prompting for scaffold decoration and fragment linking in chemical language models. J Cheminform 16, 77 (2024). https://doi.org/10.1186/s13321-024-00866-5
- Bou A, Thomas M, Dittert S, Ramírez CN, Majewski M, Wang Y, Patel S, Tresadern G, Ahmad M, Moens V, Sherman W. ACEGEN: Reinforcement learning of generative chemical agents for drug discovery. J Chem Inf Model 64, 15 (2024). https://doi.org/10.1021/acs.jcim.4c00895
- Thomas M, Matricon PG, Gillespie RJ, Napiórkowska M, Neale H, Mason JS, Brown J, Fieldhouse C, Swain NA, Geng T, O'Boyle NM. Modern hit-finding with structure-guided de novo design: identification of novel nanomolar adenosine A2A receptor ligands using reinforcement learning. ChemRxiv (2024) https://doi.org/10.26434/chemrxiv-2024-wh7zw-v2