Profile-QSAR

The files in this repo:

ChEMBL data retrieval

Key features

Usage example

   $ bash chembl_sqlite.sh

Build profile-QSAR models

Key features

Usage example

   $ bash molproc_pqsar.sh

Look up profile-QSAR pre-calculations

Key features

Usage example

grepMOA2.py arguments:

  -h, --help            show this help message and exit
  -i , --Input          Input file with header line followed by querying compounds' ID and/or assays' AIDs
  -z True/False, --ZpIC50 True/False
                        True (default): Threshold and predictions in Z-scaled values; False: original pIC50 (log molar).
  -t , --Threshold      Threshold to filter out unqualified screening
  -o , --Output         Output file in csv format
  -c, --Compound        Querying by compounds (one compound per row)
  -a, --Assay           Querying by assays (one assay per row)
  -ca, --CA             Querying by compounds(first column) and assays (second column)

   $ python grepMOA2.py -c -i cid.txt -d ../chembl_28 -t 3.0 -z True -o cid_out.csv
   $ python grepMOA2.py -a -i aid.txt -d ../chembl_28 -t 3.0 -z True -o aid_out.csv
   $ python grepMOA2.py -ca -i ca_id.txt -d ../chembl_28 -z True -o ca_id_out.csv
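
When many lookups need to be scripted, the three calls above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_grepmoa_command` and its mode names are hypothetical, and the `-d` data directory option is taken from the usage examples rather than from the help text.

```python
# Mode flags as listed in the grepMOA2.py help text.
MODE_FLAGS = {"compound": "-c", "assay": "-a", "both": "-ca"}


def build_grepmoa_command(mode, input_file, data_dir, output_file,
                          threshold=None, zscaled=True):
    """Return the argument list for one grepMOA2.py lookup (hypothetical helper)."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"mode must be one of {sorted(MODE_FLAGS)}")
    cmd = ["python", "grepMOA2.py", MODE_FLAGS[mode],
           "-i", input_file, "-d", data_dir]
    # -t is optional; the -ca usage example omits it.
    if threshold is not None:
        cmd += ["-t", str(threshold)]
    # -z takes the literal strings True or False.
    cmd += ["-z", "True" if zscaled else "False", "-o", output_file]
    return cmd


if __name__ == "__main__":
    cmd = build_grepmoa_command("compound", "cid.txt", "../chembl_28",
                                "cid_out.csv", threshold=3.0)
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains grepMOA2.py.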

FAQ

What are the requirements to run this code?

The code requires a high-performance computing (HPC) cluster. It has been tested on the Novartis HPC cluster using Python 3.8.6 and the bash shell. The required packages are listed in requirements.txt.

Where to edit the code to adapt it to a different cluster?

The Python code submits jobs to the cluster with qsub, called through the subprocess module, and extracts the job ID and status from the command output with the re module. Users may need to adjust those lines to adapt the code to their own cluster.
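
As an illustration of the kind of lines that may need adapting, here is a minimal sketch of the submit-and-parse pattern. It is not the repository's code: the function names are hypothetical, and the regular expression assumes a Grid Engine style reply such as `Your job 12345 ("name") has been submitted`. Other schedulers print different messages and need a different command and pattern.

```python
import re
import subprocess

# Assumed reply format (Grid Engine style); adjust for your scheduler.
JOB_ID_PATTERN = re.compile(r"Your job(?:-array)? (\d+)")


def parse_job_id(reply, pattern=JOB_ID_PATTERN):
    """Return the job ID found in the scheduler's reply, or None if absent."""
    match = pattern.search(reply)
    return match.group(1) if match else None


def submit_job(script_path, submit_cmd=("qsub",), pattern=JOB_ID_PATTERN):
    """Submit a job script and return its job ID as a string (hypothetical helper)."""
    result = subprocess.run(
        [*submit_cmd, script_path],
        capture_output=True,  # collect stdout so the reply can be parsed
        text=True,
        check=True,           # raise CalledProcessError if submission fails
    )
    job_id = parse_job_id(result.stdout, pattern)
    if job_id is None:
        raise RuntimeError(f"could not find a job ID in: {result.stdout!r}")
    return job_id
```

On a Slurm cluster, for example, the same helper could be called with `submit_cmd=("sbatch",)` and `pattern=re.compile(r"Submitted batch job (\d+)")`.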

Do you fine-tune parameters for larger datasets?

The number of jobs, memory allocation, and R2 cutoffs (0.05, 0.20) for max2 have been tuned on several datasets: (1) Novartis (~12K assays and 5.5M compounds) and (2) ChEMBL (4276 assays and 1.4M compounds in total; see J Chem Inf Model, 2019). The same set of parameters also worked well on the example kinase dataset. They may still need fine-tuning depending on your own datasets and cluster.

Notes and Acknowledgments

@Valery Polyakov, @Li Tian, @Brian Kelly...

How do I cite profile-QSAR?

@article{Martin2021,
annote = {doi: 10.1021/acs.jcim.0c01342},
author = {Martin, Eric J and Zhu, Xiang-Wei},
doi = {10.1021/acs.jcim.0c01342},
issn = {1549-9596},
journal = {Journal of Chemical Information and Modeling},
month = {apr},
publisher = {American Chemical Society},
title = {{Collaborative Profile-QSAR: A Natural Platform for Building Collaborative Models among Competing Companies}},
url = {https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c01342},
year = {2021}
}

@article{Martin2019,
author = {Martin, Eric J and Polyakov, Valery R and Zhu, Xiang-Wei and Tian, Li and Mukherjee, Prasenjit and Liu, Xin},
doi = {10.1021/acs.jcim.9b00375},
issn = {1549-960X (Electronic)},
journal = {Journal of Chemical Information and Modeling},
language = {eng},
month = {sep},
number = {10},
pages = {4450--4459},
pmid = {31518124},
title = {{All-Assay-Max2 pQSAR: Activity Predictions as Accurate as Four-Concentration IC50s for 8558 Novartis Assays}},
volume = {59},
year = {2019}
}

@article{Martin2017,
author = {Martin, Eric J. and Polyakov, Valery R. and Tian, Li and Perez, Rolando C.},
doi = {10.1021/acs.jcim.7b00166},
issn = {15205142},
journal = {Journal of Chemical Information and Modeling},
number = {8},
pages = {2077--2088},
title = {{Profile-QSAR 2.0: Kinase Virtual Screening Accuracy Comparable to Four-Concentration IC50s for Realistically Novel Compounds}},
volume = {57},
year = {2017}
}

@article{Martin2011,
author = {Martin, Eric and Mukherjee, Prasenjit and Sullivan, David and Jansen, Johanna},
doi = {10.1021/ci1005004},
isbn = {1549-960X (Electronic), 1549-9596 (Linking)},
issn = {15499596},
journal = {Journal of Chemical Information and Modeling},
number = {8},
pages = {1942--1956},
pmid = {21667971},
title = {{Profile-QSAR: A novel meta-QSAR method that combines activities across the kinase family to accurately predict affinity, selectivity, and cellular activity}},
volume = {51},
year = {2011}
}

Contact Information

For help or issues using profile-QSAR, please submit a GitHub issue.

For personal communication related to this package, please contact Eric Martin (eric.martin@novartis.com) and Xiangwei Zhu (xwzhunc@gmail.com).