<h1 align="center">Organizing The Chemical Universe</h1>

GlobalChem is a public dictionary of common chemical lists. It takes a common chemical name as input and returns SMILES or SMARTS as output, with each list organized by its respective community in a knowledge graph.

GlobalChem serves as an open source platform where molecules are written out directly, so a name is no longer ambiguous about which chemical it refers to and every entry is fully transparent.

Our hope is that this repository serves as a base for the public to govern how the chemicals we use in food, clothing, the environment, materials, drugs, war, and much more can benefit all of us.

<p align="center"> <img width="800" alt="Screen Shot 2022-07-16 at 5 29 41 PM" src="https://user-images.githubusercontent.com/11812946/179372564-c286b115-af14-4ad8-a37f-0a216297b6c1.png"> </p>

With no dependencies, just initialize the class and there you go: all the common and rare groups of the world at your disposal.
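To picture the name-in, SMILES-out lookup over a graph of lists, here is a minimal sketch using plain Python dictionaries. The node names (`toy_root`, `solvents`, `aromatics`) and the helper `lookup` are hypothetical and do not reflect GlobalChem's internal data structures; only the SMILES strings for ethanol, water, and benzene are standard.

```python
# Hypothetical toy network: each node holds a name -> SMILES mapping
# plus the names of its child nodes. This is not GlobalChem's real layout.
TOY_NETWORK = {
    "toy_root": {"children": ["solvents", "aromatics"], "smiles": {}},
    "solvents": {"children": [], "smiles": {"ethanol": "CCO", "water": "O"}},
    "aromatics": {"children": [], "smiles": {"benzene": "c1ccccc1"}},
}


def lookup(name, node="toy_root", network=TOY_NETWORK):
    """Depth-first search for a common name; return (node, SMILES) or None."""
    entry = network[node]
    if name in entry["smiles"]:
        return node, entry["smiles"][name]
    for child in entry["children"]:
        found = lookup(name, child, network)
        if found is not None:
            return found
    return None


print(lookup("benzene"))   # ('aromatics', 'c1ccccc1')
print(lookup("caffeine"))  # None
```

Returning the node alongside the SMILES keeps the provenance of each entry visible, which is the point of organizing the lists by community.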

Overview

Extension Demo Reel

| Application | Introduction | Advanced Usage |
| --- | --- | --- |
| forcefields | Demo | CGenFF Molecule Loader and Atom Type Similarity (Demo) |
| bioinformatics | Demo | Use the Bostrom Algorithm to Filter Ligands By PDB (Demo) |
| cheminformatics | Demo | Principal Component Analysis of Common Universe & Visualizing Common Molecule Scaffold Graphs & WordClouds of Tainted Products (Demo) |
| quantum_chemistry | Demo | Plot Quantum Theory and Basis Set versus the Hamiltonian of small molecules (Demo) |
| development_operations | Demo | |

Installation

GlobalChem is distributed via PyPI. As the tree and its extensions grow, we can expand it to other pieces of software, making it accessible to all regardless of what you use. Alternatively, you can have a glance at the source code and copy/paste it yourself.


```bash
pip install global-chem
```

QuickStart

Here we load the GlobalChem tree, build the network, and extract the SMILES from the popular book, Pihkal. Performing cheminformatic principal component analysis on this chemical list requires the global-chem[cheminformatics] extensions package, covered in the GlobalChemExtensions Quickstart.


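```python
from global_chem import GlobalChem

gc = GlobalChem()

# Build the network, then query a node by its key.
gc.build_global_chem_network()

# Keep only the SMILES values from the node's name-to-SMILES entries.
smiles_list = list(gc.get_node_smiles('pihkal').values())

print(f"SMILES: {smiles_list[0]}")
```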
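Since the QuickStart takes `.values()` from the result of `get_node_smiles`, that result is assumed here to behave like a dictionary of name to SMILES. The sketch below inverts such a mapping to spot names that share one SMILES string. The `node_smiles` dictionary is an illustrative stand-in so the example runs without installing anything, and `names_by_smiles` is a hypothetical helper, not part of GlobalChem.

```python
from collections import defaultdict

# Stand-in for a name -> SMILES mapping like the one used in the QuickStart.
# These three entries are illustrative, not taken from any GlobalChem node.
node_smiles = {
    "ethanol": "CCO",
    "ethyl alcohol": "CCO",
    "benzene": "c1ccccc1",
}


def names_by_smiles(mapping):
    """Invert a name -> SMILES mapping into SMILES -> sorted list of names."""
    inverted = defaultdict(list)
    for name, smiles in mapping.items():
        inverted[smiles].append(name)
    return {smiles: sorted(names) for smiles, names in inverted.items()}


# Keep only SMILES strings that more than one name points to.
synonyms = {s: n for s, n in names_by_smiles(node_smiles).items() if len(n) > 1}
print(synonyms)  # {'CCO': ['ethanol', 'ethyl alcohol']}
```

This compares SMILES as plain strings, so two different but equivalent SMILES for the same molecule would not be grouped; canonicalizing them first would need a cheminformatics toolkit.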

GlobalChem

Nodes Contributors

Please follow the node contribution guidelines if you would like to elect your own node or someone else's.

```python
'global_chem': Node,                                                      # Suliman Sharif
'emerging_perfluoroalkyls': EmergingPerFluoroAlkyls,                      # Asuka Orr & Suliman Sharif
'montmorillonite_adsorption': MontmorilloniteAdsorption,                  # Asuka Orr & Suliman Sharif
'common_monomer_repeating_units': CommonMonomerRepeatingUnits,            # Suliman Sharif
'electrophilic_warheads_for_kinases': ElectrophilicWarheadsForKinases,    # Ruibin Liu & Suliman Sharif
'common_warheads_covalent_inhibitors': CommonWarheadsCovalentInhibitors,  # Shaoqi Zhan & Suliman Sharif
'rings_in_drugs': RingsInDrugs,                                           # Alexander Mackerell & Suliman Sharif
'iupac_blue_book_rings': IUPACBlueBookRings,                              # Suliman Sharif
'phase_2_hetereocyclic_rings': Phase2HetereoCyclicRings,                  # Suliman Sharif
'privileged_scaffolds': PrivilegedScaffolds,                              # Suliman Sharif
'iupac_blue_book': IUPACBlueBook,                                         # Suliman Sharif
'common_r_group_replacements': CommonRGroupReplacements,                  # Sunhwan Jo & Suliman Sharif
'braf_inhibitors': BRAFInhibitors,                                        # Aarion Romany & Suliman Sharif
'privileged_kinase_inhibitors': PrivilegedKinaseInhibitors,               # Suliman Sharif
'common_organic_solvents': CommonOrganicSolvents,                         # Suliman Sharif
'amino_acid_protecting_groups': AminoAcidProtectingGroups,                # Aziza Frank & Suliman Sharif
'schedule_one': ScheduleOne,                                              # Suliman Sharif
'schedule_two': ScheduleTwo,                                              # Suliman Sharif
'schedule_three': ScheduleThree,                                          # Suliman Sharif
'schedule_four': ScheduleFour,                                            # Suliman Sharif
'schedule_five': ScheduleFive,                                            # Suliman Sharif
'interstellar_space': InterstellarSpace,                                  # Suliman Sharif
'vitamins': Vitamins,                                                     # Suliman Sharif
'open_smiles': OpenSmiles,                                                # Suliman Sharif
'amino_acids': AminoAcids,                                                # Suliman Sharif
'pihkal': Pihkal,                                                         # Suliman Sharif
'nickel_ligands': NickelBidendatePhosphineLigands,                        # Suliman Sharif
'cimetidine_and_acyclovir': CimetidineAndAcyclovir,                       # Suliman Sharif
'common_regex_patterns': CommonRegexPatterns,                             # Chris Burke & Suliman Sharif
'how_to_live_longer': HowToLiveLonger,                                    # Suliman Sharif
'monoclonal_antibodies': MonoclonalAntibodies,                            # Asuka Orr & Suliman Sharif
'lube': Lube,                                                             # Daniel Khavrutskii & Suliman Sharif
'tainted_sexual_enhancements': TaintedSexualEnhancements,                 # Suliman Sharif
'exsens_products': ExsensProducts,                                        # Rebecca Pinette-Dorin & Suliman Sharif
'fda_list_one': FDAListOne,                                               # Mike Wostner & Suliman Sharif
'fda_list_two': FDAListTwo,                                               # Mike Wostner & Suliman Sharif
'fda_list_three': FDAListThree,                                           # Mike Wostner & Suliman Sharif
'fda_list_four': FDAListFour,                                             # Mike Wostner & Suliman Sharif
'fda_list_five': FDAListFive,                                             # Mike Wostner & Suliman Sharif
'fda_list_six': FDAListSix,                                               # Mike Wostner & Suliman Sharif
'fda_list_seven': FDAListSeven,                                           # Mike Wostner & Suliman Sharif
'constituents_of_cannabis_sativa': ConstituentsOfCannabisSativa,          # Ian Jones & Bettina Lier & Suliman Sharif
'phytocannabinoids': PhytoCannabinoids,                                   # Ian Jones & Bettina Lier & Suliman Sharif
'organophosphorous_nerve_agents': OrganoPhosphorousNerveAgents,           # Suliman Sharif
'organic_and_inorganic_bronsted_acids': OrganicAndInorganicBronstedAcids, # Nathaniel McClean & Suliman Sharif
'chemicals_from_biomass': ChemicalsFromBioMass,                           # Anthony Maiorana & Suliman Sharif
'salt': Salt,                                                             # Suliman Sharif
'drugs_from_snake_venom': DrugsFromSnakeVenom,                            # Suliman Sharif
'oral_contraceptives': OralContraceptives,                                # Suliman Sharif
'surfactants': Surfactants,                                               # Yiling Nan & Suliman Sharif
'lanthipeptides': LanthiPeptides,                                         # Prabin Baral & Suliman Sharif
'alternative_jet_fuels': AlternativeJetFuels,                             # Suliman Sharif
```
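The listing above pairs each node key with a class. As a rough illustration of how such a key-to-class registry can be queried, here is a self-contained sketch. The classes `ToyNode` and `ToySolvents`, the `REGISTRY` dictionary, and the `get_smiles` method name are all hypothetical stand-ins chosen for this example and are not confirmed to match GlobalChem's node classes.

```python
class ToyNode:
    """Hypothetical stand-in for a node class."""

    @staticmethod
    def get_smiles():
        return {"water": "O"}


class ToySolvents(ToyNode):
    """Hypothetical node holding a small name -> SMILES list."""

    @staticmethod
    def get_smiles():
        return {"ethanol": "CCO", "benzene": "c1ccccc1"}


# Key -> class registry, in the same shape as the contributor listing.
REGISTRY = {
    "toy_node": ToyNode,
    "toy_solvents": ToySolvents,
}


def get_node_smiles(node_key, registry=REGISTRY):
    """Look up a node class by key and return its name -> SMILES mapping."""
    try:
        node_class = registry[node_key]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise KeyError(f"unknown node {node_key!r}; known nodes: {known}") from None
    return node_class.get_smiles()


print(get_node_smiles("toy_solvents")["ethanol"])  # CCO
```

Listing the known keys in the error message helps when a key is misspelled, which is easy to do with long snake_case names.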

| Chemical List | # of Entries | References |
| --- | --- | --- |
| Amino Acids | 20 | Common Knowledge |
| Essential Vitamins | 13 | Common Knowledge |
| Common Organic Solvents | 42 | Fulmer, Gregory R., et al. “NMR Chemical Shifts of Trace Impurities: Common Laboratory Solvents, Organics, and Gases in Deuterated Solvents Relevant to the Organometallic Chemist.” Organometallics, vol. 29, no. 9, May 2010, pp. 2176–79. |
| Open Smiles | 94 | OpenSMILES Home Page. http://opensmiles.org/. |
| IUPAC Blue Book (CRC Handbook) 2003 | 333 | Chemical Rubber Company. CRC Handbook of Chemistry and Physics: A Ready-Reference Book of Chemical and Physical Data. Edited by David R. Lide, 85. ed, CRC Press, 2004. |
| Rings in Drugs | 92 | Taylor, Richard D., et al. “Rings in Drugs.” Journal of Medicinal Chemistry, vol. 57, no. 14, July 2014, pp. 5845–59. ACS Publications, https://doi.org/10.1021/jm4017625. |
| Phase 2 Heterocyclic Rings | 19 | Broughton, Howard B., and Ian A. Watson. “Selection of Heterocycles for Drug Design.” Journal of Molecular Graphics & Modelling, vol. 23, no. 1, Sept. 2004, pp. 51–58. PubMed, https://doi.org/10.1016/j.jmgm.2004.03.016. |
| Privileged Scaffolds | 47 | Welsch, Matthew E., et al. “Privileged Scaffolds for Library Design and Drug Discovery.” Current Opinion in Chemical Biology, vol. 14, no. 3, June 2010, pp. 347–61. PubMed, https://doi.org/10.1016/j.cbpa.2010.02.018. |
| Common Warheads | 29 | Gehringer, Matthias, and Stefan A. Laufer. “Emerging and Re-Emerging Warheads for Targeted Covalent Inhibitors: Applications in Medicinal Chemistry and Chemical Biology.” Journal of Medicinal Chemistry, vol. 62, no. 12, June 2019, pp. 5673–724. ACS Publications, https://doi.org/10.1021/acs.jmedchem.8b01153. |
| Common Polymer Repeating Units | 78 | Hiorns, R. C., et al. “A brief guide to polymer nomenclature (IUPAC Technical Report).” Pure and Applied Chemistry, vol. 84, no. 10, Oct. 2012, pp. 2167–69, https://doi.org/10.1351/PAC-REP-12-03-05. |
| Common R Group Replacements | 499 | Takeuchi, Kosuke, et al. “R-Group Replacement Database for Medicinal Chemistry.” Future Science OA, vol. 7, no. 8, Sept. 2021, p. FSO742. future-science.com (Atypon), https://doi.org/10.2144/fsoa-2021-0062. |
| Electrophilic Warheads for Kinases | 24 | Petri, László, et al. “An Electrophilic Warhead Library for Mapping the Reactivity and Accessibility of Tractable Cysteines in Protein Kinases.” European Journal of Medicinal Chemistry, vol. 207, Dec. 2020, p. 112836. PubMed, https://doi.org/10.1016/j.ejmech.2020.112836. |
| Privileged Scaffolds for Kinases | 29 | Hu, Huabin, et al. “Systematic Comparison of Competitive and Allosteric Kinase Inhibitors Reveals Common Structural Characteristics.” European Journal of Medicinal Chemistry, vol. 214, Mar. 2021, p. 113206. ScienceDirect, https://doi.org/10.1016/j.ejmech.2021.113206. |
| BRaf Inhibitors | 54 | Agianian, Bogos, and Evripidis Gavathiotis. “Current Insights of BRAF Inhibitors in Cancer.” Journal of Medicinal Chemistry, vol. 61, no. 14, July 2018, pp. 5775–93. ACS Publications, https://doi.org/10.1021/acs.jmedchem.7b01306. |
| Common Amino Acid Protecting Groups | 346 | Isidro-Llobet, Albert, et al. “Amino Acid-Protecting Groups.” Chemical Reviews, vol. 109, no. 6, June 2009, pp. 2455–504. DOI.org (Crossref), https://doi.org/10.1021/cr800323s. |
| Emerging Perfluoroalkyls | 27 | Pelch, Katherine E., et al. “PFAS Health Effects Database: Protocol for a Systematic Evidence Map.” Environment International, vol. 130, Sept. 2019, p. 104851. ScienceDirect, https://doi.org/10.1016/j.envint.2019.05.045. |
| Chemicals For Clay Adsorption | 33 | Orr, Asuka A., et al. “Combining Experimental Isotherms, Minimalistic Simulations, and a Model to Understand and Predict Chemical Adsorption onto Montmorillonite Clays.” ACS Omega, vol. 6, no. 22, June 2021, pp. 14090–103. PubMed, https://doi.org/10.1021/acsomega.1c00481. |
| Schedule 1 United States Narcotics | 240 | ECFR :: 21 CFR Part 1308 - Schedules. |
| Schedule 2 United States Narcotics | 60 | ECFR :: 21 CFR Part 1308 - Schedules. |
| Schedule 3 United States Narcotics | 22 | ECFR :: 21 CFR Part 1308 - Schedules. |
| Schedule 4 United States Narcotics | 77 | ECFR :: 21 CFR Part 1308 - Schedules. |
| Schedule 5 United States Narcotics | 8 | ECFR :: 21 CFR Part 1308 - Schedules. |
| Pihkal | 179 | Shulgin, Alexander T., and Ann Shulgin. Pihkal: A Chemical Love Story. 1. ed., 8. print, Transform, 2010. |
| Excipients Cimetidine & Acyclovir | 14 | Vaithianathan, Soundarya, et al. “Effect of Common Excipients on the Oral Drug Absorption of Biopharmaceutics Classification System Class 3 Drugs Cimetidine and Acyclovir.” Journal of Pharmaceutical Sciences, vol. 105, no. 2, Feb. 2016, pp. 996–1005. PubMed, https://doi.org/10.1002/jps.24643. |
| Nickel Bidentate Phosphine Ligands | N/A | Clevenger, Andrew L., et al. “Trends in the Usage of Bidentate Phosphines as Ligands in Nickel Catalysis.” Chemical Reviews, vol. 120, no. 13, July 2020, pp. 6124–96. DOI.org (Crossref), https://doi.org/10.1021/acs.chemrev.9b00682. |
| HowToLiveLonger | 4 | https://github.com/geekan/HowToLiveLonger |
| Monoclonal Antibodies | 19 | https://labels.fda.gov/ |
| Common Lubricants for Sex | 38 | https://exsens-usa.com/blogs/your-body-your-pleasure/lube-lessons-glossary-of-common-sex-lube-ingredients |
| Tainted Sexual Enhancements | 4 | FDA Tainted Sexual Enhancements |
| Salt | 14 | OpenFoodFacts https://github.com/openfoodfacts |
| Exsens Sexual Wellness | 59 | https://exsens-usa.com/ |
| FDA Color Additive List 1 | 12 | https://www.fda.gov/industry/color-additive-inventories/color-additive-status-list |
| FDA Color Additive List 2 | 15 | https://www.fda.gov/industry/color-additive-inventories/color-additive-status-list |
| FDA Color Additive List 3 | 16 | https://www.fda.gov/industry/color-additive-inventories/color-additive-status-list |
| FDA Color Additive List 4 | 39 | https://www.fda.gov/industry/color-additive-inventories/color-additive-status-list |
| FDA Color Additive List 5 | 27 | https://www.fda.gov/industry/color-additive-inventories/color-additive-status-list |
| FDA Color Additive List 6 | 29 | https://www.fda.gov/industry/color-additive-inventories/color-additive-status-list |
| FDA Color Additive List 7 | 37 | https://www.fda.gov/industry/color-additive-inventories/color-additive-status-list |
| Constituents of Cannabis Sativa | 394 | Turner, C. E., et al. “Constituents of Cannabis Sativa L. XVII. A Review of the Natural Constituents.” Journal of Natural Products, vol. 43, no. 2, Apr. 1980, pp. 169–234. PubMed. |
| Phytocannabinoids | 111 | Hanuš, Lumír Ondřej, et al. “Phytocannabinoids: A Unified Critical Inventory.” Natural Product Reports, vol. 33, no. 12, Nov. 2016, pp. 1357–92. PubMed. |
| OrganoPhosphorous Nerve Agents | 14 | Mukherjee, Sudisha, and Rinkoo Devi Gupta. “Organophosphorus Nerve Agents: Types, Toxicity, and Treatments.” Journal of Toxicology, vol. 2020, Sept. 2020, p. 3007984. |
| Cengage Bronsted Acids | 42 | https://cxp.cengage.com/contentservice/assets/owms01h/references/chemtables/org_chem/pKaTable.html |
| Chemicals From Biomass | 17 | Wittcoff, Harold A., et al. Industrial Organic Chemicals: Wittcoff/Organic Chemicals. John Wiley & Sons, Inc., 2004. |
| Drugs From Snake Venom | 7 | Oliveira, Ana L., et al. “The Chemistry of Snake Venom and Its Medicinal Potential.” Nature Reviews Chemistry, vol. 6, no. 7, July 2022, pp. 451–69. |
| Oral Contraceptives | 17 | Coleman, William F. “The Molecules of Oral Contraceptives.” Journal of Chemical Education, vol. 87, no. 7, July 2010, pp. 760–61. |
| Surfactants for Skin | 36 | Date, Abhijit A., and Vandana B. Patravale. “Microemulsions: Applications in Transdermal and Dermal Delivery.” Critical Reviews™ in Therapeutic Drug Carrier Systems, vol. 24, no. 6, 2007. |
| LanthiPeptides | 2 | Pokhrel, Rudramani, et al. “Molecular Mechanisms of Pore Formation and Membrane Disruption by the Antimicrobial Lantibiotic Peptide Mutacin 1140.” Physical Chemistry Chemical Physics, vol. 21, no. 23, June 2019, pp. 12530–39. |
| Alternative Jet Fuels | 59 | Chemical Composition and Fuel Properties of Alternative Jet Fuels :: BioResources. https://bioresources.cnr.ncsu.edu/. |
| Common Regex Patterns | 1 | |

GlobalChemExtensions

Installation

GlobalChemExtensions is distributed via PyPI as separate modules. As the tree and its extensions grow, we can expand it to other pieces of software, making it accessible to all regardless of what you use. Alternatively, you can have a glance at the source code and copy/paste it yourself.


```bash
pip install 'global-chem[graphing]'
pip install 'global-chem[forcefields]'
pip install 'global-chem[bioinformatics]'
pip install 'global-chem[cheminformatics]'
pip install 'global-chem[quantum_chemistry]'
pip install 'global-chem[development_operations]'
pip install 'global-chem[all]'
```

Quickstart

To run principal component analysis (PCA) on a list of SMILES from the network:


```python
from global_chem import GlobalChem
from global_chem_extensions import GlobalChemExtensions

gc = GlobalChem()
gc_cheminfo = GlobalChemExtensions().cheminformatics()

# Build the network and collect the SMILES stored in the 'pihkal' node.
gc.build_global_chem_network()
smiles_list = list(gc.get_node_smiles('pihkal').values())

print(f"SMILES: {smiles_list[0]}")

# Run the PCA extension on the SMILES list.
gc_cheminfo.node_pca_analysis(smiles_list)
```
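For readers unfamiliar with what a PCA step computes, the following is a minimal pure-Python sketch of finding the first principal component by power iteration on the covariance matrix. It is a generic illustration, not how `node_pca_analysis` is implemented. The function name `first_principal_component` is hypothetical, and the toy two-dimensional rows stand in for the molecular fingerprints that a real analysis would derive from SMILES.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return the unit vector of greatest variance via power iteration."""
    n, dim = len(rows), len(rows[0])
    # Center each feature column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix (dim x dim).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    vec = [1.0] * dim  # simple start vector; assumed not orthogonal to the answer
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance along the current direction; stop early
            break
        vec = [x / norm for x in nxt]
    return vec


# Toy feature rows lying exactly on the line y = 2x.
toy_rows = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
pc1 = first_principal_component(toy_rows)
print([round(x, 4) for x in pc1])  # [0.4472, 0.8944]
```

Because the toy points lie on one line, the first component points along that line, in the direction (1, 2) scaled to unit length.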

A variety of tools are available for you to browse and analyze data; the full list of applications can be found in the Google Colab demo or the GitBook documentation. A demonstration of the data visualization extensions, designed with Plotly and Bokeh, is displayed below:

<p align="center"> <img width="800" height="600" src="https://raw.githubusercontent.com/Sulstice/global-chem/master/images/figures/figure_10.png"> </p>

<details><summary><h3>Extension List</h3><br/></summary>

| Extension | Description | Application |
| --- | --- | --- |
| GlobalChem Chemical Entities | GlobalChem has internal Molecule objects with all common attributes associated and conversion to SMILES | forcefields |
| GlobalChem Biological Entities | GlobalChem has internal DNA/RNA/Protein/Molecule objects with all common attributes associated and conversion to SMILES | bioinformatics |
| Visualize DNA/RNA Strands | Visualize DNA and RNA strands and add labels to them | bioinformatics |
| ForceField Molecules | GlobalChem can parse, manipulate, and write CGenFF and GaFF2 files as objects | forcefields |
| PDF Generation and Parsing | GlobalChem can generate a PDF from SMILES and convert the PDF back to SMILES | cheminformatics |
| SMILES Validation | GlobalChem connects to PySMILES, DeepSMILES, PartialSmiles, SELFIES, and MolVS for validation of SMILES sets | cheminformatics |
| SMILES Protonation States | GlobalChem can take a set of compounds and predict the protonation states of a SMILES string over a range of pH | cheminformatics |
| Open Source Database Monitoring | GlobalChem uses Uptime-Cheminformatics to keep track of open source chemical data | development_operations |
| Networkx Software Adapter | The GlobalChem network can be converted into NetworkX graph objects | cheminformatics |
| SMARTS Pattern Validation | GlobalChem uses the MiniFrag Database to test the accuracy of SMARTS strings for functional group selection | cheminformatics |
| Principal Component Analysis | GlobalChem can readily interpret SMILES, fingerprint, cluster, and apply PCA, with parameters the user can tweak | cheminformatics |
| Drug Design Filters | GlobalChem can filter compounds based on common drug design filtering rules | cheminformatics |
| Deep Layer Scatter Analysis | To visualize relations between sets of molecules, GlobalChem offers parallel coordinate diagram generation | cheminformatics |
| Sunbursting Radial Analysis | GlobalChem offers a sunbursting mechanism to allow users to observe how sets of compounds relate to the common set | cheminformatics |
| Graphing Templates | GlobalChem offers graphing templates to aid in faster data analysis; currently the only offering is Plotly | cheminformatics |
| CGenFF Dissimilarity Score | GlobalChem can report the difference between two molecules based on their atom types | forcefields |
| OneHot Encoding | GlobalChem has its own one-hot encoder and decoder based on the common lists for machine learning | cheminformatics |
| SMARTS Pattern Identifier | GlobalChem connects to SMARTS Plus and can offer visualization of different SMARTS components | cheminformatics |
| Psi4 Parser | Parses Psi4 output files and extracts values | quantum_chemistry |
| Coordinate Store | A warehouse for coordinates of small molecules for distribution in xyz and Z-matrix formats | quantum_chemistry |
| Visualize Molecular Orbitals | Visualize the cube files from Psi4 cubeprop output | quantum_chemistry |

</details>

Open Source Software Compliance

GlobalChem follows the principles outlined in Part 11 of Title 21 of the Code of Federal Regulations; Electronic Records, Electronic Signatures (21 CFR Part 11) guidance documentation. Since there are no formal guidelines for how open source software should be handled, we attempt to fulfill the requirements ourselves. The FDA considers Part 11 to be applicable to the following criteria of electronic records, and GlobalChem addresses each component as follows:

GlobalChem is released under the Mozilla Public License version 2.0. You may use the software in your larger work and extend it with modifications if you wish. The condition is that if you install GlobalChem and release new software, you must follow the same principles laid out in our license for the open source community.

Data Collection

References and associated compound lists are selected based on the interests of the scientific contributors, with consideration of their relevance to the scientific community. The SMILES strings may be abstracted by a variety of methods:

