Awesome data in (experimental) chemistry and materials science
We live in a data-driven age. To make use of all the data produced in (experimental) chemistry and materials science, standards for collecting and sharing data are needed. Once the community agrees on standard schemas, they can be implemented in ELNs, and data shared via repositories can be re-used.
Symbol | Meaning |
---|---|
🔴 | Currently not developed/maintained |
๐ | Closed source |
๐ | Link to a paper |
Contents
- Electronic lab notebooks (ELN) / Laboratory information management systems (LIMS)
- Repositories
- Schemas/Ontologies
- Related compilations
Electronic lab notebooks (ELN) / Laboratory information management systems (LIMS)
Overviews
- ELN comparison grid by Harvard Biomedical Data.
- ELN guidance by the Gurdon Institute of the University of Cambridge.
- Comparison of ELNs by the Labii ELN.
- List of ELNs on Wikipedia.
ELNs
- c6h6: Developed by the cheminfo organization, CouchDB backend, modular frontend in JavaScript. ๐.
- chemotion ELN: Developed by Nicole Jung's group at KIT, with focus on organic chemistry. Written in JavaScript/Ruby. ๐
- openBIS: General-purpose LIMS/ELN developed at ETH Zurich. Supports custom plugins and direct data analysis in Jupyter notebooks. Core written in Java. ๐.
- LabTrove: The ELN used for the Open Source Malaria project.
- bluesky: More than just an ELN. Bluesky is a Python ecosystem for experiment control and for collecting scientific data and metadata. It is being developed at national labs with synchrotrons, so there is a focus on streaming data. ๐.
- Materials Data Curation System (MDCS): A framework for capturing, sharing and transforming materials data in structured formats such as XML, based on user-selected templates. Developed at NIST.
- eLabFTW: An open source lab notebook platform with support for inventory management, scheduling, and REST APIs (among other things). Written in PHP/MySQL.
Repositories
- chemotion repository: Repository for molecules, reactions and research data.
- Materials Commons: A site for materials scientists to collaborate, store, and publish research. ๐
- nmrshiftdb2: Repository for NMR data. A major rework of the software is pending. Use of NMReDATA is central. Can include raw and processed data (currently the case for some entries, mostly peak lists). Integration into workflows is possible (e.g., its prediction is used in chemotion).
Schemas/Ontologies
Overviews
Generic
- SciData: Scientific data model (SDM) and related ontology (SDMO).
- BFO: The Basic Formal Ontology (BFO) is an upper-level ontology "designed for use in supporting information retrieval, analysis and integration in scientific and other domains", under development since 2002 (last release 2019).
- oreChem: Aimed to create an ontology for scientific experiments; funded by Microsoft Research. "The oreChem s Ontology [eo] describes (a) the planned method of a scientific experiment; (b) the enactment of plans and (c) the provenance of objects realised during enactments." ๐. 🔴
- elnItemManifest: Describes core metadata for ELNs (like title, keywords, identifiers, contact, license information, related items, contributors, content, source). ๐. 🔴
- autoprotocol: Language for specifying experimental protocols. Its Autoprotocol Standard Changes provide a mechanism, similar to Python Enhancement Proposals, for changing the standard.
- Chemical Markup Language (CML): XML for most chemistry, especially molecules, compounds, reactions, spectra, crystals and computational chemistry, developed by Peter Murray-Rust and Henry Rzepa. ๐ 🔴 (Last Update: 2013-04-22)
- European Materials and Modelling Ontology (EMMO): An ontology designed to represent the "complex multiscale nature of chemicals and materials" with varying analytical philosophical interpretations. Available from the emmo-repo organisation on GitHub.
- Materials Genome Initiative JSON: JSON schema for materials science and engineering. Directly related is Material Schema, which aims to extend schema.org for materials science.
- ChemAxiom: Ontological framework for chemistry, led by Peter Murray-Rust. 🔴
- Chemical Information Ontology: "aims to establish a standard in representing chemical information. In particular, it aims to produce an ontology to represent chemical structure and to richly describe chemical properties, whether intrinsic or computed." (direct quote from the README) ๐.
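The core metadata that the elnItemManifest entry above lists can be pictured as a simple record. The following is a minimal sketch only: the key names are taken from that list, but the dictionary layout, the choice of required fields, and the helper `missing_fields` are assumptions made for illustration and are not the elnItemManifest format itself.

```python
# Hypothetical metadata record; key names follow the list in the entry above,
# the structure is an assumption and not the actual elnItemManifest schema.
REQUIRED_FIELDS = ("title", "identifiers", "license", "contributors")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = {
    "title": "Synthesis of compound 1",
    "keywords": ["synthesis", "NMR"],
    "identifiers": ["example-id-0001"],  # placeholder identifier
    "contact": "jane.doe@example.org",
    "license": "CC-BY-4.0",
    "contributors": ["Jane Doe"],
}

print(missing_fields(record))
```

A check like this is the kind of validation an agreed schema makes possible before an entry is shared via a repository.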
Analytical methods
- Chemical Methods Ontology: Describes methods used to collect data in chemical experiments, based on the IUPAC Orange Book in ontology language (OBO and OWL).
- Chemical Analysis Metadata Platform: Ontology, Vocabularies, Metadata for Chemical Analysis. 🔴
- Analytical Information Markup Language (AnIML): American Society for Testing and Materials (ASTM) XML standard for analytical chemistry and biological data, with regular meetings. ๐
Organic reactions
- RXNO: reaction ontologies: Name reaction ontology.
- Molecular process ontology (MOP): Molecular processes that underlie the name reaction ontology RXNO.
- ord-schema: Schema for organic reactions developed for the Open Reaction Database.
Biology
- ISA framework: Data model built around "Investigation" (project context), "Study" (unit of research), "Assay" (analytical measurement) to manage life science/biomedical (*omics) experiments. ๐
- SD2E/opil: Synthetic biology experiment description effort intended to standardize the interface between human-generated experimental requests and lab-automated protocols.
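The Investigation/Study/Assay hierarchy of the ISA framework described above can be sketched with plain Python dataclasses. This is an illustration of the nesting only, under the assumption that an investigation holds studies and a study holds assays; the class and field names are hypothetical and this is not the API of the ISA tools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """An analytical measurement (hypothetical fields)."""
    measurement_type: str
    technology: str


@dataclass
class Study:
    """A unit of research that groups related assays."""
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The project context that groups related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def count_assays(self) -> int:
        """Total number of assays across all studies."""
        return sum(len(study.assays) for study in self.studies)


investigation = Investigation(
    title="Example project",
    studies=[
        Study(
            title="Example study",
            assays=[
                Assay("metabolite profiling", "mass spectrometry"),
                Assay("metabolite profiling", "NMR spectroscopy"),
            ],
        )
    ],
)

print(investigation.count_assays())
```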
Materials synthesis
Materials properties
- MatML: XML format for the interchange of materials information. 🔴
- Physical information file (PIF): Schema for information about physical systems, maintained by Citrine Informatics.
Spectral data
- NMReDATA: Proposing a format for storing NMR data.
- JCAMP-DX: Joint Committee on Atomic and Molecular Physical Data (JCAMP) data exchange file for spectral data. Used by the c6h6 ELN.
- nmrML: Open mark-up language for NMR raw and spectral data. ๐.
- CMLSpect: XML Vocabulary for Spectral Data (extension of CML). ๐ 🔴 (Last Update: 2013-04-22).
Initiatives/Consortia
- NFDI4Chem: Initiative to build an open and FAIR infrastructure for research data management in chemistry.
- Blue Obelisk: Internet group promoting reusable chemistry via open source software development. ๐ ๐
- GO FAIR Chemistry Implementation Network: Goals are "to enhance the open, FAIR and effective communication of chemical knowledge within the chemical sciences and between chemistry and other disciplines" and "to enable chemists and chemistry to contribute to the achievement of the UN Global Sustainable Development goals" (direct quotes from the website). ๐.
- Chemistry Research Data IG: Interest Group of the Research Data Alliance (RDA) that aims to foster exchange on chemical data.
- RDA/CODATA Materials Data, Infrastructure & Interoperability IG: Interest Group of the Research Data Alliance (RDA) that aims to foster exchange on materials data.
Related compilations
- Awesome Materials Informatics: In contrast to this compilation, Awesome Materials Informatics focuses more on computational materials science.
- FAIRsharing.org domain collections for Chemistry and Semantic Assets for Materials Science.