IBM Molecule Generation Experience (Community Version)

IBM Molecule Generation Experience (MolGX) is a tool for accelerating the AI-driven design of new materials. This is the Community Version, which implements a small yet essential subset of the capabilities of the Enterprise Version. With the Community Version, we intend to share our key technologies with a wide range of communities and to further improve them through collaborative, open development.

Requirements

MolGX runs with the following versions of Python and pip:

  1. Python >=3.7, <3.9

  2. pip>=19.1, <20.3

These version restrictions are intended to keep MolGX consistent with GT4SD.
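
The version ranges above can be checked from Python before installing. The following is a minimal sketch, not part of MolGX: the helper names `python_supported` and `pip_supported` are hypothetical, and the bounds are simply copied from the list above.

```python
import re
import sys

# Bounds copied from the requirements list above.
PYTHON_MIN, PYTHON_MAX = (3, 7), (3, 9)  # Python >=3.7, <3.9
PIP_MIN, PIP_MAX = (19, 1), (20, 3)      # pip >=19.1, <20.3


def parse_major_minor(text):
    """Extract (major, minor) from a version string such as '20.2.4'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized version string: " + repr(text))
    return int(match.group(1)), int(match.group(2))


def python_supported(version=None):
    """Return True if the (major, minor) Python version is in range.

    Defaults to the interpreter that runs this code.
    """
    if version is None:
        version = sys.version_info[:2]
    return PYTHON_MIN <= tuple(version)[:2] < PYTHON_MAX


def pip_supported(version_text):
    """Return True if the given pip version string is in range."""
    return PIP_MIN <= parse_major_minor(version_text) < PIP_MAX
```

To check the active environment, call `python_supported()` and, after `import pip`, `pip_supported(pip.__version__)`.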

Installation

We recommend creating a conda environment, for example:

conda create -n molgx_env python=3.7 anaconda

Then, on Windows, type the following command:

activate molgx_env # for windows

On other platforms such as Linux or macOS:

conda activate molgx_env # for the others

There are two ways to install MolGX:

Type the following command if you want to install MolGX from PyPI:

pip install molgx

Type the following commands if you want to clone the source code to install it:

git clone git@github.com:GT4SD/molgx-core.git
cd ./molgx-core
pip install .
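
After either installation route, you may want to confirm that the package is visible to the active interpreter. The sketch below assumes the importable module is named `molgx`, matching the PyPI package name; that name is an assumption here, and `is_installed` is a hypothetical helper rather than anything MolGX provides.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the module name matches the PyPI package name "molgx".
if is_installed("molgx"):
    print("molgx is available in this environment")
else:
    print("molgx was not found; check that the conda environment is active")
```

Using `find_spec` instead of a plain `import` avoids loading the package and its dependencies just to test for its presence.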

Running MolGX

At present, there are two ways to run MolGX. One is to use it as a standalone application, which gives access to its full capabilities. The other is to use a pretrained model under GT4SD; this route is planned to be extended to support more capabilities.

Running an example in Jupyter Notebook as a standalone application

Here is an example that gives an overview of how to use MolGX. You will need Jupyter Notebook to run it. One way to install Jupyter Notebook is:

conda install jupyter notebook

Then, you will be able to invoke it with jupyter-notebook.

Communicating with GT4SD

A pre-trained model for 10 QM9 samples with target properties homo and lumo is provided along with the GT4SD core algorithms. Running the algorithm is as easy as typing:

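import logging

from gt4sd.algorithms.conditional_generation.molgx.core import MolGX, MolGXQM9Generator

# Suppress log messages at INFO level and below.
logging.disable(logging.INFO)

# Configure the generator backed by the pre-trained QM9 model.
configuration = MolGXQM9Generator()
algorithm = MolGX(configuration=configuration)
# Draw three samples and print them.
items = list(algorithm.sample(3))
print(items)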

See this example.
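
When requesting more samples, you may want to drop repeated items. The sketch below relies only on the `sample(n)` call shown above and assumes the returned items are hashable (for example, strings); whether MolGX actually produces duplicates is not stated here. `collect_unique` and `FakeAlgorithm` are hypothetical names, and `FakeAlgorithm` is a stand-in so that the sketch runs without GT4SD installed.

```python
def collect_unique(algorithm, number_of_items):
    """Sample from the algorithm and drop duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for item in algorithm.sample(number_of_items):
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


class FakeAlgorithm:
    """Hypothetical stand-in that mimics the sample(n) interface."""

    def sample(self, number_of_items):
        pool = ["C", "CC", "C", "CCO"]
        for index in range(number_of_items):
            yield pool[index % len(pool)]


print(collect_unique(FakeAlgorithm(), 4))
```

With a real setup, the `MolGX` instance created above would be passed in place of `FakeAlgorithm()`.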

Building the documentation

You will need Sphinx. You can install it with Anaconda as follows:

conda install sphinx

Type the following commands to generate the documentation:

cd ./docs
make html

You will then find the HTML files under docs/_build/html and can open index.html with your web browser.

For developers

Type the following command after activating your conda environment:

pip install -e .

Miscellaneous

The web application of MolGX is available here.

Additionally, the following papers describe some of the essential algorithms implemented in the Community Version, as well as other techniques not implemented here:

  1. Seiji Takeda, Toshiyuki Hama, Hsiang-Han Hsu, Akihiro Kishimoto, Makoto Kogoh, Takumi Hongo, Kumiko Fujieda, Hideaki Nakashika, Dmitry Zubarev, Daniel P. Sanders, Jed W. Pitera, Junta Fuchiwaki, Daiju Nakano. Molecule Generation Experience: An Open Platform of Material Design for Public Users. CoRR abs/2108.03044, 2021.

  2. Seiji Takeda, Toshiyuki Hama, Hsiang-Han Hsu, Victoria A. Piunova, Dmitry Zubarev, Daniel P. Sanders, Jed W. Pitera, Makoto Kogoh, Takumi Hongo, Yenwei Cheng, Wolf Bocanett, Hideaki Nakashika, Akihiro Fujita, Yuta Tsuchiya, Katsuhiko Hino, Kentaro Yano, Shuichi Hirose, Hiroki Toda, Yasumitsu Orii, Daiju Nakano. Molecular Inverse-Design Platform for Material Industries. pages 2961-2969, KDD 2020.

Finally, we use some of the data extracted from the QM9 database with the following references:

  1. L. Ruddigkeit, R. van Deursen, L. C. Blum, J.-L. Reymond, Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17, J. Chem. Inf. Model. 52, 2864–2875, 2012.
  2. R. Ramakrishnan, P. O. Dral, M. Rupp, O. A. von Lilienfeld, Quantum chemistry structures and properties of 134 kilo molecules, Scientific Data 1, 140022, 2014.