

MiDi: Mixed Graph and 3D Denoising Diffusion for Molecule Generation

Update (July 2024): My drive account has unfortunately been deleted, and I have lost access to some checkpoints. If you happen to have a downloaded checkpoint stored locally, I would be glad if you could send me an email at vignac.clement@gmail.com or raise a GitHub issue.

Link to the paper: arXiv:2302.09048

Clément Vignac*, Nagham Osman*, Laura Toni, Pascal Frossard

ECML 2023


This code was tested with PyTorch 2.0.1, CUDA 11.8, and torch_geometric 2.3.1 on multiple GPUs.


Thanks to Ian Dunn for the new links to the data!


First, move inside the src folder so that the outputs are saved at the right location.

Some examples:

QM9 without hydrogens on CPU

python3 main.py dataset=qm9 dataset.remove_h=True +experiment=qm9_no_h_uniform

GEOM-DRUGS with hydrogens on 2 GPUs

python3 main.py dataset=geom dataset.remove_h=False +experiment=geom_with_h_uniform general.gpus=2

Resuming a previous run

First, retrieve the absolute path of the checkpoint. It looks like ABS_PATH='/home/vignac/MiDi/outputs/2023-02-13/18-10-49-geomH/checkpoints/geomH_bigger/epoch=219.ckpt'

Then run:

python3 main.py dataset=qm9 dataset.remove_h=True +experiment=qm9_no_h_uniform general.resume='ABS_PATH'
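Looking up the checkpoint path can also be scripted rather than copied by hand. The sketch below is not part of the MiDi code: the helper name latest_checkpoint is hypothetical, and it assumes only that checkpoints are .ckpt files stored somewhere under an outputs folder, as the example path above suggests.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(outputs_dir: str) -> Optional[Path]:
    """Return the absolute path of the most recently modified .ckpt file, or None.

    Hypothetical helper: it only assumes that checkpoints end in ".ckpt"
    and live somewhere below outputs_dir.
    """
    # Recursively collect every checkpoint file below outputs_dir.
    candidates = list(Path(outputs_dir).rglob("*.ckpt"))
    if not candidates:
        return None
    # Keep the file with the latest modification time.
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    return newest.resolve()
```

Calling latest_checkpoint("../outputs") from the src folder (assuming outputs sits next to src) returns a path that can be pasted in place of ABS_PATH. The most recently modified file is not necessarily the best epoch, so check the result before resuming from it.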


Sampling on multiple GPUs is not really handled; we recommend sampling on a single GPU.


python3 main.py dataset=qm9 dataset.remove_h=True +experiment=qm9_no_h_uniform general.test_only='ABS_PATH'


QM9 implicit H:

QM9 explicit H:

Geom explicit H:

Generated samples

QM9 implicit H:

QM9 explicit H:

Geom with explicit H:

Evaluate your model on the proposed metrics

To benchmark your own model with the proposed metrics, you can use the sampling_metrics function in src/metrics/molecular_metrics.py: sampling_metrics(molecules=molecule_list, name='my_method', current_epoch=-1, local_rank=0).

You'll need to write a few lines to load your generated graphs and create a list of Molecule objects (the Molecule class is defined in src/analysis/rdkit_functions.py).
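As one possible starting point for the loading step, the sketch below parses a molecule stored as a V2000 MOL block into atoms, 3D coordinates, and bonds using only the standard library. It is an illustration under assumptions: the function name parse_molblock is hypothetical, your generated graphs may be stored in another format, and the arguments expected by the Molecule constructor are not shown here and should be checked in src/analysis/rdkit_functions.py before converting these values.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (symbol, x, y, z)
Bond = Tuple[int, int, int]             # (atom i, atom j, bond order), 0-based indices


def parse_molblock(block: str) -> Tuple[List[Atom], List[Bond]]:
    """Parse a V2000 MOL block into atoms and bonds using fixed-width columns.

    Hypothetical helper, not part of MiDi.
    """
    lines = block.splitlines()
    counts = lines[3]  # the counts line follows three header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms: List[Atom] = []
    for line in lines[4:4 + n_atoms]:
        # Coordinates occupy three 10-character fields, then the atom symbol.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))

    bonds: List[Bond] = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        i, j, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        bonds.append((i - 1, j - 1, order))  # MOL files index atoms from 1
    return atoms, bonds
```

The returned lists hold the raw information (atom types, positions, bond orders) for one molecule. Mapping them to the tensors or arrays that Molecule expects is left to the few lines of glue code mentioned above.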

Use MiDi on a new dataset

To implement a new dataset, you will need to create a new file in the src/datasets folder. This file should implement a Dataset class, a Datamodule class, and an Infos class. Check qm9_dataset.py and geom_dataset.py for examples.

Once the dataset file is written, the code in main.py can be adapted to handle the new dataset, and a new file can be added in configs/dataset.
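To give a rough idea of the dataset-level statistics that an Infos-style class might summarize, here is a minimal, standard-library sketch. It is an assumption, not MiDi's actual interface: ToyInfos, compute_infos, and their fields are hypothetical names, and the attributes the code really requires should be taken from qm9_dataset.py and geom_dataset.py.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ToyInfos:
    """Hypothetical container for dataset statistics (not MiDi's Infos class)."""
    atom_encoder: Dict[str, int]    # atom symbol -> integer index
    atom_type_freq: List[float]     # frequency of each atom type, ordered by index
    n_nodes_freq: Dict[int, float]  # molecule size -> fraction of molecules


def compute_infos(molecules: Sequence[Sequence[str]]) -> ToyInfos:
    """Compute simple statistics from molecules given as sequences of atom symbols."""
    atom_counts = Counter(sym for mol in molecules for sym in mol)
    size_counts = Counter(len(mol) for mol in molecules)
    total_atoms = sum(atom_counts.values())
    if total_atoms == 0:
        raise ValueError("need at least one molecule with at least one atom")

    symbols = sorted(atom_counts)  # deterministic ordering of atom types
    encoder = {sym: idx for idx, sym in enumerate(symbols)}
    atom_freq = [atom_counts[sym] / total_atoms for sym in symbols]
    n_nodes = {size: count / len(molecules)
               for size, count in sorted(size_counts.items())}
    return ToyInfos(encoder, atom_freq, n_nodes)
```

For example, compute_infos([["C", "O"], ["C", "C", "N"]]) assigns indices to C, N, and O in alphabetical order and reports that half of the molecules have two atoms and half have three.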

Use OpenBabel for baseline results

Cite this paper

  title={MiDi: Mixed Graph and 3D Denoising Diffusion for Molecule Generation},
  author={Vignac, Clement and Osman, Nagham and Toni, Laura and Frossard, Pascal},
  journal={arXiv preprint arXiv:2302.09048},