Chemprop benchmarking scripts and data

This repository contains benchmarking scripts and data for Chemprop, a message passing neural network for molecular property prediction, as described in the paper Chemprop: Machine Learning Package for Chemical Property Prediction. Please have a look at the Chemprop repository for installation and usage instructions.

Data

All datasets used in the study can be downloaded from Zenodo. You can either download and extract the file data.tar.gz yourself, or run

wget https://zenodo.org/records/10078142/files/data.tar.gz
tar -xzvf data.tar.gz

The data folder should be placed within the chemprop_benchmark folder (i.e. where this README and the scripts folder are located).
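As a quick sanity check of the layout described above, the following sketch verifies that both the data and scripts folders exist in a given directory. The function name check_benchmark_layout is hypothetical and not part of this repository; it assumes only the folder names stated in this README.

```shell
# Hypothetical helper (not part of the repository): checks that the
# "data" and "scripts" folders sit side by side in the given directory.
check_benchmark_layout() {
    root="${1:-.}"   # directory to check; defaults to the current directory
    status=0
    for dir in data scripts; do
        if [ ! -d "$root/$dir" ]; then
            echo "Missing folder: $root/$dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run `check_benchmark_layout .` from the chemprop_benchmark folder after extracting the archive; it returns a non-zero status and names the missing folder if the layout is not as expected.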

Benchmarks

The paper reports a large number of benchmarks that can be run individually by executing one of the shell scripts in the scripts folder. For example, to run the barriers_e2 reaction benchmark, activate your Chemprop environment as described in the Chemprop repository, and then run (after adapting the path to your Chemprop folder):

cd scripts
./barriers_e2.sh

This runs a hyperparameter search followed by a training run with the best hyperparameters, and writes all output to the folder results_barriers_e2. In particular, the file results_barriers_e2/test_scores.csv lists the test set errors. If you installed Chemprop via pip, use chemprop_train etc. instead of python $chemprop_dir/train.py in the script.
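For a pip installation, the substitution can be sketched with sed. This is a minimal sketch under assumptions: the helper name to_pip_commands is hypothetical, and the second rule assumes the scripts call the hyperparameter search as python $chemprop_dir/hyperparameter_optimization.py, with chemprop_hyperopt as its pip-installed counterpart. Check the script you are adapting before relying on it.

```shell
# Hypothetical helper: rewrites source-checkout invocations to the
# command-line entry points of a pip-installed Chemprop.
# Reads the named file(s), or stdin if none are given, and prints the result.
to_pip_commands() {
    sed \
        -e 's#python \$chemprop_dir/train\.py#chemprop_train#g' \
        -e 's#python \$chemprop_dir/hyperparameter_optimization\.py#chemprop_hyperopt#g' \
        "$@"
}
```

For example, `to_pip_commands barriers_e2.sh > barriers_e2_pip.sh` writes an adapted copy (the output file name is only an illustration) and leaves the original script untouched.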

Available benchmarking systems:

The benchmarks were run on the master branch of Chemprop v1.6.1. The only exception is the timing benchmarks, which were run on the benchmark_timing branch, which includes timing printouts. They can also be run on the master branch, although with less verbose printouts. If you want to recreate the exact environment this study was run in, you can use the environment.yml file to set up a conda environment.
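Setting up the environment from environment.yml can be sketched as follows. The wrapper name create_benchmark_env is hypothetical; the only real command it relies on is conda env create -f. The environment name is whatever environment.yml defines, so it is not hard-coded here.

```shell
# Hypothetical wrapper: creates the conda environment described in
# environment.yml, with basic checks before calling conda.
create_benchmark_env() {
    env_file="${1:-environment.yml}"   # path to the environment file
    if [ ! -f "$env_file" ]; then
        echo "Cannot find $env_file; run this from the chemprop_benchmark folder." >&2
        return 1
    fi
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda was not found on PATH." >&2
        return 1
    fi
    conda env create -f "$env_file"
}
```

After `create_benchmark_env` finishes, activate the environment with `conda activate <name>`, where `<name>` is the value of the name field in environment.yml.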