We Should at Least Be Able to Design Molecules That Dock Well

Figure: Docking benchmark flow.

To learn how to evaluate your model, see the Getting Started notebook.

Paper: https://arxiv.org/abs/2006.16955.

News:

Results

Listed below are the benchmark results from the paper for docking score optimization (lower is better). Each cell reports the mean docking score of the generated compounds, followed by their internal diversity in parentheses; a dash (-) means no result is reported. For each protein, we sampled from ZINC a set of molecules of the same size as that protein's training set. As baselines, we also report results for the top 10% of molecules from the training set and from this ZINC sample. Please see our paper for more details.

Method      | 5HT1B           | 5HT2B           | ACM2            | CYP2D6
CVAE        | -4.647 (0.907)  | -4.188 (0.913)  | -4.836 (0.905)  | -
GVAE        | -4.955 (0.901)  | -4.641 (0.887)  | -5.422 (0.898)  | -7.672 (0.714)
REINVENT    | -9.774 (0.506)  | -8.657 (0.455)  | -9.775 (0.467)  | -8.759 (0.626)
Train (10%) | -10.837 (0.749) | -9.769 (0.831)  | -8.976 (0.812)  | -9.256 (0.869)
ZINC (10%)  | -9.894 (0.862)  | -9.228 (0.851)  | -8.282 (0.860)  | -8.787 (0.853)
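The quantities behind each table cell can be sketched in a few lines. The sketch below is a minimal, dependency-free illustration under stated assumptions: each molecule's fingerprint is given as a set of on-bit indices (for example from an ECFP/Morgan fingerprint), and internal diversity uses one common definition, 1 minus the mean pairwise Tanimoto similarity. The paper's exact fingerprint and diversity variant may differ. All function names here (`tanimoto`, `internal_diversity`, `top_percent_mean`, `zinc_baseline`, `format_cell`) are hypothetical and not part of this repository.

```python
import random
from itertools import combinations
from statistics import mean
from typing import FrozenSet, Sequence

# Assumption: a fingerprint is the set of "on" bit indices (e.g. from a Morgan/ECFP fingerprint).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def internal_diversity(fps: Sequence[Fingerprint]) -> float:
    """1 minus the mean pairwise Tanimoto similarity over all distinct pairs."""
    sims = [tanimoto(a, b) for a, b in combinations(fps, 2)]
    return 1.0 - mean(sims) if sims else 0.0


def top_percent_mean(scores: Sequence[float], top_percent: int = 10) -> float:
    """Mean of the best (lowest) top_percent% of docking scores; lower is better."""
    k = max(1, len(scores) * top_percent // 100)  # integer arithmetic avoids float rounding
    return mean(sorted(scores)[:k])


def zinc_baseline(zinc_scores: Sequence[float], train_size: int,
                  top_percent: int = 10, seed: int = 0) -> float:
    """Sample as many ZINC scores as the training set has molecules, then take the top-percent mean."""
    sample = random.Random(seed).sample(list(zinc_scores), train_size)
    return top_percent_mean(sample, top_percent)


def format_cell(scores: Sequence[float], fps: Sequence[Fingerprint]) -> str:
    """Format one table cell: mean score followed by internal diversity in parentheses."""
    return f"{mean(scores):.3f} ({internal_diversity(fps):.3f})"


fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({5, 6})]
print(format_cell([-9.0, -10.0, -8.0], fps))  # → -9.000 (0.833)
```

With real molecules you would compute fingerprints with a cheminformatics toolkit such as RDKit and pass their on-bit indices; plain Python sets just keep the sketch self-contained.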

Environment

The recommended setup is a conda environment: create a new environment, then run the docking_benchmark/install_conda_env.sh script in it.

Data

Running experiments or training models requires additional data. Download this zip, unpack it, and set the DOCKING_BENCHMARK_DATA environment variable to the unpacked directory.
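A missing or mistyped data path tends to surface only deep inside a run. As a hypothetical pre-flight check (the repository's own code may read the variable differently), `get_data_dir` below resolves DOCKING_BENCHMARK_DATA and fails early if it is unset or does not point to a directory.

```python
import os
from pathlib import Path


def get_data_dir() -> Path:
    """Hypothetical helper: return the DOCKING_BENCHMARK_DATA directory or raise a clear error."""
    value = os.environ.get("DOCKING_BENCHMARK_DATA")
    if not value:
        raise RuntimeError("DOCKING_BENCHMARK_DATA is not set; point it at the unpacked data directory")
    path = Path(value).expanduser()  # allow values such as ~/docking_data
    if not path.is_dir():
        raise RuntimeError(f"DOCKING_BENCHMARK_DATA={value!r} is not a directory")
    return path
```

In a shell, the variable is set with `export DOCKING_BENCHMARK_DATA=/path/to/unpacked/data` (placeholder path) before running any scripts.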

Experiments

Single component optimization

Run the docking_baselines/scripts/generate_molecules.py script. Run it with the -h flag for a description of its arguments.
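If you drive experiments from Python (for example, from a notebook), the help text can also be fetched programmatically. A minimal sketch, assuming `repo_root` is the root of a checkout of this repository and that the script should run under the current interpreter; `show_help` is a hypothetical helper and passes only the -h flag documented above.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Script location as given in this README, relative to the repository root.
SCRIPT = Path("docking_baselines") / "scripts" / "generate_molecules.py"


def show_help(repo_root: str) -> Optional[str]:
    """Run generate_molecules.py -h and return its stdout, or None if the script is not found."""
    script = Path(repo_root) / SCRIPT
    if not script.is_file():
        return None
    result = subprocess.run(
        [sys.executable, str(script), "-h"],
        cwd=repo_root,          # run from the repository root
        capture_output=True,    # collect stdout/stderr instead of printing them
        text=True,              # decode output to str
        check=False,            # don't raise on a non-zero exit code
    )
    return result.stdout
```

`capture_output` requires Python 3.7 or newer.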

Details about some of the arguments: