Balancing exploration and exploitation in de-novo drug design

Installation

The following commands clone the repository with its submodules, create a conda environment with RDKit, and install the Python dependencies:

git clone https://github.com/maxime-langevin/diverse_molecule_generation.git
cd diverse_molecule_generation
git submodule update --init --recursive --remote
conda create -c conda-forge -n diverse_molgen rdkit
conda activate diverse_molgen
pip install -r requirements.txt

Reproducing results from the paper

Molecular generation

The following commands run molecular generation in three settings (DRD2, EGFR, and DRD2 with memory RL enabled):

python run.py --nruns 10 --dataset drd2
python run.py --nruns 10 --dataset egfr
python run.py --nruns 10 --dataset drd2 --use_memory_rl True
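To run the three settings back to back, a small Python driver like the one below can be used. It is a minimal sketch and not part of the repository: the names GENERATION_SETTINGS, build_commands and run_all are hypothetical, and it only assumes that run.py accepts the flags shown above and is launched from the repository root.

```python
import subprocess
import sys

# Hypothetical helper: extra run.py arguments for the three settings above.
GENERATION_SETTINGS = {
    "drd2": ["--dataset", "drd2"],
    "egfr": ["--dataset", "egfr"],
    "drd2_memory_rl": ["--dataset", "drd2", "--use_memory_rl", "True"],
}


def build_commands(nruns=10):
    """Return one run.py command line (as an argument list) per setting."""
    return {
        name: [sys.executable, "run.py", "--nruns", str(nruns), *args]
        for name, args in GENERATION_SETTINGS.items()
    }


def run_all(nruns=10, dry_run=False):
    """Run the settings one after another, stopping at the first failing run."""
    for name, cmd in build_commands(nruns).items():
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError if run.py exits with an error.
            subprocess.run(cmd, check=True)
```

For example, run_all(dry_run=True) only prints the three command lines, while run_all() executes them in sequence. Using sys.executable keeps every run inside the interpreter that launched the driver, e.g. the activated diverse_molgen environment.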

The EGFR and DRD2 datasets were extracted from the ExCAPE-DB database (Sun, J.; Jeliazkova, N.; Chupakhin, V.; Golib-Dzib, J.-F.; Engkvist, O.; Carlsson, L.; Wegner, J.; Ceulemans, H.; Georgiev, I.; Jeliazkov, V., et al. ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics. J. Cheminf. 2017, 9, 1–9).

NB: this step can be skipped by using the previously generated trajectories stored in the results and results_memory_RL folders.

Reproducing the figures

The figures from the paper can be reproduced by running the notebooks at the root of the repository:

drd2_trajectories_beta_0.ipynb: Figure 4

drd2_trajectories_beta_100.ipynb: Figure 8

drd2.ipynb: Figures 5, 7a, 9a, 11a, 11b, 11c and 14

egfr.ipynb: Figures 6, 7b, 9b and 11d

compute_correlation_coefficient.ipynb: Figure 12

drd2_with_memory_RL.ipynb: Figure 13

NB: some cells in egfr.ipynb and drd2.ipynb are commented out because they take a very long time to run; their results are already saved in the robustness_experiments folder. Uncomment these cells if you want to regenerate the results in robustness_experiments from scratch.
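To reproduce only some of the figures, or to run the notebooks without opening Jupyter, a sketch like the following maps figure labels to notebooks and executes them headlessly. The names FIGURE_NOTEBOOKS, notebooks_for and execute_notebook are hypothetical; the mapping is copied from the notebook list in this README, and execution assumes that nbformat and nbclient are installed and that a Jupyter kernel named "python3" runs in the diverse_molgen environment. Commented-out cells stay commented, so the saved results in robustness_experiments are reused.

```python
import os

# Notebook -> figure labels it reproduces, as listed in this README.
FIGURE_NOTEBOOKS = {
    "drd2_trajectories_beta_0.ipynb": ["4"],
    "drd2_trajectories_beta_100.ipynb": ["8"],
    "drd2.ipynb": ["5", "7a", "9a", "11a", "11b", "11c", "14"],
    "egfr.ipynb": ["6", "7b", "9b", "11d"],
    "compute_correlation_coefficient.ipynb": ["12"],
    "drd2_with_memory_RL.ipynb": ["13"],
}


def notebooks_for(figures):
    """Return the notebooks needed for the requested figure labels, in README order."""
    wanted = {str(f) for f in figures}
    known = {fig for figs in FIGURE_NOTEBOOKS.values() for fig in figs}
    unknown = wanted - known
    if unknown:
        raise ValueError(f"No notebook listed for figure(s): {sorted(unknown)}")
    return [nb for nb, figs in FIGURE_NOTEBOOKS.items() if wanted.intersection(figs)]


def execute_notebook(path, timeout=None):
    """Execute a notebook in place; timeout=None disables the per-cell time limit."""
    # Imported lazily so the lookup above works without Jupyter installed.
    import nbformat
    from nbclient import NotebookClient

    nb = nbformat.read(path, as_version=4)
    # Assumption: a kernel named "python3" that runs inside diverse_molgen.
    client = NotebookClient(
        nb,
        timeout=timeout,
        kernel_name="python3",
        # Run with the notebook's own folder as working directory.
        resources={"metadata": {"path": os.path.dirname(os.path.abspath(path))}},
    )
    client.execute()
    nbformat.write(nb, path)
```

For example, notebooks_for(["11b", 13]) returns ["drd2.ipynb", "drd2_with_memory_RL.ipynb"], and each of those can then be passed to execute_notebook. Panel labels must match the list exactly ("11b", not "11"). On the command line, jupyter nbconvert --to notebook --execute --inplace drd2.ipynb does the same for a single notebook.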