
DiffDec: Structure-Aware Scaffold Decoration with an End-to-End Diffusion Model

Summary

DiffDec is an end-to-end E(3)-equivariant diffusion model to optimize molecules through molecular scaffold decoration conditioned on the 3D protein pocket.

<p align='center'> <img src="./assets/overview.jpg" alt="architecture"/> </p>

Install the conda environment from the YAML file

conda env create -f environment.yaml

Datasets

Please refer to the README.md in the data folder.

Training

To train a model for the single R-group decoration task, run:

python train_single.py --config configs/single.yml

To train a model for the multi-R-group decoration task, run:

python train_multi.py --config configs/multi.yml
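The two training commands differ only in the script and the config file, so they can be wrapped in one helper. This is a hypothetical Python sketch, not part of the DiffDec repository; the function name `train_command` and the `single`/`multi` task keys are illustrative assumptions, while the script and config paths are the ones listed above.

```python
# Hypothetical mapping from task name to the training script and config
# file given in this README.
TRAIN_ENTRY_POINTS = {
    "single": ("train_single.py", "configs/single.yml"),
    "multi": ("train_multi.py", "configs/multi.yml"),
}


def train_command(task: str) -> list[str]:
    """Build the training command for the 'single' or 'multi' R-group task."""
    if task not in TRAIN_ENTRY_POINTS:
        raise ValueError(f"unknown task {task!r}; expected 'single' or 'multi'")
    script, config = TRAIN_ENTRY_POINTS[task]
    return ["python", script, "--config", config]
```

For example, `subprocess.run(train_command("single"), check=True)` launches single R-group training from the repository root.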

Sampling

The sampling script generates 100 decorated compounds for each input scaffold and protein pocket; to change this or other settings, edit the corresponding parameters in the script. You can also download the model checkpoint file from this link and save it in ckpt/. Then run:

bash sample.sh

The decorated compounds are written as .xyz and .sdf files to the sample_mols directory.
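To inspect the sampled coordinates without extra dependencies, the sketch below parses files in the standard XYZ format (atom count, a comment line, then one `element x y z` line per atom). It assumes each .xyz file holds a single molecule; the helper names are hypothetical, and how DiffDec names or nests files inside sample_mols is not specified here, so the loader simply searches the directory recursively.

```python
from pathlib import Path


def read_xyz(text: str) -> list[tuple[str, float, float, float]]:
    """Parse a single-molecule XYZ block into (element, x, y, z) tuples.

    Layout: line 1 = atom count, line 2 = comment,
    then one 'element x y z' line per atom.
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    atoms = []
    for line in lines[2 : 2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    # Guard against truncated files.
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return atoms


def load_sampled_xyz(sample_dir: str = "sample_mols") -> dict[str, list]:
    """Parse every .xyz file found under sample_dir, keyed by file path."""
    return {
        str(path): read_xyz(path.read_text())
        for path in sorted(Path(sample_dir).rglob("*.xyz"))
    }
```

Parsing only coordinates keeps the sketch dependency-free; for bond orders and chemistry-aware checks, the .sdf files are the better input.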

Evaluation

After sampling decorated molecules, run the evaluation script:

bash evaluate.sh
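Before evaluating, it can help to confirm how many molecules were actually written. In the SD file format every record ends with a `$$$$` line, so counting those lines gives the number of molecules per file. This is a hypothetical sanity-check sketch, not part of evaluate.sh; because this README does not say whether the sampler writes one molecule per .sdf file or many, it reports per-file counts and leaves the comparison to you.

```python
from pathlib import Path


def count_sdf_records(text: str) -> int:
    """Count molecule records in SD-file text; each record ends with a '$$$$' line."""
    return sum(1 for line in text.splitlines() if line.strip() == "$$$$")


def summarize_samples(sample_dir: str = "sample_mols") -> dict[str, int]:
    """Map each .sdf file found under sample_dir to its record count."""
    return {
        str(path): count_sdf_records(path.read_text())
        for path in sorted(Path(sample_dir).rglob("*.sdf"))
    }
```

For example, `counts = summarize_samples()` followed by `sum(counts.values())` gives the total number of sampled molecules to compare against the expected count per scaffold and pocket.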

Sampling for a specific protein pocket and a specific scaffold

To generate R-groups for your own pocket and scaffold, provide the PDB structure file of the protein pocket, the SDF file of the scaffold, and a .smi file with the scaffold's SMILES including its anchor(s). For example:

CUDA_VISIBLE_DEVICES=0 python sample_single_for_specific_context.py --scaffold_smiles_file ./data/examples/scaf.smi --protein_file ./data/examples/protein.pdb --scaffold_file ./data/examples/scaf.sdf --task_name exp --data_dir ./data/examples --checkpoint ./ckpt/diffdec_single.ckpt --samples_dir samples_exp --n_samples 1 --device cuda:0
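When running this for many pockets or scaffolds, it may be convenient to build the command programmatically. The sketch below reproduces exactly the flags of the example above, with its values as defaults, and checks that the input files exist before launching. The function names and the `check_files` switch are hypothetical helpers, not part of DiffDec.

```python
import os
import subprocess
from pathlib import Path


def specific_context_command(
    scaffold_smiles_file: str,
    protein_file: str,
    scaffold_file: str,
    data_dir: str,
    checkpoint: str = "./ckpt/diffdec_single.ckpt",
    task_name: str = "exp",
    samples_dir: str = "samples_exp",
    n_samples: int = 1,
    device: str = "cuda:0",
    check_files: bool = True,
) -> list[str]:
    """Build the sample_single_for_specific_context.py call shown in this README."""
    if check_files:
        # Fail early if an input file or the checkpoint is missing.
        for path in (scaffold_smiles_file, protein_file, scaffold_file, checkpoint):
            if not Path(path).is_file():
                raise FileNotFoundError(path)
    return [
        "python", "sample_single_for_specific_context.py",
        "--scaffold_smiles_file", scaffold_smiles_file,
        "--protein_file", protein_file,
        "--scaffold_file", scaffold_file,
        "--task_name", task_name,
        "--data_dir", data_dir,
        "--checkpoint", checkpoint,
        "--samples_dir", samples_dir,
        "--n_samples", str(n_samples),
        "--device", device,
    ]


def run_specific_context(cmd: list[str], gpu: str = "0") -> None:
    """Run the command with CUDA_VISIBLE_DEVICES set, as in the example above."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    subprocess.run(cmd, check=True, env=env)
```

For example, `run_specific_context(specific_context_command("./data/examples/scaf.smi", "./data/examples/protein.pdb", "./data/examples/scaf.sdf", "./data/examples"))` is equivalent to the command above.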