MolSnapper: Conditioning Diffusion for Structure Based Drug Design

MolSnapper is a tool for conditioning diffusion models to generate 3D drug-like molecules.

This repository is built on the MolDiff code and conditions a trained MolDiff model.

More information can be found in our paper.

Installation

Dependency

The codes have been tested in the following environment:

Package               Version
Python                3.9.18
PyTorch               2.0.1
CUDA                  11.7
PyTorch Geometric     2.3.1
RDKit                 2022.03.5
Biopython             1.83
PyTorch Scatter       2.1.1
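To check an existing environment against the tested versions above, a small Python sketch like the following may help. It assumes each package registers distribution metadata under the names used below (torch, torch_geometric, rdkit, biopython, torch_scatter); conda builds, RDKit in particular, may not, in which case they are reported as not installed. The helper names are hypothetical and not part of the MolSnapper code.

```python
from importlib import metadata

# Tested versions from the table above. Python and CUDA are not pip
# distributions, so they are not checked here. The distribution names
# are assumptions; conda packages may register differently or not at all.
TESTED = {
    "torch": "2.0.1",
    "torch_geometric": "2.3.1",
    "rdkit": "2022.03.5",
    "biopython": "1.83",
    "torch_scatter": "2.1.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(tested, installed):
    """Return {package: (tested, installed)} for every mismatched or missing package."""
    report = {}
    for name, want in tested.items():
        have = installed.get(name)
        # Ignore local build tags, so "2.0.1+cu117" counts as "2.0.1".
        if have is None or have.split("+")[0] != want:
            report[name] = (want, have)
    return report


if __name__ == "__main__":
    found = {name: installed_version(name) for name in TESTED}
    for name, (want, have) in compare_versions(TESTED, found).items():
        print(f"{name}: tested {want}, found {have or 'not installed'}")
```

A mismatch is not necessarily a problem; it only flags where your setup differs from the tested one.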

Install via the conda YAML file (CUDA 11.3)

conda env create -f env.yml
conda activate MolSnapper

Install manually

conda create -n MolSnapper python=3.9 # optional, create a new environment
conda activate MolSnapper

conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
conda install pyg -c pyg
conda install -c pyg pytorch-scatter

# Install other tools
conda install -c conda-forge rdkit
conda install pyyaml easydict python-lmdb -c conda-forge
conda install -c oddt oddt

Dataset

CrossDocked

Download the processed test set from the DecompDiff repository: https://github.com/bytedance/DecompDiff
Please download the following files:

Save them in <test_directory> and process the test data using:

python scripts/prepare_data_cd.py --pairs-paths <test_directory>/test_index.pkl --root-dir <test_directory>  --out-mol-sdf <data_dir>/test_mol.sdf --out-pockets-pkl <data_dir>/test_pockets.pkl --out-table <data_dir>/test_table.csv

For example:

python scripts/prepare_data_cd.py --pairs-paths ./../crossdocked/test_index.pkl --root-dir ./../crossdocked/test_set  --out-mol-sdf ./../crossdocked/test_mol.sdf --out-pockets-pkl ./../crossdocked/test_pockets.pkl --out-table ./../crossdocked/test_table.csv
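If you prefer to drive preprocessing from Python, the command above can be assembled from its inputs. The sketch below is a hypothetical helper (prepare_cd_command is not part of the repository) that uses only the flags shown above and, as in the example, lets the index file and the root directory live in different places.

```python
from pathlib import Path


def prepare_cd_command(index_path, root_dir, data_dir):
    """Build the argument list for scripts/prepare_data_cd.py.

    index_path: the downloaded test_index.pkl
    root_dir:   directory holding the downloaded test files
    data_dir:   where test_mol.sdf, test_pockets.pkl and test_table.csv are written
    """
    out = Path(data_dir)
    return [
        "python", "scripts/prepare_data_cd.py",
        "--pairs-paths", str(index_path),
        "--root-dir", str(root_dir),
        "--out-mol-sdf", str(out / "test_mol.sdf"),
        "--out-pockets-pkl", str(out / "test_pockets.pkl"),
        "--out-table", str(out / "test_table.csv"),
    ]
```

For the example above, run subprocess.run(prepare_cd_command("./../crossdocked/test_index.pkl", "./../crossdocked/test_set", "./../crossdocked"), check=True) from the repository root. Passing a list rather than a single string avoids shell quoting issues with paths.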

Processed data

The processed CrossDocked test set can be found in the data directory:

data
├── crossdocked
│   ├── test_mol.sdf
│   ├── test_pockets.pkl
│   └── test_table.csv

Binding MOAD

Download and split the dataset as described by the authors of DiffSBDD (https://github.com/arneschneuing/DiffSBDD/tree/main), then save the test set in <test_directory>.

After removing water molecules, process the test directory using:

python scripts/prepare_moad.py --test_path <test_directory>  --out-mol-sdf <data_dir>/test_mol.sdf --out-pockets-pkl <data_dir>/test_pockets.pkl --out-table <data_dir>/test_table.csv
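The water-removal step is not spelled out here. One simple option, sketched under the assumption that the complexes are fixed-column PDB files, is to drop ATOM, HETATM and ANISOU records whose residue name is a common water code. The set of water names and the helper names are assumptions, not the authors' procedure. Work on a copy, since strip_water_dir rewrites files in place, and note that CONECT records are left untouched.

```python
from pathlib import Path

# Common residue names for water (assumed); extend if your files use other codes.
WATER_NAMES = {"HOH", "WAT", "H2O", "DOD"}
# Record types that carry a residue name in columns 18-20 of the PDB format.
ATOM_RECORDS = ("ATOM", "HETATM", "ANISOU")


def strip_water(pdb_lines):
    """Return the PDB lines without water atom records; all other records are kept."""
    kept = []
    for line in pdb_lines:
        if line.startswith(ATOM_RECORDS) and line[17:20].strip() in WATER_NAMES:
            continue
        kept.append(line)
    return kept


def strip_water_dir(test_dir):
    """Rewrite every .pdb file below test_dir in place, without water records."""
    for pdb_path in Path(test_dir).rglob("*.pdb"):
        lines = pdb_path.read_text().splitlines(keepends=True)
        pdb_path.write_text("".join(strip_water(lines)))
```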

Processed data

The processed Binding MOAD data can be found here:

data
├── MOAD
│   ├── test_mol.sdf
│   ├── test_pockets.pkl
│   └── test_table.csv

Raw complex

If you have raw complexes, remove hydrogens and separate the pockets from the ligands using:

python scripts/clean_and_split.py --in-dir <data_directory>  --proteins-dir <pockets_directory> --ligands-dir <ligands_directory>

To process a single pocket, run:

python scripts/prepare_single_complex.py --root_dir  <data_directory>  --ligand_filename <ligand_filename>.sdf  --protein_filename <protein_filename>.pdb --out_pockets_path <output_path>.pkl

For example:

python scripts/prepare_single_complex.py --root_dir  <data_directory>  --ligand_filename ligand.sdf --protein_filename data/protein.pdb --out_pockets_path ./data/protein.pkl

Processed complex

An example of a processed complex (PDB ID: 1h00) can be found here:

data
├── example_1h00
│   ├── ref_points.sdf
│   ├── processed_pocket_1h00.pkl
│   └── ligand.sdf

Sample

MolDiff provides pretrained models. First download the pretrained model weights from here and put them in the ./ckpt folder. MolSnapper uses the following model weight files:

Sample molecules for a given pocket

After setting the correct model weight paths in the config file, you can run the following command to sample molecules:

python scripts/sample_single_pocket.py --outdir <output_directory> --config <path_to_config_file> --device <device_id> --batch_size <batch_size> --pocket_path <pocket_path>.pkl --sdf_path <sdf_path>.sdf --use_pharma <use_pharma> --num_pharma_atoms <num_pharma_atoms> --clash_rate <clash_rate>

The parameters are:

An example command is:

python scripts/sample_single_pocket.py --outdir ./outputs --config ./configs/sample/sample_MolDiff.yml --batch_size 32 --pocket_path ./data/example_1h00/processed_pocket_1h00.pkl --sdf_path ./data/example_1h00/ref_points.sdf --use_pharma False --clash_rate 0.1
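To compare several clash rates for the same pocket, one option is to generate one command per value, each writing to its own output directory. sweep_commands and the clash_<rate> directory naming are hypothetical; the flags and default values are taken from the example command above.

```python
def sweep_commands(pocket_path, sdf_path, clash_rates, base_outdir="./outputs",
                   config="./configs/sample/sample_MolDiff.yml", batch_size=32):
    """Build one sample_single_pocket.py command per clash rate, each with its own outdir."""
    commands = []
    for rate in clash_rates:
        commands.append([
            "python", "scripts/sample_single_pocket.py",
            "--outdir", f"{base_outdir}/clash_{rate}",
            "--config", config,
            "--batch_size", str(batch_size),
            "--pocket_path", pocket_path,
            "--sdf_path", sdf_path,
            "--use_pharma", "False",
            "--clash_rate", str(rate),
        ])
    return commands
```

Each list can then be run from the repository root with subprocess.run(cmd, check=True). Separate output directories keep the runs from overwriting each other.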

After sampling, the outdir folder will contain two directories, holding the metadata and the SDF files of the sampled molecules, respectively.

Sample molecules for all pockets in the test set

To sample molecules for all pockets in the test set, use:

python scripts/sample.py --outdir <output_directory> --config <path_to_config_file> --device <device_id> --batch_size <batch_size> --pocket_dir <data_directory> --num_pharma_atoms <num_pharma_atoms> --clash_rate <clash_rate>

An example command is:

python scripts/sample.py --outdir ./outputs --config ./configs/sample/sample_MolDiff.yml --batch_size 32 --pocket_dir ./data/crossdocked  --num_pharma_atoms 20 --clash_rate 0.1

Evaluate

Filter the generated molecules using PoseBusters.
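How the PoseBusters results feed into filtering is not spelled out here. Assuming the checks have been exported to a CSV with identifier columns plus one True/False column per check, the following sketch keeps only the molecules that pass every check. The default identifier column names are assumptions; adjust id_columns to match your export.

```python
import csv
import io

# Hypothetical identifier columns; every other column is treated as a pass/fail check.
DEFAULT_ID_COLUMNS = ("file", "molecule")


def passing_rows(csv_text, id_columns=DEFAULT_ID_COLUMNS):
    """Keep rows in which every non-identifier column reads True."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        checks = [value for key, value in row.items() if key not in id_columns]
        if checks and all((value or "").strip().lower() == "true" for value in checks):
            kept.append(row)
    return kept
```

Rows with no check columns at all are dropped rather than passed, so a mis-specified id_columns cannot silently keep everything.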

To evaluate the basic molecular properties of the generated molecules, their 3D similarity to the reference ligand, and their hydrogen bonds (computed with ODDT), run the following command:

python scripts/evaluate.py  <gen_root> --protein_path <protein_path>.pdb --reflig_path <reflig_path> --save_path <save_path>

The parameters are:

For example:

python scripts/evaluate.py  ./outputs/my_run --protein_path ./data/example_1h00/pocket/1h00_protein.pdb --reflig_path ./data/example_1h00/ligand.sdf --save_path ./outputs/my_run/eval