PIDiff: Physics Informed Diffusion Model for Protein Pocket Specific 3D Molecular Generation

<img src="https://github.com/hello-maker/PIDiff/blob/master/assets/main.jpg">

Requirements

The key dependencies are listed below. The full environment setup is available in [environment.yml]. The code has been tested in the following environment:

| Package | Version |
| --- | --- |
| Python | 3.8 |
| PyTorch | 1.13.1 |
| CUDA | 11.6 |
| PyTorch Geometric | 2.2.0 |
| RDKit | 2022.03.2 |

Install via Conda

conda create -n PIDiff python=3.8
conda activate PIDiff
conda install pytorch pytorch-cuda=11.6 -c pytorch -c nvidia
conda install pyg -c pyg
conda install rdkit openbabel tensorboard pyyaml easydict python-lmdb -c conda-forge
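
After installation, a quick check along the following lines can confirm that the environment matches the tested versions listed above. This is a minimal sketch: the helper name `mismatched` and the prefix-matching rule are illustrative assumptions, not part of the repository, and the CUDA version is not checked.

```python
# Tested versions from the requirements above; keys are import names.
TESTED = {
    "python": "3.8",
    "torch": "1.13.1",
    "torch_geometric": "2.2.0",
    "rdkit": "2022.03.2",
}


def mismatched(installed, tested=TESTED):
    """Return {name: (installed, tested)} for every package whose installed
    version is missing or does not start with the tested version string."""
    bad = {}
    for name, want in tested.items():
        have = installed.get(name)
        if have is None or not have.startswith(want):
            bad[name] = (have, want)
    return bad


if __name__ == "__main__":
    import platform

    installed = {"python": platform.python_version()}
    for mod in ("torch", "torch_geometric", "rdkit"):
        try:
            installed[mod] = getattr(__import__(mod), "__version__", None)
        except Exception:  # missing or broken install
            installed[mod] = None
    print(mismatched(installed) or "environment matches the tested versions")
```

Prefix matching is used so that build suffixes such as `1.13.1+cu116` still count as a match.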

Data

The data used for training and evaluation is provided through the submission site, either in a folder named Data or in a Google Drive folder.

Data
|__Training Data  
|   |  # Raw complex structures of protein-ligand available from the CrossDocked2020 dataset. Proteins are specified in .pdb format, and Ligands in .sdf format.
|   |__crossdocked_v1.1_rmsd1.0.tar.gz 
|   |
|   |  # Processed data that can be used for model training, obtainable through the execution of the ./Anonymous/datasets/pl_pair_dataset.py file
|   |__crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb 
|   |
|   |  # Index storage files for each sample, used for splitting the train set and test set, or for other preprocessing purposes.
|   |__index.pkl
|    
|__Split
|   |   # Names and index numbers of samples used directly for training and validation.
|   |___crossdocked_pocket10_pose_split.pt
|   |
|   |   # Raw file for creating the crossdocked_pocket10_pose_split.pt file. It is split through pdb id.
|   |___split_by_name.pt
|
|__Test Data
|   |...
|

To train the model from scratch, you need the preprocessed LMDB file (crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb) and the split file (crossdocked_pocket10_pose_split.pt).
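
Before starting a long training run, it can help to confirm that both training inputs are in place. The sketch below assumes the files sit under the Data folder exactly as in the tree above; the helper name `missing_files` is hypothetical, and the paths are not read from configs/training.yml, so adjust them to your local layout.

```python
from pathlib import Path

# Relative paths follow the folder tree above (an assumption about the
# local layout, not something read from the training config).
REQUIRED = [
    Path("Training Data") / "crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb",
    Path("Split") / "crossdocked_pocket10_pose_split.pt",
]


def missing_files(data_root, required=REQUIRED):
    """Return the required paths that do not exist under data_root."""
    root = Path(data_root)
    return [p for p in required if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_files("Data")
    for p in missing:
        print(f"missing: {p}")
    if not missing:
        print("all training inputs found")
```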

To evaluate the model on the test set, unzip test_set.zip in the Data folder. It contains the original PDB files that are used for Vina docking.

If you want to process the dataset from scratch, you need to download CrossDocked2020 v1.1 from here, save it into data/CrossDocked2020, and run the scripts in scripts/data_preparation:

Training

Training from scratch

python scripts/train_diffusion.py configs/training.yml

Sampling

Sampling for pockets in the test set

python scripts/sample_diffusion.py configs/sampling.yml --data_id {i}
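
The `{i}` placeholder is the index of a test-set pocket, so sampling several pockets means running the command once per index. A small driver such as the sketch below can build and run those commands in sequence. The function names are hypothetical, and which indices are valid depends on your test set.

```python
import subprocess
import sys


def sampling_commands(data_ids, config="configs/sampling.yml"):
    """Build one sample_diffusion.py invocation per test-set index."""
    return [
        [sys.executable, "scripts/sample_diffusion.py", config,
         "--data_id", str(i)]
        for i in data_ids
    ]


def run_all(data_ids):
    """Run the sampling commands one after another, stopping on failure."""
    for cmd in sampling_commands(data_ids):
        subprocess.run(cmd, check=True)


# Example (run from the repository root): run_all([0, 1, 2])
```

Running sequentially keeps GPU memory use to one job at a time; `check=True` stops the loop as soon as one run exits with an error.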

Evaluation

Evaluation from sampling results

python scripts/evaluate_diffusion.py {OUTPUT_DIR} --docking_mode vina_score --protein_root data/test_set

The docking mode must be one of {qvina, vina_score, vina_dock, none}.

Note: The first time you run the evaluation code with the vina_score or vina_dock docking mode, it takes some time to prepare the pdbqt and pqr files.
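
When the evaluation is launched from another script, validating the docking mode up front avoids discovering a typo only after the run has started. The following is a minimal sketch; `evaluation_command` is a hypothetical wrapper around the command shown above, not a function from the repository.

```python
import sys

# The four modes listed above.
DOCKING_MODES = ("qvina", "vina_score", "vina_dock", "none")


def evaluation_command(output_dir, docking_mode="vina_score",
                       protein_root="data/test_set"):
    """Build the evaluate_diffusion.py command, rejecting unknown modes."""
    if docking_mode not in DOCKING_MODES:
        raise ValueError(
            f"docking_mode must be one of {DOCKING_MODES}, got {docking_mode!r}"
        )
    return [
        sys.executable, "scripts/evaluate_diffusion.py", str(output_dir),
        "--docking_mode", docking_mode,
        "--protein_root", protein_root,
    ]
```

The returned list can be passed directly to `subprocess.run(..., check=True)`.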

Real-world Validation

To generate molecules for a new protein that is not in the test set, run ./scripts/real_world/Iinference.ipynb. You need to prepare two input files: the .pdb file containing the structural information of the protein, and the ligand's .sdf file, which is used to define the protein pocket.

Typically, the above process is also necessary for performing MD simulation.
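
To illustrate how a reference ligand can define a pocket, the sketch below keeps every protein residue that has at least one atom within a cutoff distance of any ligand atom. It is not the notebook's implementation: the function name `pocket_lines` is hypothetical, the 10 Å default is an assumption inferred from the `pocket10` file names, and the ligand coordinates are passed in directly (in practice they could be read from the .sdf file with RDKit).

```python
import math


def pocket_lines(pdb_lines, ligand_xyz, radius=10.0):
    """Return the ATOM records of all residues that have at least one atom
    within `radius` angstroms of any ligand atom."""
    atoms = []  # (residue key, original line, xyz)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # Fixed-width PDB columns: chain ID, residue number + insertion code.
        key = (line[21], line[22:27])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((key, line, xyz))

    # Residues with any atom close enough to any ligand atom.
    near = {
        key for key, _, xyz in atoms
        if any(math.dist(xyz, lig) <= radius for lig in ligand_xyz)
    }
    # Keep whole residues so the pocket contains no partial side chains.
    return [line for key, line, _ in atoms if key in near]
```

Selecting whole residues rather than individual atoms keeps the pocket chemically complete. `math.dist` requires Python 3.8 or later, which matches the tested environment.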

Result

The main results for the proposed model are presented in the table below. For a more comprehensive overview of the results obtained with our model, please refer to the Report.

Evaluation of Generated Molecule

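| Model | Vina Score | Vina Min | Vina Dock | High Affinity | Vina Score<sub>SA</sub> | SR |
| --- | --- | --- | --- | --- | --- | --- |
| AR | -5.75 | -6.18 | -6.75 | 0.379 | -5.59 | 74.7% |
| LiGAN | - | - | -6.33 | 0.21 | - | -68.4% |
| GraphBP | - | - | -4.80 | 0.14 | - | 57.1% |
| Pocket2Mol | -5.15 | -6.42 | -7.15 | 0.48 | -5.12 | 88.7% |
| DiffSBDD | 52.78 | 16.45 | -6.65 | 0.452 | -51.53 | 83.0% |
| DrugGPS | 28.18 | 6.33 | -3.74 | 0.12 | -27.32 | 48.1% |
| TargetDiff | -5.47 | -6.64 | -7.80 | 0.57 | -5.31 | 91.9% |
| ResGen | 13.79 | -1.53 | -4.90 | 0.23 | -13.73 | 40.7% |
| PIDiff | -6.58 | -7.52 | -8.10 | 0.64 | -6.03 | 100% |
| Testset | -6.36 | -6.71 | -7.45 | - | -6.28 | - |
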
<table class="center"> <tr> <td style="text-align:center;"><b>Distribution of RMSD before and after Docking</b></td> <td style="text-align:center;"><b>Demo video of Molecular Dynamics about Generated Molecule</b></td> </tr> <tr> <td><img src="https://github.com/hello-maker/PIDiff/blob/master/assets/change.png" width="400"></td> <td><img src="https://github.com/hello-maker/PIDiff/blob/master/assets/MD_result.gif" width="400"></td> </tr> </table>