sbdd_practical_evaluation

This GitHub repository contains the dataset and relevant code for the paper:

From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

Dataset

The dataset is hosted in the Hugging Face dataset directory.

It should contain the following files:

PDBBind.lmdb.zip

Processed PDBBind data for training, in LMDB format. The LMDB documentation can be found at: https://lmdb.readthedocs.io/en/release/

PDBBind-DUD_E_FLAPP_0.6.pkl

Train/validation split file for the 0.6 version.

PDBBind-DUD_E_FLAPP_0.9.pkl

Train/validation split file for the 0.9 version.

DUDE.zip

DUD-E test set. Each directory is a target and contains all needed files for evaluation.

LIT-PCBA.zip

LIT-PCBA test set. Each directory is a target and contains all needed files for evaluation.

DUDE_generated_mols.zip

Molecules generated by the different methods for the targets in DUD-E, in .sdf format.

PCBA_generated_mols.zip

Molecules generated by the different methods for the targets in LIT-PCBA, in .sdf format.

pretrain_weights.zip

drugclip.pt: weights for pretrained DrugCLIP model

mol_pre_no_h_220816.pt: weights for pretrained Uni-Mol molecular encoder

pocket_pre_220816.pt: weights for pretrained Uni-Mol pocket encoder
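
After downloading, the archives can be sanity-checked before training or evaluation. The following is a minimal sketch using only the Python standard library. The helper names (list_targets, count_sdf_records, describe_split) are hypothetical and not part of this repository. It assumes the target directories sit at the top level of each archive, which may not match the actual layout, and it assumes no particular schema for the split files.

```python
import pickle
import zipfile


def list_targets(zip_path):
    """Return the sorted top-level directory names inside a zip archive.

    Assumption: each target is a top-level directory in the archive.
    """
    targets = set()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only entries that live inside a directory contribute a name.
            if len(parts) > 1 and parts[0]:
                targets.add(parts[0])
    return sorted(targets)


def count_sdf_records(sdf_text):
    """Count molecule records in SDF text; each record ends with a '$$$$' line."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


def describe_split(pkl_path):
    """Load a pickle file and summarize its top-level structure."""
    # Only unpickle files from a trusted source: pickle can run code on load.
    with open(pkl_path, "rb") as handle:
        obj = pickle.load(handle)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["keys"] = list(obj.keys())[:10]  # show at most ten keys
    elif hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    return summary
```

For example, list_targets("DUDE.zip") would return the target names under the stated layout assumption, and describe_split("PDBBind-DUD_E_FLAPP_0.6.pkl") would report the type of the stored object. If the split files contain objects from third-party libraries, those libraries must be importable when unpickling.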

Environment Setup

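Create the environments from sbdd.yaml and encoder.yaml (for example, with conda env create -f sbdd.yaml, assuming these are conda environment files).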

Model Training and Sampling

The code for the five models we tested is located in the models folder. We have made minor modifications to the dataset reading code to accommodate our new data. However, the training and sampling execution methods are consistent with those in the official repositories.

For detailed execution instructions, please refer to the official documentation of the respective repositories:

LiGAN: https://github.com/mattragoza/LiGAN

AR: https://github.com/luost26/3D-Generative-SBDD

Pocket2Mol: https://github.com/pengxingang/Pocket2Mol

TargetDiff: https://github.com/guanjq/targetdiff

MolCRAFT: https://github.com/AlgoMole/MolCRAFT

The modified code for each model is in the models directory.

Model Evaluation

All generated molecules are in DUDE_generated_mols.zip and PCBA_generated_mols.zip.

Using fingerprints to do evaluation

cd Fingerprint_Eval

To run the similarity-based evaluation on the DUD-E dataset:

python sim_dude.py

To run the similarity-based evaluation on the LIT-PCBA dataset:

python sim_pcba.py

To run the virtual screening evaluation on the DUD-E dataset:

python vs_dude.py
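
The fingerprint type and the aggregation used by these scripts are not described here. As a rough illustration of what a similarity-based evaluation can look like, the sketch below computes the Tanimoto similarity between fingerprints represented as sets of on-bit indices and, for each generated molecule, takes the best score against a set of known actives. The function names and the max-over-actives aggregation are assumptions made for illustration, not the repository's implementation.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def max_similarity_to_actives(generated, actives):
    """For each generated fingerprint, the highest Tanimoto score to any active."""
    return [
        max((tanimoto(g, a) for a in actives), default=0.0)
        for g in generated
    ]


# Toy fingerprints as sets of on-bit indices (illustrative values only).
generated = [{1, 2, 3}, {7, 8}]
actives = [{2, 3, 4}, {8, 9}]
scores = max_similarity_to_actives(generated, actives)
print(scores)  # the first score is 0.5: two shared bits out of four distinct bits
```

In practice the on-bit sets would come from a cheminformatics toolkit applied to the molecules in the .sdf files; that step is omitted here to keep the sketch dependency-free.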

Using Deep Encoders to do evaluation


cd Encoder_Eval

bash test_sbdd.sh

Note that the pretrained weights (pretrain_weights.zip) must first be downloaded from the Hugging Face dataset directory.

Then set the parameters in test_sbdd.sh:

Set encoder to drugclip or unimol.

Set metric to vs (virtual screening), sim (similarity), or score (DrugCLIP score).

Set the test path to the DUD-E or LIT-PCBA directory downloaded from the Hugging Face dataset directory.

Set the model path to the model outputs downloaded from the Hugging Face dataset directory.