3D-SMGE

Paper: "3D-SMGE: a pipeline for scaffold-based molecular generation and evaluation"

Summary

3D-SMGE is a scaffold-based neural network pipeline for 3D molecular generation and evaluation. It consists of two main modules: the molecular generation module, named 3D-SMG, and the ADMET prediction module.

3D-SMG represents a molecule by its atomic coordinates and atom types, and it generates molecules directly in 3D Euclidean space using one of two approaches. If only a scaffold structure is provided and no positions are specified, side chains are generated at all possible positions on the scaffold; this is approach1. If specific positions are also provided, side chains are generated only at those positions; this is approach2.

For ADMET property prediction, we propose data-adapted multi-models. Of these, 24 out of 27 surpassed or matched the highest accuracy on the benchmark dataset metrics. We trained 3D-SMG on the ZINC-5w dataset, which was filtered from the ZINC-Standard dataset to keep molecules whose heavy atoms are drawn from fluorine, oxygen, nitrogen, carbon, sulfur, and chlorine. For generation, the scaffold can be provided as a SMILES, PDB, or mol2 file.
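The sketch below makes the difference between approach1 and approach2 concrete. It shows one way candidate attachment positions on a scaffold could be chosen, and it is not 3D-SMG's code. It assumes that a "possible position" is a scaffold heavy atom that still has unused valence (an implicit hydrogen). The valence table, the `(i, j, order)` bond tuples, and the names `free_valence` and `attachment_positions` are all hypothetical.

```python
# Hypothetical standard valences; sulfur is simplified to 2 here.
STANDARD_VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1}


def free_valence(symbols, bonds):
    """Return the unused valence (implicit hydrogens) of each heavy atom."""
    used = [0] * len(symbols)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [STANDARD_VALENCE[s] - u for s, u in zip(symbols, used)]


def attachment_positions(symbols, bonds, positions=None):
    """approach1 if positions is None, approach2 otherwise (hypothetical helper)."""
    candidates = [i for i, free in enumerate(free_valence(symbols, bonds)) if free > 0]
    if positions is None:
        return candidates  # approach1: every position that can take a side chain
    invalid = [p for p in positions if p not in candidates]
    if invalid:
        raise ValueError(f"no free valence at positions {invalid}")
    return sorted(positions)  # approach2: only the user-specified positions


# Pyridine scaffold in Kekule form: atom 0 is nitrogen, bonds are (i, j, bond order).
symbols = ["N", "C", "C", "C", "C", "C"]
bonds = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)]
print(attachment_positions(symbols, bonds))                 # [1, 2, 3, 4, 5]
print(attachment_positions(symbols, bonds, positions=[3]))  # [3]
```

With approach1, the ring nitrogen is skipped because its valence is already saturated. With approach2, requesting an atom without free valence raises an error instead of silently ignoring it.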

Overview

[Figure: 3D-SMGE neural network]

Requirements

Getting Started

Data Preparation

git clone git@github.com:ZheLi-Lab-Collaboration/3D-SMGE.git
python prepareDatatset.py --xyz_path ./xyz_files
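prepareDatatset.py consumes a folder of .xyz files. The sketch below assumes each file holds one molecule in the standard XYZ layout: an atom count, a comment line, then `element x y z` rows. It reads such files and applies the heavy-atom restriction described for ZINC-5w. The function names are hypothetical, and this is not the repository's preprocessing code. It only illustrates the input format and the element filter.

```python
from pathlib import Path

# Heavy atoms allowed in ZINC-5w per the summary; hydrogen is always allowed.
ALLOWED_HEAVY_ATOMS = {"C", "N", "O", "F", "S", "Cl"}


def read_xyz(path):
    """Parse a single-molecule .xyz file into (symbols, coordinates)."""
    lines = Path(path).read_text().splitlines()
    n_atoms = int(lines[0].split()[0])
    symbols, coords = [], []
    for line in lines[2:2 + n_atoms]:  # line index 1 is a free-form comment
        parts = line.split()
        symbols.append(parts[0])
        coords.append(tuple(float(v) for v in parts[1:4]))
    return symbols, coords


def passes_element_filter(symbols):
    """True if every non-hydrogen atom is one of the allowed elements."""
    heavy = {s for s in symbols if s != "H"}
    return heavy <= ALLOWED_HEAVY_ATOMS


def filter_xyz_dir(xyz_dir):
    """Return the sorted paths of .xyz files whose heavy atoms are all allowed."""
    return sorted(p for p in Path(xyz_dir).glob("*.xyz")
                  if passes_element_filter(read_xyz(p)[0]))


# Example (hypothetical usage): kept = filter_xyz_dir("./xyz_files")
```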

Training the Deep Generative Model 3D-SMG

python SMG_3D.py train 3D_SMG ./data/ ./model --split 37905 2527 --cuda --batch_size 5 --draw_random_samples 5 --features 128 --interactions 7 --caFilter_per_block 4 --max_epochs 1000
torchrun --standalone --nnodes=1 --nproc_per_node=4 SMG_3D_parallel.py train 3D_SMG ./data/ ./model --split 37905 2527 --cuda --parallel --batch_size 5 --draw_random_samples 5 --features 128 --interactions 7 --caFilter_per_block 4 --max_epochs 1000
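The arguments `--split 37905 2527` appear to give the number of training and validation molecules. The sketch below assumes the remaining molecules form the test set and shows a seeded random split of that kind. The helper name and the total dataset size are hypothetical; only the two split sizes come from the commands above.

```python
import random


def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into train/validation/test lists."""
    if n_train + n_val > n_total:
        raise ValueError("n_train + n_val exceeds the dataset size")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # assumption: all remaining molecules form the test set
    return train, val, test


# Hypothetical dataset size; only 37905 and 2527 come from the commands above.
train, val, test = split_indices(45_000, 37905, 2527)
print(len(train), len(val), len(test))  # 37905 2527 4568
```

Fixing the seed keeps the split reproducible across runs, so evaluation and testing see the same held-out molecules as training did.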

Evaluating and Testing the Deep Generative Model 3D-SMG

python SMG_3D.py eval 3D_SMG  ./data/ ./model --split validation --cuda --batch_size 5 --features 128 --interactions 7 --caFilter_per_block 4
python SMG_3D.py test 3D_SMG  ./data/ ./model --split test --cuda --batch_size 2 --features 128 --interactions 7 --caFilter_per_block 4
torchrun --standalone --nnodes=1 --nproc_per_node=1 SMGE_3D_eval_single_gpu.py eval 3D_SMG  ./data/ ./model --split validation --cuda --parallel --batch_size 3 --features 128 --interactions 7 --caFilter_per_block 4
torchrun --standalone --nnodes=1 --nproc_per_node=1 SMG_3D_eval_single_gpu.py test 3D_SMG  ./data/ ./model --split test --cuda --parallel --batch_size 5 --features 128 --interactions 7 --caFilter_per_block 4

Generating Molecules with the Deep Generative Model 3D-SMG

For molecule generation, we support three scaffold input formats (SMILES, PDB, and mol2) and two generation modes (approach1 and approach2).
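The sketch below shows one way a wrapper could tell the three scaffold input formats apart and pick between approach1 and approach2. It is an illustration only. The extension mapping, the `positions` argument, and the function name are assumptions and do not describe the repository's command-line interface.

```python
from pathlib import Path

# Hypothetical mapping from file extension to scaffold input format.
SCAFFOLD_FORMATS = {".smi": "smiles", ".pdb": "pdb", ".mol2": "mol2"}


def classify_scaffold_input(scaffold, positions=None):
    """Return (input_format, generation_mode) for a scaffold argument."""
    suffix = Path(scaffold).suffix.lower()
    # Anything without a known extension is treated as an inline SMILES string.
    input_format = SCAFFOLD_FORMATS.get(suffix, "smiles")
    # No positions -> approach1 (all possible positions); otherwise approach2.
    mode = "approach2" if positions else "approach1"
    return input_format, mode


print(classify_scaffold_input("scaffold.mol2"))             # ('mol2', 'approach1')
print(classify_scaffold_input("c1ccncc1", positions=[3]))   # ('smiles', 'approach2')
```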

Filtering the Generated Molecules

python filter_generated.py ./model/generated/scaffold.mol_dict 
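Post-generation filtering usually discards chemically invalid structures. The sketch below is an assumption about the kind of checks involved, not the logic of filter_generated.py. It keeps a molecule only if two conditions hold: every atom's bond orders sum to a standard valence, and all atoms form one connected graph. Hydrogens are assumed to be explicit, and bonds are given as hypothetical `(i, j, order)` tuples.

```python
from collections import deque

# Hypothetical valence table; real filters may allow several valences (e.g. for S).
VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "S": 2, "Cl": 1}


def valences_ok(symbols, bonds):
    """Each atom's summed bond order must equal its standard valence."""
    used = [0] * len(symbols)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return all(used[k] == VALENCE[s] for k, s in enumerate(symbols))


def is_connected(n_atoms, bonds):
    """Breadth-first search from atom 0 must reach every atom."""
    if n_atoms == 0:
        return False
    neighbors = [[] for _ in range(n_atoms)]
    for i, j, _ in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    seen, queue = {0}, deque([0])
    while queue:
        for nb in neighbors[queue.popleft()]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == n_atoms


def keep_molecule(symbols, bonds):
    return valences_ok(symbols, bonds) and is_connected(len(symbols), bonds)


print(keep_molecule(["O", "H", "H"], [(0, 1, 1), (0, 2, 1)]))  # True
print(keep_molecule(["O", "H"], [(0, 1, 1)]))                  # False: O is under-bonded
```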

Displaying Generated Molecules

python display_generateMolcules.py  ./model/generated/generated_molecules.db

Converting the Generated Molecules into .xyz Files

python write_xyz_files.py ./model/generated/
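An .xyz file stores one molecule in three parts: an atom-count line, a free-form comment line, and one `element x y z` row per atom. Coordinates are conventionally in Ångström. The sketch below writes that layout. `write_xyz` is a hypothetical helper, not the function used by write_xyz_files.py.

```python
from pathlib import Path


def write_xyz(path, symbols, coords, comment=""):
    """Write one molecule in the standard XYZ layout (comment must be a single line)."""
    if len(symbols) != len(coords):
        raise ValueError("symbols and coords must have the same length")
    lines = [str(len(symbols)), comment]
    for s, (x, y, z) in zip(symbols, coords):
        lines.append(f"{s} {x:.6f} {y:.6f} {z:.6f}")
    Path(path).write_text("\n".join(lines) + "\n")


# Example: a water molecule.
write_xyz("water.xyz", ["O", "H", "H"],
          [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)], comment="water")
```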

Converting the .xyz Files into .smi Files

python xyz_to_smiles.py ./model/generated

The final generated molecules are provided both in 2D SMILES format and in 3D XYZ format.

ADMET Prediction

First, move the file of generated molecules, agg_smi.smi, into the ./data folder.

python ./property_Pred/ADMET/general_admet/admet-pred.py --smi_path ../data/agg_smi.smi --csv_path ../data/smi_csv.csv --admet_result_path ../data/final_admet.csv
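Both admet-pred.py and base_feature.py take a `--csv_path` argument that points to smi_csv.csv. The sketch below assumes this file is simply the generated SMILES in tabular form, with one molecule per row, and shows a minimal conversion from the .smi file. The column name `smiles` and the helper name are assumptions, and admet-pred.py may use a different layout.

```python
import csv
from pathlib import Path


def smi_to_csv(smi_path, csv_path):
    """Convert a .smi file (SMILES, optionally followed by a name) into a one-column CSV."""
    smiles = []
    for line in Path(smi_path).read_text().splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            smiles.append(fields[0])  # the first token on each line is the SMILES string
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles"])  # hypothetical column name
        writer.writerows([s] for s in smiles)
    return len(smiles)
```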

[Figure: admet_pred]

Fundamental Properties Prediction

python ./property_Pred/base/base_feature.py --csv_path ../data/smi_csv.csv --baseP_result_path ../data/baseP_result.csv

[Figure: Base feature]

We provide predictions of eight fundamental properties: logP, SAScore, QED, TPSA, NumHAcceptors, NumHDonors, NumRotatableBonds, and NumAliphaticRings.

Dataset and Weights File

Deployment Weights for ADMET Prediction

Unzip the weights file and place it in ./property_Pred/ADMET/best-model.

ADMET Prediction

A Small Dataset for Testing the 3D-SMGE

Unzip the dataset file and place it in ./data/ for training.

DatasetDB

The Deep Generative Model 3D-SMG Deployment Weights for Testing

Unzip the weights file and place it in the root directory before evaluating or testing the model and generating molecules.

3D-SMG Model Weights
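The three downloads above are zip archives that must end up in specific folders. The sketch below assumes they are standard .zip files and extracts each one into the folder named in its instructions. The archive file names are hypothetical placeholders, so replace them with the names of the files you downloaded. Run the sketch from the repository root.

```python
import zipfile
from pathlib import Path

# Target folders from the instructions above; the archive names are hypothetical placeholders.
DESTINATIONS = {
    "admet_weights.zip": Path("property_Pred/ADMET/best-model"),
    "dataset.zip": Path("data"),
    "3d_smg_weights.zip": Path("."),
}


def unpack(archive, target):
    """Extract a zip archive into target (created if missing) and return the member names."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())
        zf.extractall(target)
    return names


# Unpack whichever archives are present in the current (repository root) directory.
for archive, target in DESTINATIONS.items():
    if Path(archive).exists():
        unpack(archive, target)
```

`extractall` keeps the folder structure stored inside the archive. After unpacking, check that the files landed directly in the expected folder and not in an extra nested subfolder.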

Citation

If you find this useful, please consider citing our paper:

@article{10.1093/bib/bbad327,
    author = {Xu, Chao and Liu, Runduo and Huang, Shuheng and Li, Wenchao and Li, Zhe and Luo, Hai-Bin},
    title = "{3D-SMGE: a pipeline for scaffold-based molecular generation and evaluation}",
    journal = {Briefings in Bioinformatics},
    pages = {bbad327},
    year = {2023},
}