HIGH-PPI

Hierarchical Graph Learning for Protein-Protein Interaction

Dependencies

HIGH-PPI runs on Python 3.7-3.9. To install all dependencies, run:

cd HIGH-PPI-main
conda env create -f environment.yml
conda activate HIGH-PPI

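Download the following wheel (.whl) files to ./file/: torch-scatter, torch-sparse, torch-cluster, torch-spline-conv. The file names used below are built for CPython 3.9 on Linux x86_64; pick the wheels that match your Python version, platform and installed PyTorch build.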

cd ./file
pip install torch_scatter-2.0.9-cp39-cp39-linux_x86_64.whl
pip install torch_sparse-0.6.13-cp39-cp39-linux_x86_64.whl
pip install torch_cluster-1.6.0-cp39-cp39-linux_x86_64.whl
pip install torch_spline_conv-1.2.1-cp39-cp39-linux_x86_64.whl
pip install torch-geometric
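
After installation, a quick check that the packages can be located may save a failed training run later. The sketch below is not part of the repository; it assumes the import names listed in REQUIRED_MODULES (the usual import names for these pip packages) and that torch itself comes from environment.yml.

```python
import importlib.util

# Import names (not pip names) of the packages installed above.
# Assumption: torch is provided by the conda environment.
REQUIRED_MODULES = [
    "torch",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
    "torch_geometric",
]


def find_missing_modules(names):
    """Return the names in `names` that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it inside the activated HIGH-PPI environment. It only checks that each module can be found, not that the wheel matches the installed PyTorch build.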

Datasets

Three datasets (SHS27k, SHS148k and STRING) can be downloaded from Google Drive:

PPI Prediction

Example: predicting unknown PPIs in the SHS27k dataset with native structures:

Using Processed Data for SHS27k Dataset

Download protein.actions.SHS27k.STRING.pro2.txt, protein.SHS27k.sequences.dictionary.pro3.tsv, edge_list_12.npy, x_list.pt and vec5_CTC.txt to ./HIGH-PPI-main/protein_info/.
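
To confirm that the download step is complete before training, a small check like the following can help. It is a hypothetical helper, not a script shipped with HIGH-PPI; the .npy and .pt extensions are taken from the training command in this README.

```python
from pathlib import Path

# File names from this README; the .npy and .pt extensions follow the
# paths passed to model_train.py.
EXPECTED_FILES = [
    "protein.actions.SHS27k.STRING.pro2.txt",
    "protein.SHS27k.sequences.dictionary.pro3.tsv",
    "edge_list_12.npy",
    "x_list.pt",
    "vec5_CTC.txt",
]


def missing_files(directory, names=EXPECTED_FILES):
    """Return the expected file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("./protein_info")
    print("Missing files:", ", ".join(missing) if missing else "none")
```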

Data Processing for New Datasets (if applicable)

Prepare all related PDB files. Native protein structures can be downloaded in batches from the RCSB PDB, and predicted protein structures, which may contain prediction errors, can be downloaded from the AlphaFold database. Put all of the PDB files in ./protein_info/.

Generate the adjacency matrix from the native PDB files:

python ./protein_info/generate_adj.py --distance 12
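
For intuition about what a distance cutoff does, here is a minimal sketch of a residue contact map. It is an illustration, not the code in generate_adj.py: it assumes one representative atom per residue (for example the alpha carbon), a cutoff in angstroms, and a strict "less than" comparison, any of which may differ in the actual script.

```python
import numpy as np


def contact_edges(coords, cutoff=12.0):
    """Return an (n_edges, 2) array of residue index pairs closer than `cutoff`.

    coords: (n_residues, 3) array-like, one representative atom per residue.
    Each undirected contact appears twice, as (i, j) and (j, i).
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = dist < cutoff
    np.fill_diagonal(adj, False)  # no self-loops
    return np.argwhere(adj)


if __name__ == "__main__":
    # Three residues on a line, 10 and 30 units from the first one.
    toy = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [30.0, 0.0, 0.0]]
    print(contact_edges(toy).tolist())
```

With the toy coordinates only the first two residues are within 12 units of each other, so the edge list contains that pair in both directions.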

Generate the feature matrix:

python ./protein_info/generate_feat.py

Training

To predict PPIs, use the 'model_train.py' script to train HIGH-PPI with the following options:

python model_train.py --ppi_path ./protein_info/protein.actions.SHS27k.STRING.pro2.txt --pseq ./protein_info/protein.SHS27k.sequences.dictionary.pro3.tsv --split random --p_feat_matrix ./protein_info/x_list.pt --p_adj_matrix ./protein_info/edge_list_12.npy --save_path ./result_save --epoch_num 500
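
When the same command is launched repeatedly, keeping the options in one place can reduce copy-paste errors. The sketch below only reassembles the command shown above; build_command and TRAIN_OPTIONS are hypothetical names, and no flags beyond those in this README are assumed.

```python
import sys

# Options copied from the training command in this README.
TRAIN_OPTIONS = {
    "ppi_path": "./protein_info/protein.actions.SHS27k.STRING.pro2.txt",
    "pseq": "./protein_info/protein.SHS27k.sequences.dictionary.pro3.tsv",
    "split": "random",
    "p_feat_matrix": "./protein_info/x_list.pt",
    "p_adj_matrix": "./protein_info/edge_list_12.npy",
    "save_path": "./result_save",
    "epoch_num": 500,
}


def build_command(script, options):
    """Turn a dict of options into an argument list for `script`."""
    cmd = [sys.executable, script]
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    # Printed rather than executed; pass the list to
    # subprocess.run(cmd, check=True) to launch training.
    print(" ".join(build_command("model_train.py", TRAIN_OPTIONS)))
```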

Testing

Run the 'model_test.py' script to test HIGH-PPI with the following options:

python model_test.py --ppi_path ./protein.actions.SHS27k.STRING.pro2.txt --pseq ./protein.SHS27k.sequences.dictionary.pro3.tsv --p_feat_matrix ./x_list.pt --p_adj_matrix ./edge_list_12.npy --model_path ./result_save/gnn_training_seed_1/gnn_model_valid_best.ckpt --index_path ./train_val_split_data/train_val_split_1.json

Output

The output after running 'model_test.py' includes: