MASSA

Implementation of the paper:

Hu, F., Hu, Y., Zhang, W., Huang, H., Pan, Y., & Yin, P. (2023). A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks. Advanced Science, 2301223. https://doi.org/10.1002/advs.202301223

Install

Python >3.7.12 is required, together with the following package versions:

scipy-1.7.3 numpy-1.21.5 pandas-1.3.0 scikit-learn-0.24.1 torch-1.10.1 torch_geometric-2.0.3
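One way to reproduce this environment is with a pip requirements file. The short script below is a sketch: the script name and the requirements.txt file are assumptions, not files shipped with the repository. It prints the versions listed above as exact pip specifiers:

```python
"""Print the pinned MASSA dependencies as pip requirement specifiers.

Hypothetical helper; redirect its output to a requirements file, e.g.
    python make_requirements.py > requirements.txt
    pip install -r requirements.txt
"""

# Versions copied from the Install section; the mapping itself is illustrative.
PINNED = {
    "scipy": "1.7.3",
    "numpy": "1.21.5",
    "pandas": "1.3.0",
    "scikit-learn": "0.24.1",
    "torch": "1.10.1",
    "torch_geometric": "2.0.3",
}


def requirement_lines(pins):
    """Return one exact-version specifier ("name==version") per package."""
    return [f"{name}=={version}" for name, version in pins.items()]


if __name__ == "__main__":
    print("\n".join(requirement_lines(PINNED)))
```

pip normalizes project names, so `torch_geometric` and `torch-geometric` refer to the same package. Depending on your platform and CUDA version, torch and torch_geometric 2.0.x may need builds matched to each other, so consult the official PyTorch and PyG installation instructions if a plain pip install fails.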

Data

The data can be downloaded from the links below. If you have any questions, please contact hz.huang@siat.ac.cn.

Pretrain dataset: https://drive.google.com/file/d/1xHUs0B9VuKviBzj-k-203p4a9vEoo1RW/view?usp=sharing
Downstream dataset: https://drive.google.com/file/d/10yywJNTQ9Z30B_4uyNfQhnXQdhhdjK3W/view?usp=sharing
GNN-PPI data: https://drive.google.com/file/d/1YSXNsTJo-Cdxo08cHLb6ghd6noJJ4y73/view?usp=sharing
GNN-PPI pretrained embedding: https://drive.google.com/file/d/1sq2VQGAMWmWg02hqhyWju2xuiJ-oHbq0/view?usp=sharing

Checkpoint

The pre-trained model checkpoint can be downloaded from the link below. If you have any questions, please contact hz.huang@siat.ac.cn.

https://drive.google.com/file/d/1NVxB00THWxKdTZkLM7T6xdQJM_3TFMVr/view?usp=sharing
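If you prefer to script the downloads, note that each Google Drive share link contains a file ID, which Drive also accepts in its `uc?export=download&id=` form. The sketch below extracts those IDs from the data and checkpoint links; the dictionary labels are hypothetical names chosen for illustration. Large files usually trigger a Drive virus-scan confirmation page that plain HTTP clients do not handle, so a dedicated tool such as gdown may be more convenient for the actual transfer.

```python
"""Turn the Google Drive share links into file IDs and direct URLs.

The labels in DRIVE_LINKS are hypothetical names chosen for this sketch.
"""
import re

DRIVE_LINKS = {
    "pretrain_data": "https://drive.google.com/file/d/1xHUs0B9VuKviBzj-k-203p4a9vEoo1RW/view?usp=sharing",
    "downstream_data": "https://drive.google.com/file/d/10yywJNTQ9Z30B_4uyNfQhnXQdhhdjK3W/view?usp=sharing",
    "gnn_ppi_data": "https://drive.google.com/file/d/1YSXNsTJo-Cdxo08cHLb6ghd6noJJ4y73/view?usp=sharing",
    "gnn_ppi_embedding": "https://drive.google.com/file/d/1sq2VQGAMWmWg02hqhyWju2xuiJ-oHbq0/view?usp=sharing",
    "checkpoint": "https://drive.google.com/file/d/1NVxB00THWxKdTZkLM7T6xdQJM_3TFMVr/view?usp=sharing",
}

# Share links have the form https://drive.google.com/file/d/<FILE_ID>/view?...
_FILE_ID = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def drive_file_id(share_url):
    """Extract the file ID from a Google Drive share link."""
    match = _FILE_ID.search(share_url)
    if match is None:
        raise ValueError(f"not a Google Drive file link: {share_url}")
    return match.group(1)


def direct_download_url(share_url):
    """Build the uc?export=download URL for a share link."""
    return "https://drive.google.com/uc?export=download&id=" + drive_file_id(share_url)


if __name__ == "__main__":
    for name, url in DRIVE_LINKS.items():
        print(f"{name}\t{direct_download_url(url)}")
```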

Usage

Clone or download this repository, then run the demo tasks on your machine:

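# Pre-training (run from the repository root)
cd Multimodal_pretrain/
python src_v0/main.py

# Downstream task, for example stability (run from the repository root)
cd Multimodal_downstream/
python src_stability/main.py

# GNN-PPI downstream task (run from the repository root)
cd Multimodal_downstream/GNN-PPI/
python src_v0/run.py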
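Because each demo script is launched from its own subdirectory, a small wrapper can set the working directory explicitly instead of chaining `cd` commands. This is only a sketch: the wrapper, the demo names, and the assumption that the scripts need no command-line arguments are not part of the repository.

```python
"""Run the MASSA demo tasks from the repository root (hypothetical helper).

Each demo is launched with its own working directory, so the relative
`cd` steps in the Usage section never have to be chained.
"""
import subprocess
import sys
from pathlib import Path

# demo name -> (working directory relative to the repo root, script relative to it)
DEMOS = {
    "pretrain": ("Multimodal_pretrain", "src_v0/main.py"),
    "stability": ("Multimodal_downstream", "src_stability/main.py"),
    "gnn_ppi": ("Multimodal_downstream/GNN-PPI", "src_v0/run.py"),
}


def demo_command(name, repo_root="."):
    """Return (command, cwd) for one demo without running it."""
    workdir, script = DEMOS[name]
    return [sys.executable, script], Path(repo_root) / workdir


def run_demo(name, repo_root=".", dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd, cwd = demo_command(name, repo_root)
    print(f"(cd {cwd} && {' '.join(cmd)})")
    if not dry_run:
        # check=True raises CalledProcessError if the demo exits non-zero
        subprocess.run(cmd, cwd=cwd, check=True)
    return cmd, cwd


if __name__ == "__main__":
    # Dry run by default; call run_demo(name, dry_run=False) to launch a demo.
    for demo in DEMOS:
        run_demo(demo)
```

Passing `cwd` to `subprocess.run` keeps the relative paths inside each script resolving the same way they would after the matching `cd`.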

You can tune the hyperparameters of the Performer encoder to your data and task using the following options:

| Hyperparameter | Description | Default | Suggested range |
|---|---|---|---|
| seq_dim | Size of the sequence embedding vector | 768 | - |
| seq_hid_dim | Size of the hidden embedding in the sequence encoder | 512 | [128, 256, 512] |
| seq_encoder_layer_num | Number of sequence encoder layers | 3 | [3, 4, 5] |
| struc_hid_dim | Size of the hidden embedding in the structure encoder | 512 | [128, 256, 512] |
| struc_encoder_layer_num | Number of structure encoder layers | 2 | [2, 4, 6] |
| go_input_dim | Size of the GO term embedding vector | 64 | - |
| go_dim | Size of the hidden embedding in the GO term encoder | 128 | [128, 256, 512] |
| go_n_heads | Number of attention heads in the GO term encoder | 4 | [4, 8, 16] |
| go_n_layers | Number of GO term encoder layers | 3 | [3, 4, 5] |
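The table above can be captured as a typed configuration with a simple grid over the suggested ranges. `EncoderConfig` and `sweep()` below are hypothetical: the repository may read these values from command-line flags or a config file instead. Treating `seq_dim` and `go_input_dim` as fixed by the input embeddings is an assumption based on their having no suggested range, and the divisibility check assumes the GO term encoder uses standard multi-head attention, which splits `go_dim` evenly across heads.

```python
"""Encoder hyperparameters from the table, plus a grid over the suggested ranges.

EncoderConfig and sweep() are hypothetical: the repository may read these
values from command-line flags or a config file instead.
"""
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class EncoderConfig:
    seq_dim: int = 768                # assumed fixed by the input sequence embeddings
    seq_hid_dim: int = 512
    seq_encoder_layer_num: int = 3
    struc_hid_dim: int = 512
    struc_encoder_layer_num: int = 2
    go_input_dim: int = 64            # assumed fixed by the input GO term embeddings
    go_dim: int = 128
    go_n_heads: int = 4
    go_n_layers: int = 3

    def __post_init__(self):
        # Assumes standard multi-head attention, which splits go_dim evenly across heads.
        if self.go_dim % self.go_n_heads != 0:
            raise ValueError("go_dim must be divisible by go_n_heads")


# Suggested ranges from the table; seq_dim and go_input_dim have none.
SUGGESTED = {
    "seq_hid_dim": [128, 256, 512],
    "seq_encoder_layer_num": [3, 4, 5],
    "struc_hid_dim": [128, 256, 512],
    "struc_encoder_layer_num": [2, 4, 6],
    "go_dim": [128, 256, 512],
    "go_n_heads": [4, 8, 16],
    "go_n_layers": [3, 4, 5],
}


def sweep(base=EncoderConfig()):
    """Yield one config per combination of the suggested values."""
    keys = list(SUGGESTED)
    for values in product(*(SUGGESTED[k] for k in keys)):
        yield replace(base, **dict(zip(keys, values)))
```

The full grid has 3^7 = 2,187 combinations, so in practice you may prefer to vary one hyperparameter at a time around the defaults.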

Citations

If you use our framework in your research, please cite our paper:

@article{hu2023multimodal,
  author={Hu, Fan and Hu, Yishen and Zhang, Weihong and Huang, Huazhen and Pan, Yi and Yin, Peng},
  title={A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks},
  journal={Advanced Science},
  year={2023},
  pages={2301223},
  doi={10.1002/advs.202301223}
}