# MASSA
Implementation of the paper:

> Hu, F., Hu, Y., Zhang, W., Huang, H., Pan, Y., & Yin, P. (2023). A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks. *Advanced Science*, 2301223. https://doi.org/10.1002/advs.202301223
## Install
## Data
The data can be downloaded from the links below. If you have any questions, please contact hz.huang@siat.ac.cn.
- Pretrain dataset: https://drive.google.com/file/d/1xHUs0B9VuKviBzj-k-203p4a9vEoo1RW/view?usp=sharing
- Downstream dataset: https://drive.google.com/file/d/10yywJNTQ9Z30B_4uyNfQhnXQdhhdjK3W/view?usp=sharing
- GNN-PPI data: https://drive.google.com/file/d/1YSXNsTJo-Cdxo08cHLb6ghd6noJJ4y73/view?usp=sharing
- GNN-PPI pretrained embedding: https://drive.google.com/file/d/1sq2VQGAMWmWg02hqhyWju2xuiJ-oHbq0/view?usp=sharing
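If you prefer scripted downloads, the files can be fetched by ID with the third-party `gdown` package. This is a convenience sketch, not part of the original instructions; the output filenames are illustrative assumptions.

```python
# Sketch: fetch the Google Drive files by ID with gdown (pip install gdown).
# File IDs are taken from the share links above; the output names are
# assumptions, not prescribed by the repository.
import gdown

files = {
    "pretrain_dataset.tar.gz": "1xHUs0B9VuKviBzj-k-203p4a9vEoo1RW",
    "downstream_dataset.tar.gz": "10yywJNTQ9Z30B_4uyNfQhnXQdhhdjK3W",
    "gnn_ppi_data.tar.gz": "1YSXNsTJo-Cdxo08cHLb6ghd6noJJ4y73",
    "gnn_ppi_pretrained_embedding.tar.gz": "1sq2VQGAMWmWg02hqhyWju2xuiJ-oHbq0",
}
for output, file_id in files.items():
    gdown.download(f"https://drive.google.com/uc?id={file_id}", output, quiet=False)
```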
## Checkpoint
The pre-trained model checkpoint can be downloaded from the link below. If you have any questions, please contact hz.huang@siat.ac.cn.
https://drive.google.com/file/d/1NVxB00THWxKdTZkLM7T6xdQJM_3TFMVr/view?usp=sharing
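A minimal loading sketch follows; it assumes the file is a standard PyTorch checkpoint and that you build the model from the pre-training source first. The filename and checkpoint layout here are assumptions, so check `Multimodal_pretrain/src_v0` for the actual names.

```python
# Sketch: load the downloaded checkpoint into a PyTorch model.
# ASSUMPTIONS: the local filename, and that the file stores a plain
# state_dict (or a dict wrapping one); verify against src_v0.
import torch

checkpoint = torch.load("massa_pretrained.pt", map_location="cpu")
# Some training scripts wrap the weights, e.g. {"model": state_dict, ...};
# unwrap if needed:
if isinstance(checkpoint, dict) and "model" in checkpoint:
    state_dict = checkpoint["model"]
else:
    state_dict = checkpoint

# model = ...  # build the multimodal encoder from Multimodal_pretrain/src_v0
# model.load_state_dict(state_dict)
```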
## Usage
You can download this repo and run the demo tasks on your own machine.
- Pre-train the model:

  ```bash
  cd Multimodal_pretrain/
  python src_v0/main.py
  ```
- Fine-tune on downstream tasks using the pre-trained model (downstream tasks: stability, fluorescence, remote homology, secondary structure, pdbbind, kinase):

  ```bash
  # For example, the stability task:
  cd Multimodal_downstream/
  python src_stability/main.py
  ```
- Fine-tune GNN-PPI using the pre-trained embedding:

  ```bash
  cd Multimodal_downstream/GNN-PPI/
  python src_v0/run.py
  ```
- Guidance for hyperparameter selection.

  You can choose the hyperparameters of the Performer encoder based on your data and task; see the table below and the configuration sketch that follows it.
| Hyperparameter | Description | Default | Suggested range |
| --- | --- | --- | --- |
| seq_dim | Size of sequence embedding vector | 768 | |
| seq_hid_dim | Size of hidden embedding in sequence encoder | 512 | [128, 256, 512] |
| seq_encoder_layer_num | Number of sequence encoder layers | 3 | [3, 4, 5] |
| struc_hid_dim | Size of hidden embedding in structure encoder | 512 | [128, 256, 512] |
| struc_encoder_layer_num | Number of structure encoder layers | 2 | [2, 4, 6] |
| go_input_dim | Size of GO term embedding vector | 64 | |
| go_dim | Size of hidden embedding in GO term encoder | 128 | [128, 256, 512] |
| go_n_heads | Number of attention heads in GO term encoder | 4 | [4, 8, 16] |
| go_n_layers | Number of GO term encoder layers | 3 | [3, 4, 5] |
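As an illustration only, these hyperparameters could be collected into a single argument parser. The flag names and defaults mirror the table above, but whether `main.py` actually exposes them as command-line flags is an assumption, so check the argument parsing in `src_v0` before relying on this.

```python
# Sketch: expose the table's hyperparameters via argparse. This mirrors a
# common pattern and is NOT guaranteed to match the repository's actual CLI.
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="MASSA encoder hyperparameters (illustrative)")
    p.add_argument("--seq_dim", type=int, default=768, help="Size of sequence embedding vector")
    p.add_argument("--seq_hid_dim", type=int, default=512, help="Hidden size of sequence encoder")
    p.add_argument("--seq_encoder_layer_num", type=int, default=3, help="Sequence encoder layers")
    p.add_argument("--struc_hid_dim", type=int, default=512, help="Hidden size of structure encoder")
    p.add_argument("--struc_encoder_layer_num", type=int, default=2, help="Structure encoder layers")
    p.add_argument("--go_input_dim", type=int, default=64, help="Size of GO term embedding vector")
    p.add_argument("--go_dim", type=int, default=128, help="Hidden size of GO term encoder")
    p.add_argument("--go_n_heads", type=int, default=4, help="Attention heads in GO term encoder")
    p.add_argument("--go_n_layers", type=int, default=3, help="GO term encoder layers")
    return p

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```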
## Citations
If you use our framework in your research, please cite our paper:
```bibtex
@article{hu2023multimodal,
  author={Hu, Fan and Hu, Yishen and Zhang, Weihong and Huang, Huazhen and Pan, Yi and Yin, Peng},
  title={A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks},
  journal={Advanced Science},
  year={2023},
  pages={2301223},
  doi={10.1002/advs.202301223}
}
```