Proteus
PyTorch Implementation for Proteus: Exploring Protein Structure Generation for Enhanced Designability and Efficiency.
<a href="https://openreview.net/pdf?id=IckJCzsGVS"><img src="https://img.shields.io/badge/Paper-ICML%202024-green" style="max-width: 100%;"></a> <a href="https://www.biorxiv.org/content/10.1101/2024.02.10.579791v2.full.pdf"><img src="https://img.shields.io/badge/Preprint-Biorxiv%202024-blue" style="max-width: 100%;"></a>
Overview
Proteus is a deep diffusion network that generates protein backbones with enhanced designability and efficiency. Unlike RFdiffusion, which relies on the large pre-trained structure prediction network RoseTTAFold, Proteus uses graph-based triangle methods and a multi-track interaction network, achieving state-of-the-art performance without pre-training. Inference is also 4x to 10x faster than FrameDiff and RFdiffusion. The model has been validated through comprehensive in silico evaluations and experimental characterization, demonstrating its potential to advance protein design.
<img width="1023" alt="image" src="https://github.com/Wangchentong/Proteus/assets/59241275/9cd5d387-66c9-4f71-9fa8-6a27cd77a25b">
Install
We recommend miniconda (or anaconda). Run the following to create a conda environment with the necessary dependencies. Use mamba if possible for faster installation.
# install
conda env create -f se3.yml
# optional: use mamba instead of conda for faster environment creation
conda install mamba
mamba env create -f se3.yml
# activate environment
conda activate Proteus
# install this repo as a local package
pip install -e .
Inference
The checkpoint is available at ./weights/paper_weights.pt
Monomer inference (command used in the paper)
The first run may be slow because the ESMFold checkpoint has to be downloaded.
weight_path=./weights/paper_weights.pt
python ./experiments/inference_se3_diffusion.py \
inference.output_dir=inference_outputs/monomer/ \
inference.weights_path=$weight_path \
inference.diffusion.samples.samples_lengths=[100,200,300,400,600,800] \
inference.diffusion.samples.samples_per_length=100 \
inference.diffusion.num_t=100
# The settings below are optional; append them to the command above.
# To disable both ESMFold prediction and MPNN design:
inference.mpnn.enable=False inference.esmfold.enable=False
# To disable only ESMFold prediction:
inference.esmfold.enable=False
A self_consistency.csv file is written to inference_outputs/monomer/${timestamp}/self_consistency.csv. It reports the evaluation metrics, such as DSSP and sc-RMSD.
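As a minimal sketch of how this table could be summarized, the snippet below computes the fraction of rows whose sc-RMSD falls under a cutoff. The column name `rmsd` and the 2 Å threshold are assumptions made for illustration (2 Å is a commonly used designability cutoff, but the README does not state which one the authors apply), so check the header of your own self_consistency.csv before reusing it. The example table is made up and does not represent real results.

```python
import csv
import io


def designable_fraction(csv_text, rmsd_column="rmsd", threshold=2.0):
    """Return the fraction of rows whose sc-RMSD is below `threshold`.

    `rmsd_column` is a hypothetical column name; adjust it to match the
    header of the self_consistency.csv produced by your run.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    passed = sum(1 for row in rows if float(row[rmsd_column]) < threshold)
    return passed / len(rows)


# Tiny made-up table, for illustration only (not real results).
example = "sample_path,rmsd\na.pdb,1.2\nb.pdb,3.5\nc.pdb,0.8\nd.pdb,2.4\n"
print(designable_fraction(example))  # 0.5
```

To apply it to a real run, read the file contents first, for example `designable_fraction(open(path).read())`. Note that this counts rows independently; if the table holds several designed sequences per scaffold, a per-scaffold minimum may be the more appropriate statistic.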
Oligomer inference
baseline_weight_path=./weights/paper_weights.pt
python ./experiments/inference_se3_diffusion.py \
inference.output_dir=inference_outputs/oligomer/ \
inference.weights_path=$baseline_weight_path \
inference.diffusion.samples.contigs='60-80//60-80' \
inference.diffusion.samples.samples_per_length=100 \
inference.diffusion.num_t=100
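The contig string in the command above is not documented further here. The sketch below shows one plausible reading, assuming that `//` separates chains and that `A-B` is an inclusive length range from which each chain length is drawn. Both the interpretation and the helper names `parse_contigs` and `sample_lengths` are hypothetical and are not taken from the Proteus code.

```python
import random


def parse_contigs(contigs):
    """Split a contig string into one (min_len, max_len) pair per chain."""
    ranges = []
    for chain in contigs.split("//"):
        low, _, high = chain.partition("-")
        # A bare number such as "70" is treated as a fixed length.
        ranges.append((int(low), int(high or low)))
    return ranges


def sample_lengths(contigs, rng):
    """Draw one length per chain, inclusive of both ends of each range."""
    return [rng.randint(low, high) for low, high in parse_contigs(contigs)]


print(parse_contigs("60-80//60-80"))  # [(60, 80), (60, 80)]
lengths = sample_lengths("60-80//60-80", random.Random(0))
```

Under this reading, `'60-80//60-80'` would describe a two-chain oligomer in which each chain has between 60 and 80 residues.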
The inference output looks like this:
inference_outputs
└── 12D_02M_2023Y_20h_46m_13s       # date and time of inference
    ├── mpnn.fasta                  # sequences designed by mpnn
    ├── self_consistency.csv        # self-consistency analysis: RMSD and TM-score between scaffold and ESMFold prediction, mpnn score of each sequence, scaffold path, esmf path, etc.
    ├── diffusion                   # dir containing scaffolds generated by Proteus
    │   ├── 100_1_sample.pdb
    │   ├── 100_2_sample.pdb        # {length}_{sample_id}_sample.pdb
    │   └── ...
    ├── trajctory                   # dir containing trajectory PDBs, exists when inference.diffusion.option.save_trajactory=True
    │   ├── 100_1_bb_traj.pdb
    │   ├── 100_2_bb_traj.pdb       # {length}_{sample_id}_bb_traj.pdb
    │   └── ...
    ├── movie                       # dir containing trajectory movies, exists when inference.diffusion.option.plot.switch_on=True
    │   ├── 100_1_rigid_movie.gif   # movie of the protein rigids at time t
    │   ├── 100_1_rigid_0_movie.gif # movie of the protein rigids at time 0 as predicted from time t
    │   └── ...
    ├── mpnn                        # dir containing full-atom proteins designed by mpnn, exists when pyrosetta is installed and inference.mpnn.dump=True
    │   ├── 100_0_sample_mpnn_0.pdb
    │   ├── 100_0_sample_mpnn_1.pdb # {length}_{sample_id}_sample_mpnn_{sequence_id}.pdb
    │   └── ...
    └── esmf                        # dir containing structures predicted by ESMFold
        ├── 100_0_sample_esmf_0.pdb
        ├── 100_0_sample_esmf_1.pdb # {length}_{sample_id}_sample_esmf_{sequence_id}.pdb
        └── ...
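The `{length}_{sample_id}_sample.pdb` naming pattern makes it easy to index the generated scaffolds. The following is a small sketch of how the file names in the diffusion directory could be parsed and grouped by length; the helper names `parse_sample_name` and `group_by_length` are illustrative and not part of the Proteus code base.

```python
import re
from pathlib import Path

# Matches names such as "100_2_sample.pdb" ({length}_{sample_id}_sample.pdb).
SAMPLE_NAME = re.compile(r"^(?P<length>\d+)_(?P<sample_id>\d+)_sample\.pdb$")


def parse_sample_name(path):
    """Return (length, sample_id) for a scaffold file, or None if no match."""
    match = SAMPLE_NAME.match(Path(path).name)
    if match is None:
        return None
    return int(match["length"]), int(match["sample_id"])


def group_by_length(paths):
    """Map each scaffold length to the list of sample ids found for it."""
    groups = {}
    for path in paths:
        parsed = parse_sample_name(path)
        if parsed is not None:
            length, sample_id = parsed
            groups.setdefault(length, []).append(sample_id)
    return groups


names = [
    "diffusion/100_1_sample.pdb",
    "diffusion/100_2_sample.pdb",
    "diffusion/200_1_sample.pdb",
    "diffusion/notes.txt",  # ignored: does not follow the pattern
]
print(group_by_length(names))  # {100: [1, 2], 200: [1]}
```

On a real run, the list of names could come from something like `Path(output_dir, "diffusion").glob("*.pdb")`, where `output_dir` is the timestamped directory shown above.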
Code Structure
The local triangle attention is implemented below:
License
LICENSE: MIT
Citation
If you use our work, please cite:
@article{wang2024proteus,
title={Proteus: exploring protein structure generation for enhanced designability and efficiency},
author={Wang, Chentong and Qu, Yannan and Peng, Zhangzhi and Wang, Yukai and Zhu, Hongli and Chen, Dachuan and Cao, Longxing},
journal={bioRxiv},
pages={2024--02},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}
Appreciation
Proteus is built upon the following codebases; please give them a star if you enjoy Proteus :)