Scaffold-Lab: A Unified Framework for Evaluating Protein Backbone Generation Methods


Official implementation for Scaffold-Lab: Critical Evaluation and Ranking of Protein Backbone Generation Methods in A Unified Framework.

Description

Scaffold-Lab is the first unified framework for evaluating different protein backbone generation methods.

We benchmark both unconditional and conditional generation in terms of designability, diversity, novelty, efficiency, and structural properties. The currently evaluated methods are listed below:

Unconditional Generation

Conditional Generation


Updates

[!NOTE]

This is a beta version that has not been thoroughly tested. Bug reports and pull requests are especially welcome.

You can also try our notebook in Google Colab.



Installation

To quickly set up an environment, run:

# Clone this repository and set up environment
git clone https://github.com/Immortals-33/Scaffold-Lab.git
cd Scaffold-Lab
conda env create -f scaffold-lab.yml
conda activate scaffold-lab
# You may also need to install some dependencies manually in certain cases
pip install hydra-core --upgrade
pip install hydra-joblib-launcher --upgrade
pip install ml-collections GPUtil hjson h5py

You may also need to build a Foldseek database for diversity and novelty calculation.

Within the conda environment, run:

mkdir <foldseek_pdb_database_path>
cd <foldseek_pdb_database_path>
foldseek databases PDB pdb tmp

After building the Foldseek PDB database, note down <foldseek_pdb_database_path>. You can later specify it as your Foldseek database path either in the config or directly on the command line, as demonstrated in the Usage section below.
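
As a minimal sketch of the command-line route, the Python helper below checks that a directory looks like a built Foldseek database and assembles the motif-scaffolding command shown in the Usage section. The helper name `build_eval_command` and the check for a `pdb.dbtype` file are assumptions made for illustration (`pdb` is the database name passed to `foldseek databases` above); neither is part of Scaffold-Lab.

```python
from pathlib import Path


def build_eval_command(db_dir, db_name="pdb"):
    """Return the demo motif-scaffolding command as an argument list.

    Hypothetical helper for illustration; not part of Scaffold-Lab.
    """
    db_dir = Path(db_dir)
    # Assumption: a built database named `pdb` leaves a `pdb.dbtype` file behind.
    if not (db_dir / f"{db_name}.dbtype").is_file():
        raise FileNotFoundError(
            f"No Foldseek database '{db_name}' found in {db_dir}"
        )
    return [
        "python",
        "scaffold_lab/motif_scaffolding/motif_refolding.py",
        # Command-line override taken from the Usage section of this README.
        f"evaluation.foldseek_database={db_dir}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so a missing database is caught before the evaluation starts.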


Outline

Here is a guide to navigating this repository. We aim to provide an easy-to-use evaluation pipeline while maximizing the utility of individual scripts. Let's start with the structure of the repository:


Usage

Unconditional Generation

Let's start by running a simple evaluation here:

python scaffold_lab/unconditional/refolding.py 

This performs a simple refolding analysis for the proteins we put inside demo/unconditional/.
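
Refolding results of this kind are often summarized as a designability rate. As a rough sketch, the snippet below computes the fraction of designs whose self-consistency RMSD (scRMSD) falls below a cutoff. The function name, the input format, and the 2.0 Å cutoff (a convention common in the backbone-generation literature) are assumptions for illustration and may not match what `refolding.py` actually reports.

```python
def designability_rate(sc_rmsds, cutoff=2.0):
    """Fraction of designs whose scRMSD (in Å) is below `cutoff`.

    Hypothetical helper for illustration; not part of Scaffold-Lab.
    """
    if not sc_rmsds:
        raise ValueError("at least one scRMSD value is required")
    n_designable = sum(1 for rmsd in sc_rmsds if rmsd < cutoff)
    return n_designable / len(sc_rmsds)


# Made-up scRMSD values, used only to show the call.
example_sc_rmsds = [0.8, 1.5, 2.4, 3.1]
print(designability_rate(example_sc_rmsds))  # 0.5
```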


Conditional Generation (Motif-scaffolding)

To run a minimal example of the motif-scaffolding task, run:

python scaffold_lab/motif_scaffolding/motif_refolding.py evaluation.foldseek_database=<foldseek_pdb_database_path> # Specify the path of your Foldseek database directly

This performs an evaluation on demo/motif_scaffolding/2KL8/ and saves the outputs under outputs/2KL8/.


Customize Methods for Structure Prediction

We support both AlphaFold2 (single-sequence version) and ESMFold for structure prediction during refolding.

ESMFold

Scaffold-Lab performs evaluation using ESMFold by default. Once the environment is set up, this should work without further configuration.

AlphaFold2 (single-sequence version)

The AlphaFold2 implementation is based on LocalColabFold, a local version of ColabFold. Here is a brief guide to enabling AlphaFold2 during evaluation:

And voilà!


Contact


Citation

If you use Scaffold-Lab in your research or find it helpful, please cite:

@article{zheng2024scaffoldlab,
title = {Scaffold-Lab: Critical Evaluation and Ranking of Protein Backbone Generation Methods in A Unified Framework},
author = {Zheng, Zhuoqi and Zhang, Bo and Zhong, Bozitao and Liu, Kexin and Li, Zhengxin and Zhu, Junjie and Yu, Jinyu and Wei, Ting and Chen, Haifeng},
year = {2024},
journal = {bioRxiv},
url = {https://www.biorxiv.org/content/10.1101/2024.02.10.579743v3}
}

Acknowledgments

Open-source Projects

This codebase benefits greatly from FrameDiff, OpenFold, ProteinMPNN, and other amazing open-source projects. Take a look at their work if you find Scaffold-Lab helpful!

Individuals

We thank the following individuals for their contributions and for pointing out potential bugs: