Scaffold-Lab: A Unified Framework for Evaluating Protein Backbone Generation Methods
Official implementation for Scaffold-Lab: Critical Evaluation and Ranking of Protein Backbone Generation Methods in A Unified Framework.
Description
Scaffold-Lab is the first unified framework for evaluating different protein backbone generation methods.
We benchmark both unconditional and conditional generation in terms of designability, diversity, novelty, efficiency and structural properties. The currently evaluated methods are listed below:
Unconditional Generation
- RFdiffusion: Paper | Code
- Chroma: Paper | Code
- FrameDiff: Paper | Code
- FrameFlow: Paper | Code
- Genie: Paper | Code
Conditional Generation
Updates
- July 26th, 2024: A guideline for designing proteins from scratch with the different baseline methods has been added here. We intend it as a reference for reproducing and running the methods benchmarked in our work with minimal effort.
- July 19th, 2024: Motif positions can now be partially redesigned with ProteinMPNN. Check out here for how to specify them.
- June 19th, 2024: Scaffold-Lab now supports AlphaFold2 for evaluation! The implementation of AF2 is built upon LocalColabFold. We refer interested users to here for more details.
[!NOTE]
This is a beta version that has not been tested thoroughly. Bug reports and pull requests are especially welcome.
You can also try our notebook in colab:
<img src="https://colab.research.google.com/assets/colab-badge.svg">
Installation
We recommend using Conda to set up dependencies. To quickly set up an environment, run:
# Clone this repository and set up virtual environment
git clone https://github.com/Immortals-33/Scaffold-Lab.git
cd Scaffold-Lab
# Create and activate environment
conda env create -f scaffold-lab.yml
source activate scaffold-lab
You may also need to build a Foldseek database for diversity and novelty evaluation.
Within the conda environment, run:
mkdir <foldseek_pdb_database_path>
cd <foldseek_pdb_database_path>
foldseek databases PDB pdb tmp
After successfully building the Foldseek PDB database, note down <foldseek_pdb_database_path>. You will later specify it as your Foldseek database path, either in the config or directly on the command line, as demonstrated in the Usage section below.
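Before running an evaluation, it can help to confirm that Foldseek and the database directory are actually reachable. The following is a minimal sketch using only the Python standard library; it is not part of Scaffold-Lab, the helper name `check_foldseek_setup` is hypothetical, and `~/foldseek_db` is a placeholder for your own <foldseek_pdb_database_path>.

```python
import shutil
from pathlib import Path
from typing import List


def check_foldseek_setup(database_dir: str, executable: str = "foldseek") -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    # shutil.which returns None when the executable cannot be found on PATH.
    if shutil.which(executable) is None:
        problems.append(f"'{executable}' was not found on PATH; is the conda environment active?")
    db_path = Path(database_dir).expanduser()
    if not db_path.is_dir():
        problems.append(f"database directory does not exist: {db_path}")
    elif not any(db_path.iterdir()):
        problems.append(f"database directory is empty: {db_path}")
    return problems


if __name__ == "__main__":
    # Placeholder path; replace it with your own <foldseek_pdb_database_path>.
    for problem in check_foldseek_setup("~/foldseek_db"):
        print("WARNING:", problem)
```

The sketch only checks that the directory exists and is non-empty; it does not validate the contents of the database itself.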
Outline
Here is a guide to navigating this repository. We aim to provide an easy-to-use evaluation pipeline while maximizing the utility of individual scripts. Let's start with the structure of the repository:
- scaffold_lab: The main directory for running the different evaluations described in our paper.
- analysis: Scripts for calculating several metrics, including diversity, novelty and structural properties.
- baselines: To generate protein backbones directly inside this repository, you can find the code of the baseline methods for unconditional generation and conditional generation, then clone their repositories under this directory. We highly recommend running inference for each baseline inside its own virtual environment to avoid conflicts between environment dependencies.
  - Inside the experiment folder we provide scripts for performing motif-scaffolding experiments with Chroma using its SubstructureConditioner. Refer to the script for detailed information.
- config: We place the Hydra configuration settings for the evaluations here. Hydra is a hierarchical configuration framework that helps users organize different experimental settings. Though it might be confusing when you first encounter it, it is a powerful tool for running experiments efficiently with different combinations of parameters, for example, the number of sequences to generate. We refer readers to the Hydra docs for advanced usage.
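To give a feel for how a dotted command-line override such as evaluation.foldseek_database=<path> relates to a hierarchical config, here is a simplified illustration in plain Python. This is not Hydra's implementation: the function `apply_overrides` and the default values are hypothetical, values are kept as strings (Hydra parses types), and unlike Hydra's default behavior this sketch silently creates keys that do not exist yet.

```python
from copy import deepcopy


def apply_overrides(config: dict, overrides: list) -> dict:
    """Apply 'a.b.c=value' style overrides to a nested dict and return a new dict."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the hierarchy, creating intermediate levels if needed.
            node = node.setdefault(name, {})
        node[leaf] = value  # kept as a string in this simplified sketch
    return result


# Hypothetical defaults, loosely mirroring keys mentioned in this README.
defaults = {
    "inference": {"predict_method": "[ESMFold]"},
    "evaluation": {"foldseek_database": None},
}

merged = apply_overrides(defaults, ["evaluation.foldseek_database=/data/foldseek/pdb"])
print(merged["evaluation"]["foldseek_database"])
```

Each dot in the key corresponds to one level of nesting in the YAML file, which is why the same setting can be changed either in the config or on the command line.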
Usage
Unconditional Generation
Let's start by running a simple evaluation here:
python scaffold_lab/unconditional/refolding.py
This performs a simple refolding analysis for the proteins we put inside demo/unconditional/.
Conditional Generation (Motif-scaffolding)
To run a minimal version of the motif-scaffolding task, run:
python scaffold_lab/motif_scaffolding/motif_refolding.py evaluation.foldseek_database=<foldseek_pdb_database_path> # Specify the path of your Foldseek database directly
This performs an evaluation on demo/motif_scaffolding/2KL8/, with the outputs saved under outputs/2KL8/.
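If you prefer to launch the evaluation from a Python script (for example, to loop over several settings), the command above can be assembled programmatically. The following is a minimal sketch, not part of Scaffold-Lab: `build_command` and `run_evaluation` are hypothetical helpers, and `/path/to/foldseek_db` is a placeholder for your own database path.

```python
import subprocess
import sys


def build_command(script: str, overrides: dict) -> list:
    """Assemble '<python> <script> key=value ...' as an argument list."""
    command = [sys.executable, script]
    command += [f"{key}={value}" for key, value in overrides.items()]
    return command


def run_evaluation(script: str, overrides: dict, dry_run: bool = True) -> list:
    """Print the command (dry run) or execute it, and return the argument list."""
    command = build_command(script, overrides)
    if dry_run:
        print(" ".join(command))
    else:
        # check=True raises CalledProcessError if the script exits with a non-zero status.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    run_evaluation(
        "scaffold_lab/motif_scaffolding/motif_refolding.py",
        {"evaluation.foldseek_database": "/path/to/foldseek_db"},  # placeholder path
    )
```

With `dry_run=True` (the default here) the command is only printed, which makes it easy to inspect before running a long evaluation.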
Customize Methods for Structure Prediction
We support both AlphaFold2 (single-sequence version) and ESMFold for structure prediction during refolding.
ESMFold
Scaffold-Lab performs evaluation using ESMFold by default. Once you set up the environment this should work.
AlphaFold2 (single-chain version)
The implementation of AlphaFold2 is based on LocalColabFold, which is a local version of ColabFold. We provide a brief guideline for enabling AlphaFold2 during evaluation:
- Install LocalColabFold. Please follow the installation guide on its official page for your specific OS. Note that a complete installation might take a few tries.
- Add the ColabFold executables to your PATH so that ColabFold can run during the refolding pipeline. Suppose the root directory of your LocalColabFold is {LocalColabFold}; then you can set PATH in two ways:
  - Set it up through the config (Recommended). There are two ways to do so:
    - Inside config/unconditional.yaml and config/motif_scaffolding.yaml (Recommended):
      inference:
        af2:
          executive_colabfold_path: {LocalColabFold}/colabfold-conda/bin # Replace {LocalColabFold} by your actual path of LocalColabFold
    - Alternatively, set it on the command line:
      python scaffold_lab/unconditional/refolding.py inference.af2.executive_colabfold_path='{LocalColabFold}/colabfold-conda/bin'
  - Set the PATH variable directly before running the evaluation script, which is done similarly in #5 inside this guide.
- Set AlphaFold2 as your forward folding method when running the evaluation. Inside the config, keep one of the two predict_method lines:
  inference:
    ...
    predict_method: [AlphaFold2] # Only run AF2 for evaluation
    predict_method: [AlphaFold2, ESMFold] # Run both AF2 and ESMFold for evaluation
    ...
And voilà!
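To double-check the PATH setup described above, one option is a short standalone script that prepends the ColabFold directory to PATH and looks for the `colabfold_batch` executable. This is a minimal sketch under stated assumptions, not part of Scaffold-Lab: `prepend_to_path` is a hypothetical helper and `/path/to/localcolabfold` is a placeholder for your actual {LocalColabFold} directory.

```python
import os
import shutil


def prepend_to_path(directory: str, env: dict) -> dict:
    """Return a copy of env with directory placed first on PATH."""
    updated = dict(env)
    current = updated.get("PATH", "")
    updated["PATH"] = directory + (os.pathsep + current if current else "")
    return updated


if __name__ == "__main__":
    # Placeholder; replace with {LocalColabFold}/colabfold-conda/bin on your machine.
    colabfold_bin = "/path/to/localcolabfold/colabfold-conda/bin"
    env = prepend_to_path(colabfold_bin, dict(os.environ))
    # colabfold_batch is the command-line entry point installed by ColabFold.
    found = shutil.which("colabfold_batch", path=env["PATH"])
    print("colabfold_batch:", found or "not found on PATH")
```

The script works on a copy of the environment, so it does not change the PATH of your current shell; it only reports whether the executable would be found.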
Contact
Citation
If you use Scaffold-Lab in your research or find it helpful, please cite:
@article{zheng2024scaffoldlab,
title = {Scaffold-Lab: Critical Evaluation and Ranking of Protein Backbone Generation Methods in A Unified Framework},
author = {Zheng, Zhuoqi and Zhang, Bo and Zhong, Bozitao and Liu, Kexin and Li, Zhengxin and Zhu, Junjie and Yu, Jinyu and Wei, Ting and Chen, Haifeng},
year = {2024},
journal = {bioRxiv},
url = {https://www.biorxiv.org/content/10.1101/2024.02.10.579743v3}
}
Acknowledgments
Open-source Projects
This codebase benefits a lot from FrameDiff, OpenFold, ProteinMPNN and some other amazing open-source projects. Take a look at their work if you find Scaffold-Lab helpful!
Individuals
We thank the following individuals for contributing or pointing out potential bugs: