
ProteinInvBench: Benchmarking Protein Design on Diverse Tasks, Models, and Metrics

Model zoo: https://zenodo.org/record/8031783

<p align="left"> <!-- <a href="https://arxiv.org/abs/2211.12509" alt="arXiv"> <img src="https://img.shields.io/badge/arXiv-2211.12509-b31b1b.svg?style=flat" /></a> --> <a href="https://github.com/A4Bio/OpenCPD/blob/release/LICENSE" alt="license"> <img src="https://img.shields.io/badge/license-Apache--2.0-%23002FA7" /></a> <!-- <a href="https://openstl.readthedocs.io/en/latest/" alt="docs"> <img src="https://readthedocs.org/projects/openstl/badge/?version=latest" /></a> --> <a href="https://github.com/A4Bio/OpenCPD/issues" alt="docs"> <img src="https://img.shields.io/github/issues-raw/A4Bio/OpenCPD?color=%23FF9600" /></a> <a href="https://github.com/A4Bio/OpenCPD/issues" alt="resolution"> <img src="https://img.shields.io/badge/issue%20resolution-1%20d-%23B7A800" /></a> <a href="https://img.shields.io/github/stars/A4Bio/OpenCPD/" alt="arXiv"> <img src="https://img.shields.io/github/stars/A4Bio/OpenCPD" /></a> </p>

This repository is an open-source project for benchmarking structure-based protein design methods. It provides a variety of collated datasets, reproduced methods, novel evaluation metrics, and fine-tuned models, all integrated into one unified framework. It also contains the implementation code for the paper:

ProteinInvBench: Benchmarking Protein Inverse Folding on Diverse Tasks, Models, and Metrics

Zhangyang Gao, Cheng Tan, Yijie Zhang, Xingran Chen, Lirong Wu, Stan Z. Li.

Introduction

ProteinInvBench is the first comprehensive benchmark for protein design. The main contributions of our paper can be summarized in four points:

<p align="center"> <img width="75%" src=https://s1.ax1x.com/2023/06/14/pCnlp9K.jpg> <br> </p> <p align="right">(<a href="#top">back to top</a>)</p> <!-- <p align="center"> <img width="75%" src=https://github.com/A4Bio/OpenCPD/blob/release/assets/CATH.png> <br> </p> -->

Overview

<details open> <summary>Major Features</summary> </details> <details open> <summary>Code Structures</summary> </details> <details open> <summary>Demo Results</summary> Results of the collected methods on the CATH dataset are shown below: <p align="center"> <img width="100%" src=https://s1.ax1x.com/2023/06/19/pC1r6ts.png> <br> </p> <p align="right">(<a href="#top">back to top</a>)</p> </details>

News and Updates

[2023-06-16] ProteinInvBench v0.1.0 is released.

Installation

This project provides a conda environment file, so you can reproduce the environment with the following commands:

git clone https://github.com/A4Bio/OpenCPD.git
cd OpenCPD
conda env create -f environment.yml
conda activate opencpd
python setup.py develop
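After activating the environment and running the setup step, a quick import check can confirm that the dependencies resolved correctly. The sketch below is a minimal example, assuming PyTorch (`torch`) is among the installed dependencies; the module list is illustrative, so extend it with the packages listed in environment.yml.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumption: "torch" is a dependency of this project; add others as needed.
    status = check_modules(["torch"])
    for name, ok in status.items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
    if status.get("torch"):
        import torch

        # Report whether a GPU is visible to PyTorch in this environment.
        print("CUDA available:", torch.cuda.is_available())
```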

Getting Started

Obtaining Dataset

The processed datasets can be found in the releases, or downloaded directly here.

Note that, due to its large size, the ProteinMPNN dataset is provided as a separate file, mpnn.tar.gz; all other datasets are in data.tar.gz.
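Once downloaded, both archives can be unpacked with the Python standard library. The sketch below is a minimal example, assuming the archives sit in the current directory and should be extracted into a `data/` directory; that target directory is an assumption, so adjust it to wherever the training scripts expect the datasets.

```python
import tarfile
from pathlib import Path


def extract_archives(archives, dest="data"):
    """Extract each .tar.gz archive into dest and return the extracted member names."""
    root = Path(dest)
    root.mkdir(parents=True, exist_ok=True)
    resolved_root = root.resolve()
    extracted = []
    for archive in archives:
        path = Path(archive)
        if not path.exists():
            print(f"skipping {path}: file not found")
            continue
        with tarfile.open(path, "r:gz") as tar:
            members = tar.getmembers()
            # Refuse members that would be written outside dest (path traversal).
            for member in members:
                try:
                    (resolved_root / member.name).resolve().relative_to(resolved_root)
                except ValueError:
                    raise ValueError(f"unsafe member {member.name!r} in {path}") from None
            # Use the safer "data" extraction filter when this Python version has it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(root, members=members, **kwargs)
            extracted.extend(m.name for m in members)
    return extracted
```

Usage: `extract_archives(["data.tar.gz", "mpnn.tar.gz"])` unpacks both archives and skips any that have not been downloaded yet.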

Model Training

python main.py --method {method} 
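To benchmark several methods in one go, the command above can be looped over a list of method names. This is a minimal sketch: it only reuses the `--method` flag shown above, and any method names you pass are placeholders here; check main.py for the exact values it accepts.

```python
import shlex
import subprocess
import sys


def build_train_command(method, script="main.py"):
    """Build the argv for `python main.py --method {method}`."""
    # sys.executable is the interpreter of the currently active (opencpd) env.
    return [sys.executable, script, "--method", method]


def train_methods(methods, dry_run=True):
    """Train each method sequentially; with dry_run=True, only print the commands.

    Returns a dict mapping each method name to its process return code
    (None when dry_run is True).
    """
    results = {}
    for method in methods:
        cmd = build_train_command(method)
        print(shlex.join(cmd))
        if dry_run:
            results[method] = None
            continue
        # Run from the repository root so that main.py can be found.
        completed = subprocess.run(cmd, check=False)
        results[method] = completed.returncode
    return results
```

For example, `train_methods(["ProteinMPNN", "PiFold"])` (hypothetical method names) prints the two commands without running them; pass `dry_run=False` to launch training.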
<p align="right">(<a href="#top">back to top</a>)</p>

Overview of Supported Models, Datasets, and Evaluation Metrics

We support a range of protein design methods and will provide benchmarks on multiple protein datasets. We are actively adding new methods and collecting experimental results.
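Sequence recovery and perplexity are widely used metrics in structure-based protein design. The sketch below shows their usual definitions; it is an assumption that ProteinInvBench computes them exactly this way, and the benchmark's novel metrics are not covered here.

```python
import math
from statistics import median


def recovery(designed, native):
    """Fraction of positions where the designed residue equals the native residue."""
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def median_recovery(pairs):
    """Median per-protein recovery over (designed, native) pairs, e.g. a test split."""
    return median(recovery(d, n) for d, n in pairs)


def perplexity(native_log_probs):
    """exp(mean negative log-likelihood) of the native residues under the model.

    native_log_probs holds natural-log probabilities, one per residue.
    """
    if not native_log_probs:
        raise ValueError("need at least one residue")
    return math.exp(-sum(native_log_probs) / len(native_log_probs))
```

For example, `recovery("ACDE", "ACDF")` is 0.75, since three of four positions match. Lower perplexity means the model assigns higher probability to the native sequence.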

<!-- *The detailed introduction could be found in* [dataset.md]() --> <p align="right">(<a href="#top">back to top</a>)</p>

License

This project is released under the Apache 2.0 license. See LICENSE for more information.

Acknowledgement

ProteinInvBench is an open-source project for structure-based protein design methods, created by researchers at the CAIRI AI Lab. We encourage researchers interested in protein design and related fields to contribute to this project!

Citation

@inproceedings{
gao2023proteininvbench,
title={ProteinInvBench: Benchmarking Protein Inverse Folding on Diverse Tasks, Models, and Metrics},
author={Zhangyang Gao and Cheng Tan and Yijie Zhang and Xingran Chen and Lirong Wu and Stan Z. Li},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023},
url={https://openreview.net/forum?id=bqXduvuW5E}
}

Contribution and Contact

To request new features, ask for help, or report bugs in ProteinInvBench, please open a GitHub issue or pull request with the label "new features", "help wanted", or "enhancement". Feel free to contact us by email if you have any questions.

<p align="right">(<a href="#top">back to top</a>)</p>

TODO

  1. Migrate the code to the PyTorch Lightning framework
  2. Deploy the code to a public server
  3. Support pip installation