Integrative protein sequence design with evolutionary multiobjective optimization

Code and benchmark data associated with the paper "An integrative approach to protein sequence design through multiobjective optimization".

TL;DR

Motivation

This package demonstrates how evolutionary multiobjective optimization techniques can coherently integrate multiple models into the computational protein sequence design process by 1) directly embedding models into the mutation operator to bias sampling in sequence space, and 2) explicitly approximating the Pareto front in a user-specified objective space. The main advantage of this approach is that it outperforms post hoc filtering in a multiobjective protein design problem and obviates the need for it; we expect the approach to be broadly relevant for problems with complex design specifications that cannot be easily encapsulated by a single model or objective function.
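To make the idea of approximating a Pareto front concrete, here is a minimal, generic sketch of non-dominated filtering in plain Python. It is not the implementation used in this package; the function names and the example scores are hypothetical, and it assumes every objective is to be minimized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores for four candidate sequences under two objectives
# (lower is better for both). These numbers are illustrative only.
scores = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_front(scores)
print(front)  # the third candidate (3.0, 3.0) is dominated by (2.0, 2.0)
```

The candidates that remain are those for which no objective can be improved without worsening another, which is the set a multiobjective genetic algorithm tries to approximate instead of filtering a single-objective result afterwards.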

Getting started with the repo

Clone the repo and run pip install . from the repo root directory to install the package. Then take a look at the RfaH benchmark code in RfaH_benchmark/ and at the docstrings in __init__.py, the module that provides the primary user interface for setting up a simulation.

Installation

Besides pip install, the repo can also be packaged into a .whl file using python -m build --wheel from the root directory.

Optional dependencies

Note that this repo contains a vendorized version of ProteinMPNN (version 1.0.1) and AF2Rank.

Debugging mode

Change the line logger.setLevel(logging.WARN) in utils.get_logger() to logger.setLevel(logging.DEBUG) to print out debugging information.
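The effect of that one-line change can be seen with the standard library alone. The sketch below uses a stand-in get_logger function written for illustration; it is an assumption and may differ from the actual utils.get_logger() in this repo. Only the logging calls themselves are standard.

```python
import logging


def get_logger(name: str, level: int = logging.WARNING) -> logging.Logger:
    """Hypothetical stand-in for a logger factory: creates a named logger,
    sets its level, and attaches one stream handler."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
        logger.addHandler(handler)
    return logger


quiet = get_logger("demo_quiet", logging.WARNING)
verbose = get_logger("demo_verbose", logging.DEBUG)

quiet.debug("suppressed: below the WARNING threshold")
verbose.debug("emitted: the logger level is DEBUG")
```

With the level at WARNING, debug and info messages are dropped; with DEBUG, every message is emitted.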

Parallelization

As long as torch and jax are properly configured, the code should automatically detect and utilize available GPUs. To force CPU computation, set the device argument to cpu for the relevant wrapper objects.
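Before running a simulation, it can help to check what torch and jax actually see. The following is a small diagnostic sketch, not part of this package; the function name describe_accelerators is hypothetical. It relies only on torch.cuda.is_available() and jax.devices(), and reports None for a library that is not installed.

```python
def describe_accelerators() -> dict:
    """Report which accelerators torch and jax can see.

    A value of None means the corresponding library could not be imported.
    """
    report = {"torch_cuda": None, "jax_devices": None}

    try:
        import torch

        report["torch_cuda"] = torch.cuda.is_available()
    except ImportError:
        pass

    try:
        import jax

        # Each device exposes a platform string such as "cpu" or "gpu".
        report["jax_devices"] = [d.platform for d in jax.devices()]
    except ImportError:
        pass

    return report


report = describe_accelerators()
print(report)
```

If torch_cuda is False or jax_devices lists only "cpu" on a machine with a GPU, the installation of that library is the likely cause rather than this package.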

The code supports three modes of parallelization:

Caveats

Benchmarks

The genetic algorithms in this repo have been benchmarked on three model systems: RfaH, PapD, and CaM. See the benchmarks folder for more information on the data and code for the benchmark analysis of each of these model systems.