P(allatom): A New Path for Protein Design

Overview

Pallatom is a protein generation model that produces protein structures with all-atom coordinates. By learning the joint distribution $P(\text{structure}, \text{seq})$, with a focus on $P(\text{all-atom})$, Pallatom accounts for the interdependence between sequence and structure during generation. The network architecture is designed specifically for all-atom protein generation and uses a dual-track framework that represents each protein at both the token level and the atomic level. Pallatom performs well on key protein design metrics, including designability, diversity, and novelty, and paves the way for future applications to more complex systems.

Installation

To set up the environment for running Pallatom, follow these steps:

  1. Create and activate a conda environment:

    conda create --name pallatom python=3.7.16
    conda activate pallatom
    
  2. Install JAX:

    First, install the specific version of JAX needed for this project:

    pip install jax==0.3.25
    pip install "jax[cuda]"==0.3.25 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
    
  3. Install other dependencies:

    Finally, install the additional required packages from requirements.txt:

    pip install -r requirements.txt
    

If JAX 0.3.25 and Python 3.7 are incompatible with your CUDA version (for example, a newer CUDA release), you can instead set up the environment with Python 3.10 and JAX for CUDA 12.6:

  1. Create and activate a conda environment:

    conda create --name pallatom python=3.10
    conda activate pallatom

  2. Install basic dependencies:

    pip install biopython==1.79 chex==0.1.86 dm-haiku==0.0.12 dm-tree==0.1.8 immutabledict==2.0.0 ml-collections==0.1.0 numpy==1.24.3 pandas==2.0.3 scipy==1.11.1 tensorflow-cpu==2.16.1 rdkit einops tqdm

  3. Install JAX with CUDA support:

    pip install "jax[cuda]"==0.4.34 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
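
After either installation path, it can be useful to confirm that JAX imports and detects a GPU before sampling. The snippet below is a minimal sketch of such a check. The helper name `describe_jax_backend` is hypothetical and not part of Pallatom; it relies only on `jax.__version__`, `jax.default_backend()`, and `jax.devices()`.

```python
def describe_jax_backend():
    """Summarize the JAX install (hypothetical helper, not part of Pallatom)."""
    info = {"installed": False, "version": None, "backend": None, "devices": []}
    try:
        import jax
    except ImportError:
        return info  # JAX is not importable in this environment
    info["installed"] = True
    info["version"] = jax.__version__
    try:
        info["backend"] = jax.default_backend()  # platform name, e.g. "cpu"
        info["devices"] = [str(d) for d in jax.devices()]
    except RuntimeError:
        pass  # backend initialization failed, e.g. CUDA libraries not found
    return info


if __name__ == "__main__":
    print(describe_jax_backend())
```

If `backend` is reported as `cpu` on a GPU machine, the CUDA-enabled build was probably not installed or does not match the local CUDA setup.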

Usage

To sample from the Pallatom model, run the pallatom.py script. The following example shows how to call it with command-line arguments:

python pallatom.py --savepath ./results --L 100 --cuda_devices 0 --t_min 0.01 --t_max 1.0 --gamma 0.2 --step_scale 2.25 --T 200 --rounds 10
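
To sample several protein lengths in one go, the command above can be assembled programmatically. The sketch below reuses only the flags shown in the example and assumes they keep the same meaning for other values. The helper `build_command`, the lengths in `LENGTHS`, and the per-length output directories are illustrative assumptions, not part of Pallatom.

```python
import subprocess
import sys
from pathlib import Path

LENGTHS = [60, 100, 150]  # hypothetical lengths, chosen only for illustration


def build_command(savepath, length, cuda_devices="0"):
    """Return the argument list for one pallatom.py run (flags as in the example)."""
    return [
        sys.executable, "pallatom.py",
        "--savepath", str(savepath),
        "--L", str(length),
        "--cuda_devices", str(cuda_devices),
        "--t_min", "0.01", "--t_max", "1.0",
        "--gamma", "0.2", "--step_scale", "2.25",
        "--T", "200", "--rounds", "10",
    ]


def run_sweep(root="./results"):
    """Run one sampling job per length, each in its own subdirectory."""
    for length in LENGTHS:
        out_dir = Path(root) / f"L{length}"
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=True stops the sweep if a run exits with an error
        subprocess.run(build_command(out_dir, length), check=True)


if __name__ == "__main__":
    run_sweep()
```

Passing the arguments as a list avoids shell quoting issues, and writing each length to its own directory keeps the outputs of different runs apart.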

Parameters:

Output

The generated sequences (in FASTA format) and protein structures (in PDB format) are saved in the directory given by --savepath.
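
Downstream analysis usually starts by reading the generated sequences back. The following is a minimal sketch of a dependency-free FASTA reader. This README does not state how the output files are named, so the `*.fasta` glob pattern and the helper names are assumptions to adapt to the actual files in the output directory.

```python
from pathlib import Path


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def load_sequences(savepath, pattern="*.fasta"):
    """Collect records from every file matching `pattern` (assumed naming)."""
    records = []
    for path in sorted(Path(savepath).glob(pattern)):
        records.extend(parse_fasta(path.read_text()))
    return records


if __name__ == "__main__":
    for name, seq in load_sequences("./results"):
        print(name, len(seq))
```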

Citation

If you find Pallatom useful in your research, please consider citing our work:

@article {Qu2024.08.16.608235,
	author = {Qu, Wei and Guan, Jiawei and Ma, Rui and Zhai, Ke and Wu, Weikun and Wang, Haobo},
	title = {P(all-atom) Is Unlocking New Path For Protein Design},
	year = {2024},
	doi = {10.1101/2024.08.16.608235},
	journal = {bioRxiv}
}

Copyright and License


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
