P(all-atom): A New Path for Protein Design

Overview

Pallatom is a protein generation model that produces protein structures with all-atom coordinates. It learns and models the joint distribution $P(\text{structure}, \text{seq})$, with a focus on $P(\text{all-atom})$, to address the interdependence between sequence and structure in protein generation. The network architecture is designed specifically for all-atom protein generation and uses a dual-track framework that represents each protein at both the token level and the atomic level. Pallatom performs well on key protein design metrics, including designability, diversity, and novelty, and lays the groundwork for applications to more complex systems.

Installation

To set up the environment for running Pallatom, follow these steps:

Create and activate a conda environment:

conda create --name pallatom python=3.7.16
conda activate pallatom

Install JAX:

Install the specific version of JAX that this project requires:

pip install jax==0.3.25
pip install "jax[cuda]"==0.3.25 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Install other dependencies:

Finally, install the additional required packages from requirements.txt:
```
pip install -r requirements.txt
```

If JAX 0.3.25 and Python 3.7 are incompatible with a newer CUDA version on your system, use the following alternative setup with Python 3.10 and JAX with CUDA 12.6:

Create and activate a conda environment:

conda create --name pallatom python=3.10
conda activate pallatom

Install basic dependencies:

pip install biopython==1.79 dm-tree==0.1.8 chex==0.1.86 dm-haiku==0.0.12 immutabledict==2.0.0 ml-collections==0.1.0 numpy==1.24.3 pandas==2.0.3 scipy==1.11.1 tensorflow-cpu==2.16.1 rdkit einops tqdm

Install JAX with CUDA support:

pip install "jax[cuda]"==0.4.34 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
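
After either setup, it can help to confirm that JAX imports and reports the devices you expect before you start sampling. The helper below is a minimal sketch and is not part of the Pallatom repository. `describe_jax` is a hypothetical name, and the code relies only on `jax.__version__` and `jax.devices()`.

```python
def describe_jax():
    """Return (version, device list) for the installed JAX, or None if JAX is missing."""
    try:
        import jax
    except ImportError:
        return None
    # jax.devices() lists the devices of the default backend. A CUDA-enabled
    # install typically reports GPU devices here; a CPU-only one reports CPU.
    return jax.__version__, [str(d) for d in jax.devices()]


if __name__ == "__main__":
    info = describe_jax()
    if info is None:
        print("JAX is not installed in this environment.")
    else:
        version, devices = info
        print(f"JAX {version} sees: {devices}")
```

If only CPU devices are listed, the CUDA-enabled wheel was probably not picked up, and the alternative Python 3.10 setup may be worth trying.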

Usage

To run the Pallatom model sampling process, use the pallatom.py script. Below is an example of how to use the script with command-line arguments:

python pallatom.py --savepath ./results --L 100 --cuda_devices 0 --t_min 0.01 --t_max 1.0 --gamma 0.2 --step_scale 2.25 --T 200 --rounds 10

Parameters:

data_dir: Directory where model parameters are stored (default: ./)
model_name: Name of the model to use (default: Pallatom)
savepath: Directory where results will be saved (default: ./results)
L: Length of the sequence to sample (default: 120)
batch_num: Number of batches to run (default: 4)
cuda_devices: CUDA visible device (default: 0)
t_min: Minimum noise level for add_noise_level (default: 0.01)
t_max: Maximum noise level for add_noise_level (default: 1.0)
gamma: Gamma value for add_noise_level (default: 0.2)
step_scale: Scale of the step (default: 2.25)
T: Number of steps for the sampling process (default: 200)
rounds: Number of rounds to run (default: 1)
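
To sample several lengths in one go, the command can be assembled from Python. This is a hedged sketch rather than a script shipped with Pallatom. `build_command` and `run_lengths` are hypothetical helpers, and the per-length `./results/L<length>` layout is an assumption made for illustration. It assumes `pallatom.py` is in the current directory and uses only flags shown in the example command. Flags that are omitted are assumed to fall back to the defaults listed above.

```python
import subprocess
import sys


def build_command(length, savepath="./results", rounds=1, cuda_devices=0):
    """Build the argument list for one pallatom.py sampling run."""
    return [
        sys.executable, "pallatom.py",
        "--savepath", savepath,
        "--L", str(length),
        "--cuda_devices", str(cuda_devices),
        "--rounds", str(rounds),
    ]


def run_lengths(lengths, dry_run=True):
    """Build one command per length; execute them only when dry_run is False."""
    # The per-length subdirectory is an illustrative choice, not a Pallatom convention.
    commands = [build_command(n, savepath=f"./results/L{n}") for n in lengths]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a run exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: print the commands instead of launching them.
    for cmd in run_lengths([80, 100, 120]):
        print(" ".join(cmd))
```

Calling `run_lengths([80, 100, 120], dry_run=False)` would launch the runs one after another on the selected device.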

Output

The results, including the generated sequences in FASTA format and protein structures in PDB format, will be saved in the specified savepath directory.
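
The generated files can be inspected with a few lines of Python. The sketch below uses only the standard library. The file extensions (`.fasta`, `.fa`, `.pdb`) are assumptions, since the exact output file names are not documented here, and `read_fasta` and `collect_outputs` are hypothetical helper names.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def collect_outputs(savepath):
    """Return sorted lists of FASTA and PDB files found under savepath."""
    root = Path(savepath)
    # Extensions are assumed; adjust them to match the files Pallatom writes.
    fasta_files = sorted(p for p in root.rglob("*") if p.suffix in {".fasta", ".fa"})
    pdb_files = sorted(root.rglob("*.pdb"))
    return fasta_files, pdb_files


if __name__ == "__main__":
    fasta_files, pdb_files = collect_outputs("./results")
    print(f"{len(fasta_files)} FASTA file(s), {len(pdb_files)} PDB file(s)")
    for fasta in fasta_files:
        for name, seq in read_fasta(fasta):
            print(name, len(seq), seq)
```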

Citation

If you find Pallatom useful in your research, please consider citing our work:

@article {Qu2024.08.16.608235,
	author = {Qu, Wei and Guan, Jiawei and Ma, Rui and Zhai, Ke and Wu, Weikun and Wang, Haobo},
	title = {P(all-atom) Is Unlocking New Path For Protein Design},
	year = {2024},
	doi = {10.1101/2024.08.16.608235},
	journal = {bioRxiv}
}

Copyright and License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.