idpSAM: latent diffusion model for peptide conformation generation

About

This repository implements idpSAM in PyTorch. idpSAM is a latent diffusion model for generating Cα conformations of intrinsically disordered proteins (IDPs) and peptides. The model was trained on Markov chain Monte Carlo simulations of 3,259 intrinsically disordered regions whose sequences were obtained from the DisProt database. The simulations were carried out with ABSINTH, an implicit solvent model implemented in the CAMPARI 4.0 package. Here we provide the code and the weights of a pre-trained idpSAM model.

Applications

This repository can be used for the applications described in the Usage section below.

Installation

Local system

We recommend installing and running this package in a new Conda environment created from the sam.yml file in this repository. To follow this strategy, use these commands:

  1. Clone the repository:
    git clone https://github.com/giacomo-janson/idpsam.git
    
    and go into the root directory of the repository.
  2. Create the dedicated Conda environment and install its dependencies:
    conda env create -f sam.yml
    
  3. Activate the environment:
    conda activate sam
    
  4. Install the sam Python library in editable mode (this only adds the library to your $PYTHONPATH):
    pip install -e .
    
  5. Optional (only needed if you want all-atom reconstruction when using the idpSAM inference script): install the cg2all package inside the sam environment created above:
    pip install git+http://github.com/huhlim/cg2all
    
    Note: this command performs a CPU-only installation of cg2all. You can also attempt the GPU installation, which involves more steps. If you cannot install cg2all with GPU support, the CPU installation is still adequate for idpSAM applications, because cg2all runs reasonably fast on a CPU for short peptides.
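After installation, a quick way to confirm that the packages are visible from the active environment is to ask Python's import system for them without importing them. The snippet below is a minimal sketch; it assumes the import names are `sam` (for the library installed in step 4) and `cg2all` (for the optional package in step 5), which you should verify against each package.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if `module_name` can be found in the current environment."""
    # find_spec locates a top-level module without importing it; None means not found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed import names; check each package's documentation.
    for name in ("sam", "cg2all"):
        status = "found" if is_available(name) else "missing"
        print(f"{name}: {status}")
```

Run it with `python` from inside the activated sam environment; a "missing" entry for cg2all only matters if you plan to request all-atom reconstruction.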

Run on the cloud

If you want to quickly use idpSAM on the cloud (no installation needed on your system), we provide an idpSAM Colab notebook (see Running remotely below).

Usage

Generate conformational ensembles

Running locally

You can generate a structural ensemble of a custom peptide sequence via the scripts/generate_ensemble.py inference script. Its usage is:

python scripts/generate_ensemble.py -c config/models.yaml -s MFDNASTRNNKRERGKRQGKQTRTQRHADRSQT -o peptide -n 1000 -a -d cuda

Use the --help flag to get a description of these arguments, along with the full list of other options that you can tweak.
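To generate ensembles for several peptides, one option is to build the same command programmatically. The sketch below reuses only the flags from the example command above; the helper names are hypothetical, it assumes `-a` requests all-atom reconstruction and `-d` selects the PyTorch device (confirm both with --help), and it falls back to the CPU when PyTorch reports no GPU.

```python
import shlex
import sys
from typing import List


def pick_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # PyTorch missing: fall back to the CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def build_command(sequence: str, output: str, n_samples: int = 1000,
                  all_atom: bool = True, device: str = "cpu",
                  config: str = "config/models.yaml") -> List[str]:
    """Hypothetical helper: assemble one generate_ensemble.py call.

    Only the flags from the README example are used. The meaning of -a
    (all-atom reconstruction) and -d (device) is an assumption.
    """
    cmd = [sys.executable, "scripts/generate_ensemble.py",
           "-c", config, "-s", sequence, "-o", output, "-n", str(n_samples)]
    if all_atom:
        cmd.append("-a")  # assumed to need cg2all (optional install step 5)
    cmd += ["-d", device]
    return cmd


if __name__ == "__main__":
    # Placeholder batch: the output name and sequence are examples only.
    peptides = {"peptide": "MFDNASTRNNKRERGKRQGKQTRTQRHADRSQT"}
    device = pick_device()
    for output, seq in peptides.items():
        # Dry run: print each command instead of executing it.
        print(shlex.join(build_command(seq, output, device=device)))
```

Passing each list to `subprocess.run(cmd, check=True)` from the repository root, inside the sam environment, would launch the generations one after another. Keeping the dry-run print makes it easy to check the commands before spending GPU time.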

Running remotely

You can easily generate a Cα (and optionally all-atom) ensemble for a custom peptide with a Colab notebook on the cloud and then download the ensemble to your local system. The output consists of DCD files, which you can parse with MDTraj, for example. If you plan to generate large ensembles (> 1000 conformations), it will probably take hours with a CPU runtime. If possible, use a GPU runtime, which should bring the time down to a few minutes.

Launch the notebook using the link below:

Google Colab
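Whether generated locally or on Colab, the Cα ensembles can be summarized with simple shape descriptors such as the radius of gyration and the end-to-end distance. The sketch below is a NumPy-only version operating on an (n_frames, n_atoms, 3) coordinate array; it assumes the trajectory contains only Cα atoms in sequence order, so the first and last atoms are the chain termini.

```python
import numpy as np


def radius_of_gyration(xyz: np.ndarray) -> np.ndarray:
    """Per-frame radius of gyration of an (n_frames, n_atoms, 3) array.

    All atoms are weighted equally, a common choice for a Cα-only trace.
    """
    xyz = np.asarray(xyz, dtype=float)
    centered = xyz - xyz.mean(axis=1, keepdims=True)  # remove each frame's centroid
    # Root-mean-square distance of the atoms from their centroid.
    return np.sqrt((centered ** 2).sum(axis=2).mean(axis=1))


def end_to_end_distance(xyz: np.ndarray) -> np.ndarray:
    """Per-frame distance between the first and last atoms.

    Assumes a Cα-only topology ordered from N- to C-terminus.
    """
    xyz = np.asarray(xyz, dtype=float)
    return np.linalg.norm(xyz[:, -1] - xyz[:, 0], axis=1)
```

With MDTraj, the array would come from something like `traj = md.load("peptide.dcd", top="peptide.pdb")` followed by `traj.xyz` (coordinates in nm). The file names here are placeholders: DCD files store coordinates only, so they must be paired with a matching structure file. MDTraj's `md.compute_rg(traj)` should return the same radii with its default uniform weighting.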

Updates

References

Janson G and Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. bioRxiv (2024).
