cg2all

Convert a coarse-grained protein structure to an all-atom model

Web server / Google Colab notebook

Hugging Face Spaces
A demo web page for converting a CG model to an all-atom structure is available via Hugging Face Spaces.

Google Colab
A Google Colab notebook is available for tasks:

Google Colab
A Google Colab notebook is available for local optimization of a protein model structure against a cryo-EM density map using cryo_em_minimizer.py.

Installation

These steps install the required Python libraries, including cg2all (this repository), a modified MDTraj, a modified SE3Transformer, and other dependencies. They also place the executables convert_cg2all and convert_all2cg in your Python binary directory.

This package is tested on Linux (CentOS) and macOS (Apple Silicon, M1).

for CPU only

pip install git+http://github.com/huhlim/cg2all

for CUDA (GPU) usage

  1. Install Miniconda
  2. Create an environment with the DGL library with CUDA support
# This is an example with cudatoolkit=11.3.
# Set a cudatoolkit version that is compatible with your CUDA driver and DGL library.
# dgl>=1.1 occasionally raises errors, so please use dgl<=1.0.
conda create --name cg2all pip cudatoolkit=11.3 dgl=1.0 -c dglteam/label/cu113
  3. Activate the environment
conda activate cg2all
  4. Install this package
pip install git+http://github.com/huhlim/cg2all
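
After either installation route, it can help to confirm that the two executables mentioned above ended up on your PATH. The following is a minimal sketch using only the Python standard library; the helper name `find_cg2all_executables` is hypothetical and is not part of cg2all.

```python
import shutil

# Executable names are taken from the installation notes above.
EXECUTABLES = ("convert_cg2all", "convert_all2cg")


def find_cg2all_executables(names=EXECUTABLES):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_cg2all_executables().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If either name is reported as not found, the environment you installed into is probably not the active one (for the conda route, run `conda activate cg2all` first).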

for cryo_em_minimizer usage

You need an additional Python package, mrcfile, to handle cryo-EM density maps.

pip install mrcfile

Usage

convert_cg2all

Convert a coarse-grained protein structure to an all-atom model

usage: convert_cg2all [-h] -p IN_PDB_FN [-d IN_DCD_FN] -o OUT_FN [-opdb OUTPDB_FN]
                      [--cg {supported_cg_models}] [--chain-break-cutoff CHAIN_BREAK_CUTOFF] [-a]
                      [--fix] [--ckpt CKPT_FN] [--time TIME_JSON] [--device DEVICE] [--batch BATCH_SIZE] [--proc N_PROC]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -d IN_DCD_FN, --dcd IN_DCD_FN
  -o OUT_FN, --out OUT_FN, --output OUT_FN
  -opdb OUTPDB_FN
  --cg {supported_cg_models}
  --chain-break-cutoff CHAIN_BREAK_CUTOFF
  -a, --all, --is_all
  --fix, --fix_atom
  --standard-name
  --ckpt CKPT_FN
  --time TIME_JSON
  --device DEVICE
  --batch BATCH_SIZE
  --proc N_PROC

arguments

examples

Conversion of a PDB file

convert_cg2all -p tests/1ab1_A.calpha.pdb -o tests/1ab1_A.calpha.all.pdb --cg CalphaBasedModel

Conversion of a DCD trajectory file

convert_cg2all -p tests/1jni.calpha.pdb -d tests/1jni.calpha.dcd -o tests/1jni.calpha.all.dcd --cg CalphaBasedModel

Conversion of a PDB file using a ckpt file

convert_cg2all -p tests/1ab1_A.calpha.pdb -o tests/1ab1_A.calpha.all.pdb --ckpt CalphaBasedModel-104.ckpt
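
To convert several structures in one go, the single-file command can be driven from a short script. This is a minimal sketch that assumes convert_cg2all is on your PATH; the helper names `build_cg2all_command` and `convert_many` are hypothetical, and the output naming scheme (adding `.all` before `.pdb`) simply mirrors the examples above. Only flags that appear in the usage text are used.

```python
import subprocess
from pathlib import Path


def build_cg2all_command(in_pdb, out_fn, cg="CalphaBasedModel"):
    """Build the argument list for a single convert_cg2all call."""
    # Only flags shown in the usage text are used: -p, -o, --cg.
    return ["convert_cg2all", "-p", str(in_pdb), "-o", str(out_fn), "--cg", cg]


def convert_many(pdb_files, cg="CalphaBasedModel", dry_run=True):
    """Convert each CG PDB file; with dry_run=True, only return the commands."""
    commands = []
    for in_pdb in map(Path, pdb_files):
        # Assumed naming scheme: foo.calpha.pdb -> foo.calpha.all.pdb
        out_fn = in_pdb.with_name(in_pdb.stem + ".all.pdb")
        cmd = build_cg2all_command(in_pdb, out_fn, cg=cg)
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if the conversion fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands without executing them.
    for cmd in convert_many(["tests/1ab1_A.calpha.pdb"]):
        print(" ".join(cmd))
```

Passing the command as a list rather than a shell string avoids quoting problems with unusual file names. Set `dry_run=False` once the printed commands look right.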

convert_all2cg

Convert an all-atom protein structure to a coarse-grained model

usage: convert_all2cg [-h] -p IN_PDB_FN [-d IN_DCD_FN] -o OUT_FN [--cg {supported_cg_models}]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -d IN_DCD_FN, --dcd IN_DCD_FN
  -o OUT_FN, --out OUT_FN, --output OUT_FN
  --cg

arguments

an example

convert_all2cg -p tests/1ab1_A.pdb -o tests/1ab1_A.calpha.pdb --cg CalphaBasedModel
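
Conceptually, a Cα-based coarse-grained model keeps one particle per residue at the Cα position. The sketch below illustrates that idea on PDB-format text using only the Python standard library. It is not how convert_all2cg is implemented, the helper name `extract_calpha_lines` is hypothetical, and the coordinates in the example are made up for illustration. Alternate locations and other PDB subtleties are ignored.

```python
def extract_calpha_lines(pdb_text):
    """Return the ATOM records whose atom name is CA, in input order."""
    kept = []
    for line in pdb_text.splitlines():
        # PDB fixed columns: record name in columns 1-6, atom name in 13-16.
        if line.startswith("ATOM  ") and line[12:16].strip() == "CA":
            kept.append(line)
    return kept


# Made-up coordinates, for illustration only.
example = """\
ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N
ATOM      2  CA  THR A   1      16.967  12.784   4.338  1.00 10.80           C
ATOM      3  C   THR A   1      15.685  12.755   5.133  1.00  9.19           C
ATOM      4  O   THR A   1      15.268  13.825   5.594  1.00  9.85           O
ATOM      5  N   THR A   2      15.115  11.555   5.265  1.00  7.81           N
ATOM      6  CA  THR A   2      13.856  11.469   6.066  1.00  8.31           C
"""

if __name__ == "__main__":
    for line in extract_calpha_lines(example):
        print(line)
```

For real work, use convert_all2cg, which also supports the other CG models and DCD trajectories.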

script/cryo_em_minimizer.py

Local optimization of a protein model structure against a given electron density map. This script is a proof of concept that uses the cg2all network to optimize at CA-level resolution, with objective functions at both atomistic and CA-level resolutions. Using a CUDA environment is highly recommended.

usage: cryo_em_minimizer [-h] -p IN_PDB_FN -m IN_MAP_FN -o OUT_DIR [-a]
                         [-n N_STEP] [--freq OUTPUT_FREQ]
                         [--chain-break-cutoff CHAIN_BREAK_CUTOFF]
                         [--restraint RESTRAINT]
                         [--cg {CalphaBasedModel,CA,ca,ResidueBasedModel,RES,res}]
                         [--standard-name] [--uniform_restraint]
                         [--nonuniform_restraint] [--segment SEGMENT_S]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -m IN_MAP_FN, --map IN_MAP_FN
  -o OUT_DIR, --out OUT_DIR, --output OUT_DIR
  -a, --all, --is_all
  -n N_STEP, --step N_STEP
  --freq OUTPUT_FREQ, --output_freq OUTPUT_FREQ
  --chain-break-cutoff CHAIN_BREAK_CUTOFF
  --restraint RESTRAINT
  --cg {CalphaBasedModel,CA,ca,ResidueBasedModel,RES,res}
  --standard-name
  --uniform_restraint
  --nonuniform_restraint
  --segment SEGMENT_S

arguments

an example

./cg2all/script/cryo_em_minimizer.py -p tests/3isr.af2.pdb -m tests/3isr_5.mrc -o 3isr_5+3isr.af2 --all

Datasets

The training, validation, and test sets are available at Zenodo.

Reference

Lim Heo & Michael Feig, "One particle per residue is sufficient to describe all-atom protein structures", bioRxiv (2023).