PTMPrediction
Supplementary scripts for the paper "Combining machine learning with structure-based protein design to predict and engineer post-translational modifications of protein therapeutics"
Rosetta documentation
Documentation for the Rosetta SimpleMetric can be found at PTMPredictionMetric. All models are included in Rosetta and can be used through the RosettaScripts XML interface.
Scripts for paper reproduction
Setup
A conda environment with the required libraries can be created with build_conda.sh.
For compiling Rosetta with TensorFlow, see the information on this page.
Training
To train models from scratch, use either the training.py or the multi_training.py script, e.g. python ./training.py -p NlinkedGlycosylation.
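When several models need to be trained in sequence, the command above can be wrapped in a small driver. The following is a minimal sketch, not part of the repository: the helper names build_training_command and train_all are hypothetical, and it assumes only the -p flag shown in the example. NlinkedGlycosylation is the only modification name given here; any other names would have to be taken from the training scripts themselves.

```python
import subprocess
import sys


def build_training_command(ptm_name: str, script: str = "./training.py") -> list[str]:
    """Return the argument list for one training run (hypothetical helper).

    Mirrors the documented call: python ./training.py -p <ptm_name>
    """
    return [sys.executable, script, "-p", ptm_name]


def train_all(ptm_names: list[str]) -> None:
    """Run the training script once per modification name, stopping on the first failure."""
    for name in ptm_names:
        subprocess.run(build_training_command(name), check=True)
```

Calling train_all(["NlinkedGlycosylation"]) from the repository root would then be equivalent to the single command shown above.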
Before training, the ptm_data.csv.gz file in ./data/ must be uncompressed.
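The file can be uncompressed with gunzip or, equivalently, from Python. The sketch below uses only the standard library; the function name decompress_gzip is hypothetical, and it assumes the training scripts expect the uncompressed file next to the archive under the same name without the .gz suffix.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gzip(src: Path) -> Path:
    """Decompress a .gz file next to the original and return the new path."""
    dst = src.with_suffix("")  # e.g. ptm_data.csv.gz -> ptm_data.csv
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream in chunks so the whole table is never held in memory.
        shutil.copyfileobj(fin, fout)
    return dst
```

From the repository root, decompress_gzip(Path("./data/ptm_data.csv.gz")) would write ./data/ptm_data.csv and leave the archive in place.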
Feature calculation
A function for calculating the features used is in calc_features.py (requires PyRosetta to be installed in the conda environment).
Deamidation Prediction
PDB files can be found in ./data, and deamidation probabilities can be calculated with ./deamidation.sh, which uses the ./XML/deamidation.xml protocol.
Influenza Prediction and Design
To design the acquired glycosylation sites, use the ./influenza_design.sh script. PDB files can be found in ./data, and glycosylation probabilities can be calculated with ./influenza.sh, which uses the ./XML/influenza.xml protocol.
Phosphorylation engineering
To run the Monte Carlo optimization, use the run_phospho_opt.sh script. The input PDB structure is in ./data and the RosettaScripts protocol is in ./XML.
Models
All trained models can be found in ./models/.