Awesome
Protein holography
Overview
The protein holography package implements efficient rotationally-equivariant encoding of protein structure and minimal rotationally-equivariant processing of protein microenvironements via H-CNN.
Installation
pyRosetta
This package is dependent on pyrosetta which can be downloaded from here. A license is available at no cost to academics and can be obtained here.
The env.yml file should be edited upon download with the local path to the wheel file to install.
setup
Once the pyrosetta wheel file has been downloaded and the path has been specified in the env.yml file, one can create the protein holography conda environment by running
conda env create -f env.yml
to install the necessary dependencies. Then run
pip install .
to install the protein_holography
package.
If you're going to make edits to the protein_holography
package, run
pip install -e .
so you can test your changes.
Testing install
The installation can be tested by running pytest tests
. Currently only the preprocessing pipeline is tested. Testing will be implemented soon for the network.
Quick run
A bash script for complete processing of pdb files is located in scripts
. This script requires simply a csv file with protein pdb IDs and, in addition to intermediate outputs, will produce predicted pseudoenergies and probabilities for all sites in a protein as well as the protein network energy for all chains in the proteins. See scripts
for an example and more details.
Detailed overview
Components
pdb preprocessing module
The pdb preprocessing module filters pdbs by criteria such as imaging type (e.g., X-ray crystallography, cryo-EM, etc.), resolution, date of deposition, or any other metadata deposited with the structure.
Coordinates
The coordinate module features all preprocessing of pdb files and ultimately results in the holograms that are used in the H-CNN. Specifically, pdb files are processed in three steps.
chemical inference and coordinate extraction via PyRosetta
First, pdb files are read into pyrosetta where hydrogen atom positions are inferred, partial charges are assigned, and solvent-accessible surface area (SASA) is calculated on a per atom basis.
neighborhood segmentation
Second, neighborhoods of a fixed radius are extracted from each structure.
holographic projection
Third, each neighborhood is projected into Fourier space via the 3D zernike polynomials.
H-CNN
The hnn class is a fully fourier neural network coded in tensorflow and operates on fully complex inputs.