PENSA - Python Ensemble Analysis

A collection of Python methods for exploratory analysis and comparison of biomolecular conformational ensembles, e.g., from molecular dynamics simulations. All functionality is available as a Python package.

To get started, see the documentation, which includes a tutorial for the PENSA library, or read our preprint.

If you would like to contribute, check out our contribution guidelines and our to-do list.

Functionality

With PENSA, you can (currently):

compare structural ensembles of biomolecules (proteins, DNA or RNA) via the relative entropy of their features or statistical tests and visualize deviations on a reference structure.
project several ensembles on a joint reduced representation using principal component analysis (PCA) or time-lagged independent component analysis (tICA) and sort the structures along the obtained components.
cluster structures across ensembles via k-means or regular-space clustering and write out the resulting clusters as trajectories.
trace allosteric information flow through a protein using state-specific information analysis methods.
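
To illustrate the idea behind the first item, the sketch below compares the distribution of a single feature (for example, one torsion angle) between two ensembles by discretizing both samples on a shared grid and computing the Jensen-Shannon distance. It uses only the Python standard library and is not PENSA's implementation; the function name `jensen_shannon_distance` and the `bins` parameter are assumptions made for this example.

```python
import math


def jensen_shannon_distance(sample_a, sample_b, bins=10):
    """Jensen-Shannon distance (log base 2, range 0..1) between two 1D samples.

    Hypothetical helper for illustration only, not part of the PENSA API.
    """
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    if hi == lo:
        # All values are identical, so the two distributions coincide.
        return 0.0
    width = (hi - lo) / bins

    def probabilities(sample):
        # Histogram on the shared grid, normalized to sum to 1.
        counts = [0] * bins
        for value in sample:
            index = min(int((value - lo) / width), bins - 1)
            counts[index] += 1
        return [c / len(sample) for c in counts]

    p = probabilities(sample_a)
    q = probabilities(sample_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl_divergence(x, y):
        # Kullback-Leibler divergence; empty bins of x contribute nothing.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))


# Example: the same feature sampled in two hypothetical simulation conditions.
condition_a = [0.0, 0.1, 0.2, 0.1]
condition_b = [9.8, 9.9, 10.0, 9.9]
distance = jensen_shannon_distance(condition_a, condition_b)
```

A value near 0 means the feature behaves the same way in both ensembles, while a value near 1 means the two distributions barely overlap. Repeating this per feature gives a profile of where two ensembles differ most.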

Biomolecules can be featurized using backbone torsions, sidechain torsions, or arbitrary distances (e.g., between all backbone C-alpha atoms). We also provide density-based methods to featurize water and ion pockets as well as a featurizer for hydrogen bonds. The library is modular so you can easily write your own feature reader.
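
As a rough illustration of distance-based featurization, the following sketch turns a list of frames, each a list of 3D coordinates (for example, C-alpha positions), into one feature vector of all pairwise distances per frame. It uses only the standard library and does not follow PENSA's feature reader interface; the names `pairwise_distances` and `featurize_frames` are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Distances between all unique pairs of points in one frame."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def featurize_frames(frames):
    """One feature vector (list of pairwise distances) per frame."""
    return [pairwise_distances(frame) for frame in frames]


# Two toy frames with three atoms each (coordinates in arbitrary units).
frames = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)],
    [(0.0, 0.0, 0.0), (6.0, 8.0, 0.0), (0.0, 0.0, 12.0)],
]
features = featurize_frames(frames)
```

Each frame with N atoms yields N(N-1)/2 distances, so the feature count grows quadratically with the number of selected atoms.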

PENSA also includes trajectory processing tools based on MDAnalysis and plotting functions using Matplotlib.

Documentation

PENSA's documentation pages provide installation instructions, API documentation, and a tutorial.

Example Scripts

For the most common applications, example Python scripts are provided. We show how to run the example scripts in a short separate tutorial.

Citation

General citation, representing the "concept" of the software:

Martin Vögele, Neil Thomson, Sang Truong, Jasper McAvity. (2021). PENSA. Zenodo. http://doi.org/10.5281/zenodo.4362136

To get the citation and DOI for a particular version, see Zenodo.

Please also consider citing our preprint:

Systematic Analysis of Biomolecular Conformational Ensembles with PENSA
M. Vögele, N. J. Thomson, S. T. Truong, J. McAvity, U. Zachariae, R. O. Dror
arXiv:2212.02714 [q-bio.BM] 2022

Acknowledgments

Contributors

Martin Vögele, Neil Thomson, Sang Truong, Jasper McAvity

Beta-Testers

Alexander Powers, Lukas Stelzl, Nicole Ong, Eleanore Ocana, Emma Andrick, Callum Ives, Bu Tran, and Luca Morlok

Funding & Support

This project was started by Martin Vögele at Stanford University, supported by an EMBO long-term fellowship (ALTF 235-2019), as part of the INCITE computing project 'Enabling the Design of Drugs that Achieve Good Effects Without Bad Ones' (BIP152). Neil Thomson was supported by a BBSRC EASTBIO PhD studentship and Jasper McAvity by the Stanford Computer Science department via the CURIS program. Stanford University, the Stanford Research Computing Facility, and the University of Dundee provided additional computational resources and support that contributed to these research results.