pmpnndiff
Fast Non-autoregressive Inverse Folding with Discrete Diffusion (NeurIPS MLSB 2023)
This repository implements discrete diffusion for inverse protein folding, that is, designing amino-acid sequences for a given backbone structure. It includes pretrained models, training routines, and inference scripts.
To-do List:
- Provide code for the designability metric so that results can be reproduced.
- Configure PMPNN ARM sampling temperature correctly.
1. Installation
Create and Activate Conda Environment
Clone the repository, navigate to its root directory, and create a conda environment from the provided YAML file. Then activate the environment, replacing your-env-name with the name defined in environment.yml:
conda env create -f environment.yml
conda activate your-env-name
Install Package Dependencies
With the environment activated, install the package in editable mode from the root directory of the repository:
pip install -e .
2. Inference
Running Inference
For discrete diffusion inference with purity sampling, run
python experiments/inference_diff.py --sampling_type purity_sample
Refer to configs/clean/inference_diff.yaml for a complete description of the inference arguments.
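To give a rough intuition for what purity-based sampling does, the sketch below shows one simplified, greedy variant in plain Python: at each step, the masked positions where the model's predicted distribution is most peaked (highest maximum probability, or "purity") are decoded first. This is an illustration under stated assumptions, not the code in experiments/inference_diff.py; the names MASK, purity_unmask_step, and decode, the fixed per-step budget, and the greedy top-k selection are all hypothetical, and the repository may select positions differently (for example, stochastically).

```python
import random

MASK = -1  # hypothetical id marking a position that is still masked


def purity_unmask_step(tokens, probs, num_to_unmask, rng):
    """Unmask the `num_to_unmask` masked positions with the highest purity.

    tokens: list of ints, with MASK at undecided positions.
    probs:  per-position categorical distributions (list of lists of floats),
            standing in for the output of a denoising model.
    """
    masked = [i for i, tok in enumerate(tokens) if tok == MASK]
    # Purity of a position = its largest predicted class probability.
    ranked = sorted(masked, key=lambda i: max(probs[i]), reverse=True)
    out = list(tokens)
    for i in ranked[:num_to_unmask]:
        # Sample the token for this position from its predicted distribution.
        out[i] = rng.choices(range(len(probs[i])), weights=probs[i], k=1)[0]
    return out


def decode(tokens, predict_probs, per_step, rng):
    """Repeat purity-based unmasking until no MASK tokens remain."""
    while MASK in tokens:
        tokens = purity_unmask_step(tokens, predict_probs(tokens), per_step, rng)
    return tokens
```

Here predict_probs is a placeholder for a model call that returns one distribution per position; in a real pipeline it would be conditioned on the backbone structure.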
Compute Designability Numbers using ESMFold
To compute designability numbers, run
python scripts/run_esmfold_csv.py --csv_path your-csv-path ...
passing in the path to the CSV file generated by inference_diff.py.
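Before starting a long ESMFold run, it can help to confirm that the CSV is readable and non-empty. The following is a minimal sketch using only the Python standard library; summarize_csv is a hypothetical helper, not part of this repository, and because the column layout written by inference_diff.py is not documented here, it only reports whatever header it finds.

```python
import csv


def summarize_csv(csv_path):
    """Return (column_names, row_count) for the CSV file at `csv_path`."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # first row, or [] if the file is empty
        row_count = sum(1 for _ in reader)  # remaining data rows
    return header, row_count
```

For example, `header, n = summarize_csv("your-csv-path")` followed by printing both values gives a quick check that the inference step produced the expected number of rows.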
3. Weights
We provide pretrained weights for ProteinMPNN trained on the CATH 4.2 dataset under the weights directory. Both ARM and Discrete Diffusion weights are available.
4. Training Models
To train an ARM model from scratch, run
python experiments/train_arm.py ...
To train a discrete diffusion model from scratch, run
python experiments/train_diff.py ...
Contact
Please reach out to johnyang@mit.edu with any questions or concerns.
License
This project is licensed under the MIT License; refer to the LICENSE.md file for details.