alphafold_finetune

Python code for fine-tuning AlphaFold to perform protein-peptide binding predictions. This repository is a collaborative effort: Justas Dauparas implemented the AlphaFold changes necessary for fine-tuning and wrote a template of the fine-tuning script. Amir Motmaen and Phil Bradley further developed and extensively tested the fine-tuning and inference scripts in the context of protein-peptide binding.

This repository is still under development. Feel free to reach out with questions, comments, or other feedback. You can open a GitHub issue or email pbradley at fredhutch.org.

UPDATE: We have uploaded a preliminary data package containing the fine-tuned parameters and the training and testing datasets here:

https://files.ipd.uw.edu/pub/alphafold_finetune_motmaen_pnas_2023/datasets_alphafold_finetune_v2_2023-02-20.tgz

Once you have downloaded the .tgz file, copy it into the alphafold_finetune/ folder and uncompress it, for example:

tar -xzvf datasets_alphafold_finetune_v2_2023-02-20.tgz

That should create a new folder called datasets_alphafold_finetune/, and the relevant examples below should then work. Let us know if you run into trouble or would like any other data from the PNAS paper (Motmaen et al PNAS, https://www.pnas.org/doi/abs/10.1073/pnas.2216697120).

[NEW] A Google Colab notebook gives examples of installing the software and running fine-tuning and binding predictions: open in Colab.

Examples

Fine-tuning for peptide-MHC on a tiny dataset

Here $ALPHAFOLD_DATA_DIR should point to a directory that contains the AlphaFold2 params/ subdirectory, which should in turn contain parameter files like params_model_2_ptm.npz.
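As a quick sanity check before launching a run (a minimal sketch, not part of the repository's scripts), you can verify that the expected layout is in place:

```python
import os

def check_data_dir(data_dir, model_name="model_2_ptm"):
    """Return True if data_dir contains params/params_<model_name>.npz,
    the layout expected for $ALPHAFOLD_DATA_DIR."""
    params_file = os.path.join(data_dir, "params", f"params_{model_name}.npz")
    return os.path.isfile(params_file)
```

For example, `check_data_dir(os.environ["ALPHAFOLD_DATA_DIR"])` should return True before you start fine-tuning.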

python run_finetuning.py \
    --data_dir $ALPHAFOLD_DATA_DIR \
    --binder_intercepts 0.80367635 --binder_intercepts 0.43373787  \
    --freeze_binder  \
    --train_dataset examples/tiny_pmhc_finetune/tiny_example_train.tsv \
    --valid_dataset examples/tiny_pmhc_finetune/tiny_example_valid.tsv

Fine-tuning peptide-MHC (full model)

For this, please download the companion dataset on Zenodo (the Zenodo upload is not working yet; see above for the download link to the preliminary dataset).

python run_finetuning.py \
    --data_dir $ALPHAFOLD_DATA_DIR \
    --binder_intercepts 0.80367635 --binder_intercepts 0.43373787 \
    --freeze_binder  \
    --train_dataset datasets_alphafold_finetune/pmhc_finetune/combo_1and2_train.tsv \
    --valid_dataset datasets_alphafold_finetune/pmhc_finetune/combo_1and2_valid.tsv

Running predictions of peptide binding

HLA-A*02:01 10mer scan, with default AlphaFold parameters. Here $ALPHAFOLD_DATA_DIR should point to a directory that contains the params/ subdirectory.

python run_prediction.py --targets examples/pmhc_hcv_polg_10mers/targets.tsv \
    --data_dir $ALPHAFOLD_DATA_DIR --outfile_prefix polg_test1 \
    --model_names model_2_ptm --ignore_identities

HLA-A*02:01 10mer scan with fine-tuned params

python run_prediction.py --targets examples/pmhc_hcv_polg_10mers/targets.tsv \
    --outfile_prefix polg_test2 --model_names model_2_ptm_ft \
    --model_params_files datasets_alphafold_finetune/params/mixed_mhc_pae_run6_af_mhc_params_20640.pkl \
    --ignore_identities

Model 10 random peptides per target for 17 PDZ domains, with default params

python run_prediction.py --targets examples/pdz/pdz_10_random_peptides.tsv \
    --data_dir $ALPHAFOLD_DATA_DIR --outfile_prefix pdz_test1 \
    --model_names model_2_ptm --ignore_identities

Model 10 random peptides per target/class for 23 SH3 domains, with both the default model_2_ptm AND fine-tuned parameters. Here we pass multiple values for --model_names and --model_params_files; the values must correspond 1:1 in order. We give the string 'classic' in place of the parameter filename for the non-fine-tuned model, which is the signal to load default parameters from the params/ folder in $ALPHAFOLD_DATA_DIR.

python run_prediction.py --targets examples/sh3/sh3_10_random_peptides.tsv \
    --ignore_identities --outfile_prefix sh3_test1 \
    --data_dir $ALPHAFOLD_DATA_DIR \
    --model_names model_2_ptm model_2_ptm_ft \
    --model_params_files classic datasets_alphafold_finetune/params/mixed_mhc_pae_run6_af_mhc_params_20640.pkl
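The 1:1 pairing and the 'classic' sentinel can be pictured as follows (a hypothetical sketch for illustration; resolve_params_files is not a function in this repository):

```python
import os

def resolve_params_files(model_names, model_params_files, data_dir):
    """Pair each model name with its parameter file, expanding the
    'classic' sentinel to the default file under data_dir/params/.
    Illustrates the 1:1 correspondence described above."""
    assert len(model_names) == len(model_params_files), \
        "--model_names and --model_params_files must correspond 1:1"
    resolved = {}
    for name, pfile in zip(model_names, model_params_files):
        if pfile == "classic":
            # fall back to the stock AlphaFold parameters for this model
            pfile = os.path.join(data_dir, "params", f"params_{name}.npz")
        resolved[name] = pfile
    return resolved
```

With the command-line values above, model_2_ptm would resolve to the default .npz file under $ALPHAFOLD_DATA_DIR/params/, while model_2_ptm_ft would use the fine-tuned .pkl file directly.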

File formats

Inputs

targets files

Files with lists of modeling targets (for run_prediction.py) or training examples (for run_finetuning.py) should be formatted as tab-separated values files. See examples in examples/*/*tsv. The required fields are:

For fine-tuning, these additional fields are required:

Optional arguments:
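As a generic illustration of the tab-separated format (the column names below are placeholders, not the actual required fields; consult the example files for those), such a file can be read with Python's csv module:

```python
import csv

def read_targets(tsv_path):
    """Read a tab-separated targets file into a list of row dicts,
    keyed by the column names in the header line."""
    with open(tsv_path) as f:
        return list(csv.DictReader(f, delimiter="\t"))
```

Each row of the returned list maps column names to values, which is convenient for validating that all required fields are present before running the scripts.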

alignment files

These tab-separated values files provide information on the AlphaFold template structures and their alignments to the target sequence. See examples/*/alignments/* for examples. The required fields are: