AbX: Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical, and Geometric Constraints
T. Zhu, M. Ren, H. Zhang. Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical, and Geometric Constraints. ICML 2024.
Installation
If you encounter any issues with the installation or would like to report a bug, please feel free to open an issue on GitHub at https://github.com/CarbonMatrixLab/AbX/issues.
Setting up the AbX Environment
To install AbX, it is recommended to create a Conda environment and install the necessary dependencies by following these steps:
git clone git@github.com:CarbonMatrixLab/AbX.git
cd AbX
conda env create -f environment.yml
PyRosetta is required to relax the generated structures and compute binding energy. Please refer to the official PyRosetta installation guide for further instructions.
Dataset Preparation
Antibody-antigen structures and associated summary files can be retrieved from the SAbDab database. The dataset and accompanying files can be downloaded from the following links:
Extract all_structures.zip into the data directory.
To preprocess the structure data into .npz format, use the preprocess_data.py script:
python preprocess_data.py --cpu 100 --summary_file ./data/sabdab_summary_all.tsv --data_dir ./data/mmcif --output_dir ./data/npz --data_mode mmcif
We recommend using the mmCIF format for PDB structures, as it provides comprehensive information.
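After preprocessing, it can be useful to check what a generated .npz file contains before running inference. The following is a minimal sketch using NumPy; the helper name summarize_npz is hypothetical and not part of AbX, and the array names stored by preprocess_data.py are not documented here, so the code only lists whatever it finds.

```python
import numpy as np


def summarize_npz(path, allow_pickle=False):
    """Return {array_name: (shape, dtype)} for every array in a .npz file.

    allow_pickle defaults to False for safety. Setting it to True may be
    needed if the file stores object arrays (an assumption, not confirmed
    by the AbX documentation).
    """
    summary = {}
    with np.load(path, allow_pickle=allow_pickle) as data:
        for key in data.files:
            arr = data[key]
            summary[key] = (arr.shape, str(arr.dtype))
    return summary
```

Calling summarize_npz on any file under ./data/npz returns a dictionary that can be printed to confirm that the expected arrays were written.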
Pre-trained Models
- Download the AbX-DiffAb and AbX-RAbD model weights, and place them in the ./trained_model directory.
- Download the ESM2 model weights and the contact regressor weights, and save these files in the ./trained_model directory.
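A quick check that the checkpoints are in place can save a failed run later. The sketch below is a hypothetical helper, not part of AbX. The two checkpoint file names are the ones passed to --model in the usage commands of this README; the ESM2 and contact regressor file names are not stated in this document, so they are deliberately not checked.

```python
from pathlib import Path

# Checkpoint names as they appear in the --model arguments of this README.
EXPECTED_CHECKPOINTS = ("abx_diffab.ckpt", "abx_rabd.ckpt")


def missing_checkpoints(model_dir="./trained_model", names=EXPECTED_CHECKPOINTS):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in names if not (root / name).is_file()]
```

An empty list means both checkpoints were found; otherwise the returned names indicate which downloads are still missing.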
Usage Instructions
Co-Design of CDRs in DiffAb Test Dataset
To perform co-design of CDRs using the DiffAb test dataset, use the following command:
CUDA_VISIBLE_DEVICES=0 python inference.py \
--model ./trained_model/abx_diffab.ckpt \
--model_features ./config/config_data_feature.json \
--model_config ./config/config_model.json \
--batch_size 1 \
--num_samples 100 \
--name_idx ./test_data/diffab_test.idx \
--data_dir ./data/npz \
--output_dir ./output/DiffAb_design \
--mode design
Co-Design of CDRs in RAbD Test Dataset
For co-design using the RAbD test dataset, execute the following:
CUDA_VISIBLE_DEVICES=0 python inference.py \
--model ./trained_model/abx_rabd.ckpt \
--model_features ./config/config_data_feature.json \
--model_config ./config/config_model.json \
--batch_size 1 \
--num_samples 100 \
--name_idx ./test_data/RAbD_test.idx \
--data_dir ./data/npz \
--output_dir ./output/RAbD_design \
--mode design
CDR Optimization in DiffAb Test Dataset
To optimize CDRs in the DiffAb test dataset, run the following command:
CUDA_VISIBLE_DEVICES=0 python inference.py \
--model ./trained_model/abx_diffab.ckpt \
--model_features ./config/config_data_feature.json \
--model_config ./config/config_model.json \
--batch_size 1 \
--num_samples 100 \
--name_idx ./test_data/diffab_test.idx \
--data_dir ./data/npz \
--output_dir ./output/DiffAb_optimize \
--mode optimize
Modify the generate_area and optimize_steps parameters to adjust the target regions and the number of optimization steps.
Generating Trajectories
To generate a trajectory during the design of CDRs in the DiffAb test dataset, use the following:
CUDA_VISIBLE_DEVICES=0 python inference.py \
--model ./trained_model/abx_diffab.ckpt \
--model_features ./config/config_data_feature.json \
--model_config ./config/config_model.json \
--batch_size 1 \
--num_samples 100 \
--name_idx ./test_data/diffab_test.idx \
--data_dir ./data/npz \
--output_dir ./output/DiffAb_optimize \
--mode trajectory
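The inference commands above differ only in the checkpoint, the index file, the output directory, and the mode. When scripting several runs, the argument list can be assembled in one place. This is a minimal sketch: build_inference_command is a hypothetical helper, the flags are exactly those shown in this README, and the set of valid modes is assumed to be the three that appear here (inference.py may accept others).

```python
# Modes shown in this README; inference.py may support more (assumption).
VALID_MODES = ("design", "optimize", "trajectory")


def build_inference_command(checkpoint, name_idx, output_dir, mode="design",
                            num_samples=100, batch_size=1,
                            data_dir="./data/npz"):
    """Build the argument list for one inference.py run."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    return [
        "python", "inference.py",
        "--model", checkpoint,
        "--model_features", "./config/config_data_feature.json",
        "--model_config", "./config/config_model.json",
        "--batch_size", str(batch_size),
        "--num_samples", str(num_samples),
        "--name_idx", name_idx,
        "--data_dir", data_dir,
        "--output_dir", output_dir,
        "--mode", mode,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True), with CUDA_VISIBLE_DEVICES set through the env argument, so that each dataset and mode combination is launched from a single loop.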
Design CDRs Given an Antibody-Antigen Complex
To generate CDRs for a given antibody-antigen complex in PDB format, use the following:
CUDA_VISIBLE_DEVICES=0 python design.py \
--model ./trained_model/abx_diffab.ckpt \
--model_features ./config/config_data_feature.json \
--model_config ./config/config_model.json \
--batch_size 1 \
--num_samples 100 \
--pdb_file ./test_data/6ct7_H_L_S.pdb \
--output_dir ./output/design \
--mode design
An example input complex is 6ct7_H_L_S.pdb, where H is the heavy chain ID, L is the light chain ID, and S is the antigen chain ID.
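The file name in the example follows the pattern PDB ID, heavy chain, light chain, antigen chain, separated by underscores. A small validation step before launching design.py can catch misnamed inputs. The sketch below is a hypothetical helper; whether design.py itself reads chain IDs from the file name this way is an assumption, and inputs with several antigen chains are not handled.

```python
from pathlib import Path
from typing import NamedTuple


class ComplexName(NamedTuple):
    pdb_id: str
    heavy: str
    light: str
    antigen: str


def parse_complex_name(pdb_file):
    """Split a name like '6ct7_H_L_S.pdb' into its four components."""
    parts = Path(pdb_file).stem.split("_")
    if len(parts) != 4 or not all(parts):
        raise ValueError(
            f"expected <pdb_id>_<heavy>_<light>_<antigen>.pdb, got {pdb_file!r}"
        )
    return ComplexName(*parts)
```

For the example file, parse_complex_name("./test_data/6ct7_H_L_S.pdb") yields pdb_id "6ct7", heavy "H", light "L", and antigen "S".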
Relaxing the Designed Proteins
To relax the designed proteins using PyRosetta, run the following command. Use the generate_area parameter to select the regions to relax:
CUDA_VISIBLE_DEVICES=0 python relax_pdb.py \
--data_dir ./output/output_dir \
--cpus 100 \
--generate_area cdrs
Metric Calculation
To compute the RMSD, AAR, and IMP metrics, use the eval_metric.py script as follows:
CUDA_VISIBLE_DEVICES=0 python eval_metric.py \
--data_dir ./output/output_dir \
--cpus 100 \
--energy
For calculating plausibility, you may use AntiBERTy.
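For readers unfamiliar with these metrics, the sketch below illustrates their commonly used definitions in plain Python. It is not the code in eval_metric.py, and the details are assumptions: AAR is taken as the fraction of designed residues identical to the native ones, RMSD is computed on already-superimposed coordinates without any alignment step, and IMP is taken as the fraction of designs whose binding energy is lower than that of the native complex.

```python
import math


def aar(designed_seq, native_seq):
    """Amino acid recovery: fraction of positions matching the native sequence."""
    if len(designed_seq) != len(native_seq) or not native_seq:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(designed_seq, native_seq))
    return matches / len(native_seq)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    No superposition is performed; the inputs are assumed to share a frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def imp(design_energies, native_energy):
    """Fraction of designs with binding energy lower than the native value."""
    if not design_energies:
        raise ValueError("need at least one design energy")
    better = sum(e < native_energy for e in design_energies)
    return better / len(design_energies)
```

As a worked example, a designed CDR "ARDY" compared with the native "ARDF" matches at three of four positions, giving an AAR of 0.75.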
Reference
@inproceedings{
zhu2024antibody,
title={Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical and Geometric Constraints},
author={Tian Zhu and Milong Ren and Haicang Zhang},
booktitle={Forty-first International Conference on Machine Learning},
year={2024},
url={https://openreview.net/forum?id=1YsQI04KaN}
}