

AbX: Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical, and Geometric Constraints


T. Zhu, M. Ren, H. Zhang. Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical and Geometric Constraints. ICML 2024.


If you encounter any issues with the installation or would like to report a bug, please feel free to open an issue on GitHub at https://github.com/CarbonMatrixLab/AbX/issues.

Setting up the AbX Environment

To install AbX, it is recommended to create a Conda environment and install the necessary dependencies by following these steps:

git clone git@github.com:CarbonMatrixLab/AbX.git
cd AbX
conda env create -f environment.yml

PyRosetta is required to relax the generated structures and to compute binding energies. Please refer to the official PyRosetta installation guide for setup instructions.

Dataset Preparation

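Antibody-antigen structures and the associated summary file can be retrieved from the SAbDab database: download the structure archive (all_structures.zip) and the summary file (sabdab_summary_all.tsv).

Extract all_structures.zip into the ./data directory and place sabdab_summary_all.tsv in ./data as well, matching the paths used by the preprocessing command.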
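Before preprocessing, it can help to check which summary entries actually contain a paired antibody and an antigen. The sketch below uses only the standard library and assumes the TSV has the columns pdb, Hchain, Lchain and antigen_chain with missing values written as "NA" (a common SAbDab layout, but check the header of your copy). The function name is hypothetical and not part of AbX.

```python
import csv
from typing import Iterable, List, Tuple


def complexes_with_antigen(lines: Iterable[str]) -> List[Tuple[str, str, str, str]]:
    """Return (pdb, Hchain, Lchain, antigen_chain) for rows that have all three.

    Assumes SAbDab-style column names; "NA" or empty marks a missing chain.
    Multi-chain antigens (e.g. "A | B") are kept as the raw string.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    out = []
    for row in reader:
        h = row.get("Hchain", "")
        l = row.get("Lchain", "")
        ag = row.get("antigen_chain", "")
        # Skip single-chain antibodies and antibody-only entries
        if all(v and v != "NA" for v in (h, l, ag)):
            out.append((row["pdb"], h, l, ag))
    return out
```

Usage: open ./data/sabdab_summary_all.tsv with newline="" and pass the file handle to complexes_with_antigen.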

To preprocess the structure data into .npz format, use the preprocess_data.py script:

python preprocess_data.py --cpu 100 --summary_file ./data/sabdab_summary_all.tsv --data_dir ./data/mmcif --output_dir ./data/npz --data_mode mmcif

We recommend using the mmCIF format (--data_mode mmcif) rather than legacy PDB files, as mmCIF carries more complete structural information.
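To check what preprocess_data.py produced, you can open any file in ./data/npz with NumPy and list its arrays. The array names are defined by the AbX preprocessing code and are not documented here, so this sketch simply reports whatever keys it finds; the helper name is hypothetical. It uses NumPy's default allow_pickle=False, which will refuse object arrays if the files happen to contain any.

```python
import numpy as np


def describe_npz(path: str) -> dict:
    """Map each array name in an .npz archive to its (shape, dtype string)."""
    with np.load(path) as data:  # NpzFile supports the context-manager protocol
        return {key: (data[key].shape, str(data[key].dtype)) for key in data.files}
```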

Pre-trained Models

  1. Download the AbX-DiffAb and AbX-RAbD model weights (abx_diffab.ckpt and abx_rabd.ckpt) and place them in the ./trained_model directory.
  2. Download the ESM2 model weights and the contact regressor weights, and save these files in the ./trained_model directory as well.
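A quick pre-flight check can catch a missing checkpoint before a long inference run. The sketch below only knows the two AbX checkpoint names that appear in the inference commands of this README; the ESM2 and contact-regressor file names are not given here, so you would need to add them yourself. The helper name is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Names taken from the inference commands in this README; extend with the
# ESM2 / contact-regressor file names you downloaded (not listed here).
EXPECTED = ("abx_diffab.ckpt", "abx_rabd.ckpt")


def missing_weights(model_dir: str = "./trained_model",
                    expected: Iterable[str] = EXPECTED) -> List[str]:
    """Return the expected weight files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```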

Usage Instructions

Co-Design of CDRs in DiffAb Test Dataset

To perform co-design of CDRs using the DiffAb test dataset, use the following command:

CUDA_VISIBLE_DEVICES=0 python inference.py  \
    --model ./trained_model/abx_diffab.ckpt \
    --model_features ./config/config_data_feature.json \
    --model_config ./config/config_model.json \
    --batch_size 1 \
    --num_samples 100 \
    --name_idx ./test_data/diffab_test.idx \
    --data_dir  ./data/npz \
    --output_dir ./output/DiffAb_design \
    --mode design
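The inference.py commands in this README share most of their flags and differ only in the checkpoint, index file, output directory and --mode. A small helper can assemble them so that only those values change between runs. This is a convenience sketch, not part of AbX: it uses only flags that appear in the documented commands, and the function names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List


def inference_cmd(model: str, name_idx: str, output_dir: str, mode: str,
                  num_samples: int = 100) -> List[str]:
    """Build an inference.py argument list using the flags shown in this README."""
    if mode not in {"design", "optimize", "trajectory"}:
        raise ValueError(f"unknown mode: {mode}")
    return [
        "python", "inference.py",
        "--model", model,
        "--model_features", "./config/config_data_feature.json",
        "--model_config", "./config/config_model.json",
        "--batch_size", "1",
        "--num_samples", str(num_samples),
        "--name_idx", name_idx,
        "--data_dir", "./data/npz",
        "--output_dir", output_dir,
        "--mode", mode,
    ]


def run_on_gpu(cmd: List[str], gpu: int = 0) -> None:
    """Run cmd with only one GPU visible (equivalent to CUDA_VISIBLE_DEVICES=0 ...)."""
    env: Dict[str, str] = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    subprocess.run(cmd, env=env, check=True)
```

For example, run_on_gpu(inference_cmd("./trained_model/abx_rabd.ckpt", "./test_data/RAbD_test.idx", "./output/RAbD_design", "design")) reproduces the RAbD co-design command.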

Co-Design of CDRs in RAbD Test Dataset

For co-design using the RAbD test dataset, execute the following:

CUDA_VISIBLE_DEVICES=0 python inference.py  \
    --model ./trained_model/abx_rabd.ckpt \
    --model_features ./config/config_data_feature.json \
    --model_config ./config/config_model.json \
    --batch_size 1 \
    --num_samples 100 \
    --name_idx ./test_data/RAbD_test.idx \
    --data_dir  ./data/npz \
    --output_dir ./output/RAbD_design \
    --mode design

CDR Optimization in DiffAb Test Dataset

To optimize CDRs in the DiffAb test dataset, run the following command:

CUDA_VISIBLE_DEVICES=0 python inference.py  \
    --model ./trained_model/abx_diffab.ckpt \
    --model_features ./config/config_data_feature.json \
    --model_config ./config/config_model.json \
    --batch_size 1 \
    --num_samples 100 \
    --name_idx ./test_data/diffab_test.idx \
    --data_dir  ./data/npz \
    --output_dir ./output/DiffAb_optimize \
    --mode optimize

To change which regions are optimized and how many optimization steps are applied, adjust the generate_area and optimize_steps parameters, respectively.
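Conceptually, optimizing an existing sequence or structure with a diffusion model usually means pushing it a limited number of steps along the forward noising process and then running the learned reverse process back to step 0, so more steps let the result drift further from the starting design. The toy below illustrates that idea on scalar data whose target distribution is a Gaussian N(mu, sigma0^2), for which the score is known in closed form. It is not AbX's implementation: the linear noise schedule, the Gaussian target, DDPM-style ancestral sampling, and all names are assumptions chosen for illustration, with optimize_steps playing the role of the README parameter.

```python
import numpy as np


def linear_betas(T: int = 100, beta_min: float = 1e-4, beta_max: float = 0.02) -> np.ndarray:
    """DDPM-style linear noise schedule (an assumption, not AbX's schedule)."""
    return np.linspace(beta_min, beta_max, T)


def gaussian_score(x, t, alpha_bar, mu, sigma0):
    """Score of the noised marginal p_t when the data distribution is N(mu, sigma0^2)."""
    ab = alpha_bar[t]
    var = ab * sigma0 ** 2 + (1.0 - ab)
    return -(x - np.sqrt(ab) * mu) / var


def optimize(x0, optimize_steps, betas, mu=0.0, sigma0=1.0, rng=None):
    """Noise x0 forward for optimize_steps steps, then denoise back to step 0."""
    if not 0 <= optimize_steps <= len(betas):
        raise ValueError("optimize_steps must lie within the noise schedule")
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=float)
    if optimize_steps == 0:
        return x0.copy()
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    t = optimize_steps - 1  # 0-based index of the last forward step
    # Forward: sample q(x_t | x_0) in closed form
    x = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * rng.standard_normal(x0.shape)
    # Reverse: x_{s-1} = (x_s + beta_s * score) / sqrt(alpha_s) + sqrt(beta_s) * z
    for s in range(t, -1, -1):
        x = (x + betas[s] * gaussian_score(x, s, alpha_bar, mu, sigma0)) / np.sqrt(alphas[s])
        if s > 0:
            x = x + np.sqrt(betas[s]) * rng.standard_normal(x.shape)
    return x
```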

Generating Trajectories

To generate a trajectory during the design of CDRs in the DiffAb test dataset, use the following:

CUDA_VISIBLE_DEVICES=0 python inference.py  \
    --model ./trained_model/abx_diffab.ckpt \
    --model_features ./config/config_data_feature.json \
    --model_config ./config/config_model.json \
    --batch_size 1 \
    --num_samples 100 \
    --name_idx ./test_data/diffab_test.idx \
    --data_dir  ./data/npz \
    --output_dir ./output/DiffAb_trajectory \
    --mode trajectory
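A trajectory is the sequence of intermediate structures produced along the reverse diffusion. One common way to inspect such frames is a multi-model PDB file (one MODEL ... ENDMDL block per frame), which viewers such as PyMOL load as states of a single object. Whether AbX writes its trajectories in this format is not stated here; the sketch below, with a hypothetical function name, only shows how Cα-only frames could be serialized that way using fixed-column PDB records.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def frames_to_multimodel_pdb(frames: Sequence[Sequence[Coord]],
                             resnames: Sequence[str], chain: str = "H") -> str:
    """Serialize C-alpha coordinates for each frame as MODEL/ENDMDL blocks."""
    lines = []
    for model_idx, coords in enumerate(frames, start=1):
        if len(coords) != len(resnames):
            raise ValueError("each frame needs one coordinate per residue")
        lines.append(f"MODEL     {model_idx:4d}")
        for i, ((x, y, z), res) in enumerate(zip(coords, resnames), start=1):
            # Fixed-column ATOM record: serial, atom name, residue name, chain,
            # residue number, x/y/z, occupancy, B-factor, element symbol
            lines.append(
                f"ATOM  {i:5d}  CA  {res:>3s} {chain:1s}{i:4d}    "
                f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {'C':>2s}"
            )
        lines.append("ENDMDL")
    lines.append("END")
    return "\n".join(lines) + "\n"
```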

Designing CDRs for a Given Antibody-Antigen Complex

To design the CDRs of a given antibody-antigen complex provided in PDB format, use the following:

CUDA_VISIBLE_DEVICES=0 python design.py  \
    --model ./trained_model/abx_diffab.ckpt \
    --model_features ./config/config_data_feature.json \
    --model_config ./config/config_model.json \
    --batch_size 1 \
    --num_samples 100 \
    --pdb_file  ./test_data/6ct7_H_L_S.pdb \
    --output_dir ./output/design \
    --mode design

The example input complex, 6ct7_H_L_S.pdb, encodes its chain IDs in the file name: H is the heavy-chain ID, L is the light-chain ID, and S is the antigen-chain ID.
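Because the chain IDs are carried in the file name, it is worth checking that they actually exist in the PDB file before running design.py. The sketch below, with hypothetical helper names, splits a name such as 6ct7_H_L_S.pdb into its four fields and reads chain IDs from the chain-identifier column (column 22) of ATOM/HETATM records. It assumes exactly one heavy, one light and one antigen chain, as in the example.

```python
from pathlib import Path
from typing import Iterable, NamedTuple, Set


class ComplexName(NamedTuple):
    pdb_id: str
    heavy: str
    light: str
    antigen: str


def parse_complex_name(path: str) -> ComplexName:
    """Split '<pdb>_<H>_<L>_<antigen>.pdb' into its four fields."""
    parts = Path(path).stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected <pdb>_<H>_<L>_<antigen>.pdb, got {Path(path).name}")
    return ComplexName(*parts)


def chains_in_pdb(lines: Iterable[str]) -> Set[str]:
    """Collect chain IDs (column 22, index 21) from ATOM/HETATM records."""
    return {line[21] for line in lines
            if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21}


def missing_chains(path: str) -> Set[str]:
    """Chain IDs named in the file name but absent from the coordinates."""
    name = parse_complex_name(path)
    with open(path) as fh:
        present = chains_in_pdb(fh)
    return {name.heavy, name.light, name.antigen} - present
```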

Relaxing the Designed Proteins

To relax the designed structures with PyRosetta, run the following command, setting --data_dir to the output directory of your design run and choosing the regions to relax with the generate_area parameter:

CUDA_VISIBLE_DEVICES=0 python relax_pdb.py  \
    --data_dir ./output/output_dir \
    --cpus 100 \
    --generate_area cdrs
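Restricting relaxation to the CDRs (--generate_area cdrs) amounts to deciding, residue by residue, which positions may move. The sketch below builds that selection from Chothia-numbered residue IDs using the commonly cited Chothia CDR boundaries (H1 26-32, H2 52-56, H3 95-102, L1 24-34, L2 50-56, L3 89-97). Which numbering scheme and CDR definitions relax_pdb.py actually uses is not stated here, so treat the ranges and the function name as assumptions; in PyRosetta, a mask like this would typically be turned into a MoveMap for FastRelax.

```python
from typing import Dict, Iterable, List, Tuple

# Chothia CDR boundaries in Chothia numbering, inclusive (an assumption here).
CHOTHIA_CDRS: Dict[str, Tuple[str, int, int]] = {
    "H1": ("H", 26, 32), "H2": ("H", 52, 56), "H3": ("H", 95, 102),
    "L1": ("L", 24, 34), "L2": ("L", 50, 56), "L3": ("L", 89, 97),
}

# (chain role 'H' or 'L', residue number, insertion code); the role refers to
# heavy/light, not to the chain ID letter used in a particular PDB file.
ResidueId = Tuple[str, int, str]


def movable_mask(residues: Iterable[ResidueId],
                 regions: Iterable[str] = tuple(CHOTHIA_CDRS)) -> List[bool]:
    """True for residues inside any selected CDR; insertion codes stay in range."""
    spans = [CHOTHIA_CDRS[r] for r in regions]
    return [any(chain == c and lo <= num <= hi for c, lo, hi in spans)
            for chain, num, _icode in residues]
```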

Metric Calculation

To compute the RMSD, amino acid recovery (AAR), and improvement percentage (IMP) metrics, run the eval_metric.py script on your output directory:

CUDA_VISIBLE_DEVICES=0 python eval_metric.py  \
    --data_dir ./output/output_dir \
    --cpus 100
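For reference, these metrics are commonly defined as follows in CDR-design benchmarks: AAR is the fraction of CDR positions whose designed residue matches the native one, RMSD is the root-mean-square deviation between designed and native CDR Cα coordinates, and IMP is the percentage of designs whose binding energy (ΔG) is lower than that of the native complex. The sketch below implements these textbook definitions under the assumption that designed and native structures are already in the same reference frame (no superposition). It is not eval_metric.py, which may differ in details such as alignment or per-target averaging.

```python
from typing import Sequence

import numpy as np


def aar(designed: str, native: str) -> float:
    """Amino acid recovery: fraction of positions with identical residues."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(designed, native)) / len(native)


def ca_rmsd(designed: np.ndarray, native: np.ndarray) -> float:
    """RMSD between two (N, 3) C-alpha arrays, assumed already superimposed."""
    designed = np.asarray(designed, dtype=float)
    native = np.asarray(native, dtype=float)
    if designed.shape != native.shape:
        raise ValueError("coordinate arrays must have the same shape")
    return float(np.sqrt(((designed - native) ** 2).sum(axis=1).mean()))


def imp(design_dg: Sequence[float], native_dg: float) -> float:
    """Percentage of designs with lower (better) binding energy than the native."""
    dg = np.asarray(design_dg, dtype=float)
    return float(100.0 * (dg < native_dg).mean())
```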

To assess the plausibility of the designed sequences, you can use AntiBERTy.
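A common way to turn a masked language model such as AntiBERTy into a plausibility score is the pseudo-log-likelihood: mask each position in turn, take the log-probability the model assigns to the true residue, and sum over positions. Normalizing by length and exponentiating gives a pseudo-perplexity, where lower values mean the sequence looks more antibody-like to the model. The sketch below performs only that aggregation from per-position log-probabilities; obtaining those values is left to AntiBERTy's own API, and whether AbX computes plausibility exactly this way is an assumption.

```python
import math
from typing import Sequence


def pseudo_log_likelihood(masked_log_probs: Sequence[float]) -> float:
    """Sum over positions of log p(s_i | sequence with position i masked)."""
    return float(sum(masked_log_probs))


def pseudo_perplexity(masked_log_probs: Sequence[float]) -> float:
    """exp(-PLL / L); lower values mean the model finds the sequence more plausible."""
    if not masked_log_probs:
        raise ValueError("need at least one position")
    return math.exp(-pseudo_log_likelihood(masked_log_probs) / len(masked_log_probs))
```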


title={Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical and Geometric Constraints},
author={Tian Zhu and Milong Ren and Haicang Zhang},
booktitle={Forty-first International Conference on Machine Learning},