Awesome
Source code for the NeurIPS 2022 paper TANKBind: Trigonometry-Aware Neural NetworKs for Drug-Protein Binding Structure Prediction
TankBind
TankBind could predict both the protein-ligand binding structure and their affinity.
The primary purpose of this repository is to enable the reproduction of the results reported in the paper, as well as to facilitate the work of others who wish to build upon it. To experience the latest version, which includes various improvements made to the model, simply create an account at https://m1.galixir.com/public/login_en/index.html.
If you have any question or suggestion, please feel free to open an issue or email me at wei.lu@galixir.com or shuangjia zheng at shuangjia.zheng@galixir.com.
Installation
conda create -n tankbind_py38 python=3.8
conda activate tankbind_py38
You might want to change the cudatoolkit version based on the GPU you are using.:
conda install pytorch cudatoolkit=11.3 -c pytorch
conda install torchdrug=0.1.2 pyg=2.1.0 biopython nglview jupyterlab -c milagraph -c conda-forge -c pytorch -c pyg
pip install torchmetrics tqdm mlcrate pyarrow
rdkit version used: 2021.03.4
p2rank v2.3 could be downloaded from here:
https://github.com/rdk/p2rank/releases/download/2.3/p2rank_2.3.tar.gz
Test set evaluation
We include the script for reproducing the self-dock result in
examples/testset_evaluation_cleaned.ipynb
The test_dataset is constructed using the notebook in "Dataset construction" section.
Prediction
We use the prediction of the structure of protein ABL1 in complex with two drugs, Imatinib and compound6 (PDB: 6HD6) as an example for predicting the drug-protein binding structure.
examples/prediction_example_using_PDB_6hd6.ipynb
<img src="imgs/example_6hd6.png" width="200">
Dataset construction
Scripts for training/test dataset construction is provided in:
examples/construction_PDBbind_training_and_test_dataset.ipynb.ipynb
The Script I used to train the model is
python main.py -d 0 -m 0 --batch_size 5 --label baseline --addNoise 5 --use_equivalent_native_y_mask
High-throughput virtual screening
TankBind also support virtual screening. In our example here, for the WDR domain of LRRK2 protein, we can screen 10,000 drug candidates in 2 minutes (or 1M in around 3 hours) with a single GPU. Check out
examples/high_throughput_virtual_screening_LRRK2_WDR.ipynb
Citation
@article{lu2022tankbind,
title={Tankbind: Trigonometry-aware neural networks for drug-protein binding structure prediction},
author={Lu, Wei and Wu, Qifeng and Zhang, Jixian and Rao, Jiahua and Li, Chengtao and Zheng, Shuangjia},
journal={Advances in Neural Information Processing Systems},
year={2022}
}