MISATO Affinity Predictions

</div>

:purple_heart: Community

Want to get hands-on for drug discovery using AI?

:rocket: About

This repository provides the code for the binding affinity prediction task described in our paper. For the main dataset and instructions on how to download it, visit the main repository site.

:computer: Environment setup

1. Conda environment

Create a conda environment for the project via

make venv # will create a cpu environment
# NOTE: This will simply call
#  conda env create --prefix=./venv -f requirements/env.yml

# For a gpu environment call
#  make name=venv_gpu sys=gpu venv
#  conda activate ./venv_gpu

# For a Mac m1 environment call
#  make name=venv sys=m1 venv
#  conda activate ./venv

# To activate the environment, use:
conda activate ./venv

After this, install the local package and its dependencies by running pip install -e . from the project directory.

2. Environment variables

Set the environment variables for your system in a .env file in the project directory (the same directory as this README). Specify the following variables:

# Path to the general data directory
DATA_PATH=data/

# Path to the directory where run outputs will be stored
RUNS_PATH=<path_to_your_runs_directory>

# Path to the root of the project directory
PROJECT_PATH=<path_to_your_project_directory>

# Path to the .hdf5 file containing Molecular Dynamics (MD) data
MD_PATH=<path_to_your_md_data_file>

# Path to the .hdf5 file containing Quantum Mechanics (QM) data
QM_PATH=<path_to_your_qm_data_file>

# Path to the .h5 file containing the preprocessed invariant graphs
INVARIANT_GRAPH_PATH=<path_to_your_graph_data_file>

# Path to the .h5 file containing affinity data
AFFINITY_PATH=data/affinity_data.h5

# Paths to the .pickle files listing the available protein-ligand pairs for each split
PAIR_PATH_TRAIN=data/train_pairs.pickle
PAIR_PATH_TEST=data/test_pairs.pickle
PAIR_PATH_VAL=data/val_pairs.pickle

# Your Weights & Biases API key
WANDB_API_KEY="<your_wandb_api_key>"

# Your Weights & Biases entity (username or team name)
WANDB_ENTITY="<your_wandb_entity>"

# Your Weights & Biases project name
WANDB_PROJECT="<your_wandb_project_name>"
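Before training, it can help to confirm that the .env file is complete. The following is a minimal sketch, not part of this repository: parse_env_text and missing_vars are hypothetical helpers that read simple KEY=VALUE lines and report variables that are absent, empty, or still set to a <placeholder>. The Weights & Biases variables are left out of the required list on the assumption that they only matter when logging is enabled.

```python
from pathlib import Path

# Variable names taken from the list above; adjust if your setup differs.
REQUIRED_VARS = [
    "DATA_PATH", "RUNS_PATH", "PROJECT_PATH", "MD_PATH", "QM_PATH",
    "INVARIANT_GRAPH_PATH", "AFFINITY_PATH",
    "PAIR_PATH_TRAIN", "PAIR_PATH_TEST", "PAIR_PATH_VAL",
]


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one layer of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def missing_vars(env: dict[str, str], required=REQUIRED_VARS) -> list[str]:
    """Return required names that are absent, empty, or still a <placeholder>."""
    missing = []
    for name in required:
        value = env.get(name, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        env = parse_env_text(env_file.read_text())
        print("Unset or placeholder variables:", missing_vars(env))
    else:
        print("No .env file found in the current directory.")
```

Run it from the project directory. An empty list means every required variable has a concrete value; the check does not verify that the paths exist.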

3. Config variables

This project uses Hydra for configuration management. Adjust the parameters in the configs directory to match your setup, or override them on the command line (see below).

:file_folder: MISATO files and Preprocessing

The MISATO h5 files can be downloaded like this:

wget -O data/MD.hdf5 https://zenodo.org/record/7711953/files/MD.hdf5
wget -O data/QM.hdf5 https://zenodo.org/record/7711953/files/QM.hdf5

You can download a preprocessed h5 file containing the MD adaptability and reference coordinates from here: https://syncandshare.lrz.de/getlink/fiCk9juiXYBKZ73VyHt372/adaptability_MD.hdf5

The preprocessed graphs for the dataloader can also be downloaded.

Invariant graph: https://syncandshare.lrz.de/getlink/fi41DBaf6f1b6ZHSiGokU9/preprocessed_graph_invariant_numlig_affType.h5
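Large downloads sometimes end up truncated or replaced by an HTML error page. As a quick sanity check, the sketch below tests whether a file begins with the 8-byte HDF5 signature. It uses only the standard library, the helper name looks_like_hdf5 is an assumption of this example, and it only inspects offset 0, which covers files written without a user block. It does not validate the contents.

```python
from pathlib import Path

# The 8-byte signature that opens a standard HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path) -> bool:
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


if __name__ == "__main__":
    # Target paths from the wget commands above; adjust to your own layout.
    for name in ["data/MD.hdf5", "data/QM.hdf5"]:
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found")
        elif looks_like_hdf5(path):
            print(f"{name}: ok")
        else:
            print(f"{name}: not an HDF5 file")
```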

Alternatively, generate an h5 file containing the adaptability values from the MD.hdf5 file by running the preprocessing yourself. To do so, follow the instructions in the MISATO repository: https://github.com/t7morgen/misato-dataset

The preprocessing scripts for the graphs can be found in src/data/processing/.

:chart_with_upwards_trend: Experiment logging with wandb

To log to wandb, you first need to log in. Install wandb with pip install wandb, then run wandb login from the command line.

If you are already logged in and need to relogin for some reason, use wandb login --relogin.

:mechanical_arm: Training a model with pytorch lightning and logging on wandb

To run a model simply use

python src/train.py name=<YOUR_RUN_NAME>

train.py logs to Weights & Biases using the credentials you provided in your .env file, but only when the run has a name other than test. If you do not pass a name, the default name test is used and logging is disabled. If you pass any other name, WandB logging is enabled and the run is logged to your WandB account under that name.

For the invariant case, you could run a training run like:

python src/train.py name=<YOUR_RUN_NAME> model=gcn datamodule=md_datamodule_invariant

To override the default parameters in src/configs/config.yaml, pass the new values on the command line. For example:

python src/train.py name=<YOUR_RUN_NAME> trainer.epochs=100

By default, a test run follows training. To disable it, pass model_test=False when starting the training.
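The override behaviour described above can be illustrated with a small stand-alone sketch. This is not how Hydra is implemented and it does not model config groups such as model=gcn; apply_overrides and parse_value are hypothetical helpers, and the default values are made up for the example. It only shows how dotted key=value arguments map onto a nested config, and how the run name decides whether WandB logging is on.

```python
import copy


def parse_value(raw: str):
    """Convert an override value to bool, int, or float when possible."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Walk into nested sections, creating them if they are missing.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return result


# Hypothetical defaults; the real ones live in src/configs/config.yaml.
defaults = {"name": "test", "model_test": True, "trainer": {"epochs": 10}}

cfg = apply_overrides(
    defaults, ["name=my_run", "trainer.epochs=100", "model_test=False"]
)
# Rule stated above: logging is enabled for any run name other than "test".
wandb_enabled = cfg["name"] != "test"
print(cfg, wandb_enabled)
```

Unlike this sketch, Hydra validates overrides against the composed config, so a misspelled key is normally reported as an error rather than silently added.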

Hydra's built-in options for log verbosity, printing the config, and showing help are also available:

# LOGGING
# For running at DEBUG logging level:
#  (c.f. https://hydra.cc/docs/tutorials/basic/running_your_app/logging/ )
## Activating debug log level only for loggers named `__main__` and `hydra`
python src/train.py 'hydra.verbose=[__main__, hydra]'
## Activating debug log level for all loggers
python src/train.py hydra.verbose=true

# PRINTING CONFIG ONLY
## Print only the job config, then return without running
python src/train.py --cfg job

# GET HELP
python src/train.py --help