

Meta-Learning Initializations for Low Resource Drug Discovery

This repo contains accompanying code for the publication "Meta-Learning Initializations for Low Resource Drug Discovery" (Nguyen et al.).

Instructions

Cloning and setting up your environment

git clone https://github.com/GSK-AI/meta-learning-qsar.git
conda env create --name metalearning --file meta-learning-qsar/environment.yaml
conda activate metalearning

Setting PYTHONPATH

cd meta-learning-qsar
export PYTHONPATH=$PYTHONPATH:$(pwd)
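The export above appends the repository root to PYTHONPATH, presumably so that the scripts under src/ and exp/ can import the project's own modules. A minimal Python sketch for confirming that the export took effect, assuming it is run from the meta-learning-qsar directory in the same shell session (the helper name is hypothetical):

```python
import os
from pathlib import Path


def repo_root_on_pythonpath(repo_root: str) -> bool:
    """Return True if repo_root is one of the directories listed in PYTHONPATH."""
    target = Path(repo_root).resolve()
    # PYTHONPATH entries are separated by os.pathsep (":" on Linux and macOS).
    entries = os.environ.get("PYTHONPATH", "").split(os.pathsep)
    # Skip empty entries, such as the leading one produced by
    # `export PYTHONPATH=$PYTHONPATH:$(pwd)` when PYTHONPATH was previously unset.
    return any(Path(entry).resolve() == target for entry in entries if entry)


# Run from the meta-learning-qsar directory after the export above.
print(repo_root_on_pythonpath(os.getcwd()))
```

Environment variables are read when Python starts, so the check only reflects exports made in the shell that launches it.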

Setting OE_LICENSE

This step requires the OpenEye license file and is necessary for running src/featurize.py. Replace <path> with the directory that contains the license file.

export OE_LICENSE=<path>/oe_license.txt
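Before running src/featurize.py, it can help to confirm that OE_LICENSE is set and points to a real file. A minimal sketch (the helper name is hypothetical); it only checks that the file exists, not that the license itself is valid:

```python
import os
from pathlib import Path


def openeye_license_available() -> bool:
    """Return True if OE_LICENSE is set and points to an existing file."""
    license_path = os.environ.get("OE_LICENSE")
    # An unset or empty variable counts as "no license".
    return bool(license_path) and Path(license_path).is_file()


print("OE_LICENSE points to a file:", openeye_license_available())
```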

Running tests

If the OpenEye license is available, run all tests:

pytest

If the license file is not available, exclude the tests that use the OpenEye OEChem library:

pytest -k "not openeye"
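The choice between the two commands above can be automated. A minimal sketch, assuming it is run from the repository root; the helper names are hypothetical, and the license check is the same existence test on OE_LICENSE:

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List


def pytest_args(license_available: bool) -> List[str]:
    """Arguments for the full suite, or for the suite minus OpenEye tests."""
    # -k "not openeye" deselects tests matching the keyword "openeye".
    return [] if license_available else ["-k", "not openeye"]


def run_tests() -> int:
    """Pick the pytest invocation based on OE_LICENSE and return its exit code."""
    license_path = os.environ.get("OE_LICENSE")
    has_license = bool(license_path) and Path(license_path).is_file()
    # Invoke pytest through the current interpreter so the active conda env is used.
    cmd = [sys.executable, "-m", "pytest", *pytest_args(has_license)]
    return subprocess.run(cmd).returncode
```

Calling run_tests() from the repository root runs one of the two commands above, depending on whether the license file is present.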

Usage

Reproducing experiments with ChEMBL20

Extracting and combining chunked and featurized data

python exp/preprocess.py

Train Baselines, MAML, FOMAML, and ANIL using the provided splits

./exp/train_and_evaluate.sh 

Once training is done, generate test statistics on held-out test tasks by running

./exp/test.sh 
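The three reproduction steps can be chained so that a failure in one step stops the later ones. A minimal sketch, assuming it is launched from the repository root; STEPS and run_pipeline are hypothetical names, and the commands are exactly those listed above:

```python
import subprocess
from typing import List

# The three reproduction steps above, in order (paths relative to the repository root).
STEPS: List[List[str]] = [
    ["python", "exp/preprocess.py"],
    ["./exp/train_and_evaluate.sh"],
    ["./exp/test.sh"],
]


def run_pipeline(steps: List[List[str]], dry_run: bool = False) -> List[str]:
    """Run each step in order, stopping at the first non-zero exit code.

    Returns the command lines that were (or, with dry_run=True, would be) run.
    """
    executed = []
    for cmd in steps:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the step fails,
            # so later steps never run on incomplete outputs.
            subprocess.run(cmd, check=True)
    return executed


# Preview the commands without executing anything.
print(run_pipeline(STEPS, dry_run=True))
```

If train_and_evaluate.sh returns before training has actually finished (for example, if it submits cluster jobs), wait for those jobs to complete before running test.sh rather than chaining the steps directly.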

Training on custom data

First, featurize the data by converting SMILES strings to graph representations:

python src/featurize.py \
    --data <csv file> \
    --smiles_col <name of SMILES column> \
    --output_col <name of output columns> \
    --output_path <folder to store featurized data>
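A minimal sketch of preparing an input CSV and building the matching src/featurize.py command as an argument list. The column names "smiles" and "activity" and the toy rows are hypothetical; how several output columns are passed to --output_col is not documented here, so the sketch passes a single column:

```python
import csv
from typing import Dict, List


def write_featurize_input(rows: List[Dict[str, str]], csv_path: str) -> None:
    """Write one molecule per row to a CSV file, with a header row."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def featurize_command(csv_path: str, smiles_col: str, output_col: str,
                      output_path: str) -> List[str]:
    """Build the src/featurize.py invocation as an argument list."""
    return [
        "python", "src/featurize.py",
        "--data", csv_path,
        "--smiles_col", smiles_col,
        "--output_col", output_col,
        "--output_path", output_path,
    ]


# Hypothetical toy data: "smiles" and "activity" are illustrative column names.
rows = [
    {"smiles": "CCO", "activity": "5.2"},
    {"smiles": "c1ccccc1", "activity": "6.8"},
]
```

The resulting list can be passed to subprocess.run(..., check=True) from the repository root, with OE_LICENSE set as described earlier.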

Use src/train_maml.py to kick off MAML training. The two required arguments are --save_path and --source.

python src/train_maml.py \
    --save_path <directory to store checkpoint> \
    --source <directory where training and validation data is stored> \
    ...
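In a shell, a backslash only continues a command when it is the last character on the line, so a trailing space after it silently splits the command. Building the invocation as an argument list sidesteps that pitfall. A minimal sketch; train_maml_command is a hypothetical helper, and extra_args stands in for the optional flags elided as "..." above:

```python
from typing import List, Sequence


def train_maml_command(save_path: str, source: str,
                       extra_args: Sequence[str] = ()) -> List[str]:
    """Build a src/train_maml.py invocation with its two required arguments."""
    if not save_path or not source:
        raise ValueError("--save_path and --source are both required")
    return ["python", "src/train_maml.py",
            "--save_path", save_path,
            "--source", source,
            *extra_args]
```

For example, subprocess.run(train_maml_command("runs/maml", "data/featurized"), check=True), where both directory names are hypothetical placeholders.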

Use src/validate_maml.py to calculate validation metrics from saved checkpoints. This Python script launches validation Slurm jobs as new checkpoints are found. --monitor_path and --source should be the same as the --save_path and --source used for src/train_maml.py.

python src/validate_maml.py \
    --monitor_path <directory to store checkpoint> \
    --source <directory where training and validation data is stored> \
    ...
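The "launch a job as new checkpoints are found" behavior amounts to polling a directory and remembering which files have already been handled. The sketch below illustrates that general idea only; it is not the implementation in src/validate_maml.py. It assumes checkpoints appear as individual files in --monitor_path, and the pattern argument is hypothetical because the checkpoint naming scheme is not documented here:

```python
from pathlib import Path
from typing import List, Set


def find_new_checkpoints(monitor_path: str, seen: Set[Path],
                         pattern: str = "*") -> List[Path]:
    """Return files in monitor_path matching pattern that are not in seen.

    seen is updated in place, so each checkpoint is reported only once.
    """
    new = sorted(p for p in Path(monitor_path).glob(pattern)
                 if p.is_file() and p not in seen)
    seen.update(new)
    return new
```

A monitor would call this periodically (for example, between time.sleep calls) and submit one validation job per returned path.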


Contact

For questions, please feel free to reach out via email at cuong.q.nguyen@gsk.com.