Learning Graph Models for Retrosynthesis Prediction
(Under Construction and Subject to Change)
This is the official PyTorch implementation for GraphRetro (Somnath et al. 2021), a graph-based model for one-step retrosynthesis prediction. Our model performs the transformation from products to reactants using a two-stage decomposition (a toy illustration follows below):
a) Edit Prediction: Identifies edits given a product molecule, which, when applied, yield intermediate molecules called synthons.
b) Synthon Completion: Completes synthons into reactants by attaching subgraphs called leaving groups, drawn from a precomputed vocabulary.
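A minimal, self-contained toy sketch of this idea using RDKit (this is only an illustration, not the GraphRetro code or API; the molecule, the chosen bond edit, and the atom indices are arbitrary):

from rdkit import Chem

# Toy product: acetanilide. Atom indices follow SMILES order
# (index 1 = carbonyl carbon, index 3 = amide nitrogen).
product = Chem.RWMol(Chem.MolFromSmiles("CC(=O)Nc1ccccc1"))

# Stage 1 (Edit Prediction), hard-coded here: delete the amide C-N bond.
product.RemoveBond(1, 3)
Chem.SanitizeMol(product)

# The disconnected fragments are the synthons; Stage 2 (Synthon Completion)
# would attach leaving groups from the vocabulary to turn them into reactants.
synthons = Chem.MolToSmiles(product).split(".")
print(synthons)  # e.g. ['CC=O', 'Nc1ccccc1']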
Setup
The setup assumes conda is installed on your system.
If conda is not installed, download the Miniconda installer first.
Once conda is available, run the following commands:
echo 'export SEQ_GRAPH_RETRO=/path/to/dir/' >> ~/.bashrc
source ~/.bashrc
conda env create -f environment.yml
source activate seq_gr
python setup.py develop  # or: python setup.py install
Datasets
The original and canonicalized files are provided under datasets/uspto-50k/. Please make sure to move them to $SEQ_GRAPH_RETRO/ before use.
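For example (the exact target layout under $SEQ_GRAPH_RETRO/ is an assumption; adjust the paths to match your setup):

mkdir -p $SEQ_GRAPH_RETRO/datasets/uspto-50k
cp datasets/uspto-50k/*.csv $SEQ_GRAPH_RETRO/datasets/uspto-50k/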
Input Preparation
Before preparing inputs, we canonicalize the products. This can be done by running:
python data_process/canonicalize_prod.py --filename train.csv
python data_process/canonicalize_prod.py --filename eval.csv
python data_process/canonicalize_prod.py --filename test.csv
This step can be skipped if the canonicalized files are already present; the preprocessing steps work directly with the canonicalized files.
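As a rough illustration of what canonicalization means here (this snippet is not the project script; it simply shows RDKit's canonical SMILES round-trip):

from rdkit import Chem

smi = "C1=CC=CC=C1O"                       # phenol, written non-canonically
canon = Chem.MolToSmiles(Chem.MolFromSmiles(smi))
print(canon)                               # canonical form, e.g. 'Oc1ccccc1'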
1. Reaction Info preparation
python data_process/parse_info.py --mode train
python data_process/parse_info.py --mode eval
python data_process/parse_info.py --mode test
2. Prepare batches for Edit Prediction
python data_process/core_edits/bond_edits.py
3. Prepare batches for Synthon Completion
python data_process/lg_edits/lg_classifier.py
python data_process/lg_edits/lg_tensors.py
Run a Model
Trained models are stored in experiments/. You can override this by adjusting --exp_dir before training.
Model configurations are stored in config/MODEL_NAME, where MODEL_NAME is one of {single_edit, lg_ind}.
To run a model:
python scripts/benchmarks/run_model.py --config_file configs/MODEL_NAME/defaults.yaml
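For example, to train the edit prediction model with a custom experiment directory (combining the two flags above is an assumption, and the directory path is only a placeholder):

python scripts/benchmarks/run_model.py --config_file configs/single_edit/defaults.yaml --exp_dir /path/to/my_experiments/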
NOTE: We recently updated the code to use wandb for experiment tracking. You will need to set up wandb before you can train a model.
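A typical wandb setup (standard wandb CLI commands, not project-specific):

pip install wandb
wandb login   # you will be prompted for your API key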
Evaluate using a Trained Model
To evaluate a trained model, run:
python scripts/eval/single_edit_lg.py --edits_exp EDITS_EXP --edits_step EDITS_STEP \
--lg_exp LG_EXP --lg_step LG_STEP
This will set up a model with the edit prediction module loaded from experiment EDITS_EXP and checkpoint EDITS_STEP, and the synthon completion module loaded from experiment LG_EXP and checkpoint LG_STEP.
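For example, with placeholder experiment names and checkpoint steps (these values are purely illustrative; substitute your own trained experiments):

python scripts/eval/single_edit_lg.py --edits_exp my_single_edit_run --edits_step 50000 \
    --lg_exp my_lg_ind_run --lg_step 50000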
Reproducing our results
To reproduce our results, run:
./eval.sh
This will display results for both the reaction class unknown and reaction class known settings.
License
This project is licensed under the MIT License. Please see LICENSE.md for more details.
Reference
If you find our code useful for your work, please cite our paper:
@inproceedings{somnath2021learning,
  title={Learning Graph Models for Retrosynthesis Prediction},
  author={Vignesh Ram Somnath and Charlotte Bunne and Connor W. Coley and Andreas Krause and Regina Barzilay},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021},
  url={https://openreview.net/forum?id=SnONpXZ_uQ_}
}
Contact
If you have any questions about the code, or want to report a bug, please raise a GitHub issue.