FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning
This repository contains an implementation of "FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning", an autoregressive framework for generating synthetic routes for molecules.
Warning
We use only one route per molecule in the training dataset for model training!
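As an illustration of the restriction above, the sketch below keeps a single route per target molecule. The `(target, route)` record layout and the helper `keep_one_route_per_molecule` are assumptions made for this example, not the actual format of `train_dataset.json`. The README does not say which route is kept, so the sketch arbitrarily keeps the first one it sees.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: a route is a list of reaction strings "product>>reactants".
Route = List[str]


def keep_one_route_per_molecule(
    records: Iterable[Tuple[str, Route]],
) -> Dict[str, Route]:
    """Keep the first route encountered for each target molecule."""
    selected: Dict[str, Route] = {}
    for target, route in records:
        # setdefault leaves an existing entry untouched, so later routes
        # for the same target are ignored.
        selected.setdefault(target, route)
    return selected


# Toy records: the second entry is an alternative route for the same target.
toy_records = [
    ("CCO", ["CCO>>CC=O"]),
    ("CCO", ["CCO>>C=C.O"]),
    ("CC(=O)O", ["CC(=O)O>>CCO"]),
]
one_route_each = keep_one_route_per_molecule(toy_records)
```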
Contribution
Zuobai Zhang contributed the implementation of G2Gs, and I contributed the rest.
Dropbox
We provide the starting material file on Dropbox. You can download it from: https://www.dropbox.com/scl/fi/j3kh641irxtpbrnjnmoop/zinc_stock_17_04_20.hdf5?rlkey=zqbymj13skpdqlswu2uvji1sq&st=c1805gz0&dl=0 Please move this file into the root folder of the repository.
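For scripted setups, the download can also be done from Python. This is a minimal sketch that relies on the Dropbox convention that setting `dl=1` on a share link requests the file itself rather than the preview page. The helper names `direct_download_url` and `download_stock` are made up for this example and are not part of the repository.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlretrieve

# Share link copied from the paragraph above.
SHARE_URL = (
    "https://www.dropbox.com/scl/fi/j3kh641irxtpbrnjnmoop/"
    "zinc_stock_17_04_20.hdf5"
    "?rlkey=zqbymj13skpdqlswu2uvji1sq&st=c1805gz0&dl=0"
)


def direct_download_url(share_url: str) -> str:
    """Return the share link with dl=1 so that the file itself is served."""
    parts = urlsplit(share_url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"  # assumption: dl=1 requests a direct download
    return urlunsplit(parts._replace(query=urlencode(query)))


def download_stock(dest: str = "zinc_stock_17_04_20.hdf5") -> str:
    """Download the starting material file into the current folder."""
    path, _headers = urlretrieve(direct_download_url(SHARE_URL), dest)
    return path
```

Calling `download_stock()` from the repository root places the file where the `cp` commands in the following sections expect it.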
FusionRetro
cp train_dataset.json valid_dataset.json test_dataset.json zinc_stock_17_04_20.hdf5 FusionRetro/
cd FusionRetro
#Data Process
python to_canolize.py --dataset train
python to_canolize.py --dataset valid
python to_canolize.py --dataset test
#Initial Train
python train.py --batch_size 64 --epochs 3000
# After 3000 epochs, we set global_step to 1000000 and continue training from the 3000th epoch's model parameters for another 1000 epochs
#Continue Train
python train.py --batch_size 64 --continue_train --epochs 1000
# We select the model by its performance on the first 100 routes in the validation dataset
#We also provide model.pkl, so you can skip the above commands
#Retro Star Zero Search
python retro_star_0.py --beam_size 5
#Retro Star Search
python get_reaction_cost.py
python get_molecule_cost.py
python value_mlp.py
#We also provide value_mlp.pkl, so you can skip the above commands
python retro_star.py --beam_size 5
#Greedy DFS Search
python greedy_dfs.py --beam_size 5
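The model-selection rule noted in the commands above (choose the checkpoint by its performance on the first 100 validation routes) can be sketched as follows. The `score_route` callback, the checkpoint names, and the use of a mean score as the metric are assumptions made for illustration. The repository's own evaluation code is not reproduced here.

```python
from typing import Callable, Sequence


def select_checkpoint(
    checkpoints: Sequence[str],
    validation_routes: Sequence[object],
    score_route: Callable[[str, object], float],
    num_routes: int = 100,
) -> str:
    """Return the checkpoint with the best mean score on the first routes."""
    subset = validation_routes[:num_routes]
    if not subset:
        raise ValueError("validation_routes must not be empty")

    def mean_score(checkpoint: str) -> float:
        return sum(score_route(checkpoint, r) for r in subset) / len(subset)

    return max(checkpoints, key=mean_score)


# Toy setup: routes are just indices, and the scores are invented.
def toy_score(checkpoint: str, route_index: int) -> float:
    if checkpoint == "epoch_2900":
        # Strong only on routes outside the first 100.
        return 1.0 if route_index >= 100 else 0.5
    return 0.6  # "epoch_3000" is uniformly decent


best = select_checkpoint(["epoch_2900", "epoch_3000"], list(range(300)), toy_score)
```

In the toy setup, `epoch_2900` would win on all 300 routes but loses on the first 100, which shows that only the leading subset decides the choice.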
Transformer
cp train_dataset.json valid_dataset.json test_dataset.json zinc_stock_17_04_20.hdf5 Transformer/
cd Transformer
#Data Process
python to_canolize.py --dataset train
python to_canolize.py --dataset valid
python to_canolize.py --dataset test
#Train
python train.py --batch_size 32 --epochs 2000
# We select the model by its performance on the first 100 routes in the validation dataset
#We also provide model.pkl, so you can skip the above commands
#Retrosynthesis Test
python retrosynthesis_test.py --beam_size 10
#Retro Star Zero Search
python retro_star_0.py --beam_size 5
#Retro Star Search
python get_reaction_cost.py
python get_molecule_cost.py
python value_mlp.py
#We also provide value_mlp.pkl, so you can skip the above commands
python retro_star.py --beam_size 5
#Greedy DFS Search
python greedy_dfs.py --beam_size 5
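The retrosynthesis test above runs single-step prediction with a beam size of 10. A common way to score such ranked predictions is top-k exact-match accuracy. The sketch below shows that metric under the assumption that predictions and ground truth are already canonical reactant strings. It is not taken from `retrosynthesis_test.py`, and the function name is hypothetical.

```python
from typing import List, Sequence


def top_k_accuracy(
    predictions: Sequence[List[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of examples whose ground truth is among the first k predictions."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one example is required")
    hits = sum(
        1 for ranked, truth in zip(predictions, targets) if truth in ranked[:k]
    )
    return hits / len(targets)


# Toy ranked predictions (best first) with placeholder reactant strings.
toy_predictions = [["A.B", "C"], ["D", "E.F"], ["G"]]
toy_targets = ["A.B", "E.F", "H"]
top1 = top_k_accuracy(toy_predictions, toy_targets, k=1)
top2 = top_k_accuracy(toy_predictions, toy_targets, k=2)
```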
Retrosim
cp train_dataset.json valid_dataset.json test_dataset.json zinc_stock_17_04_20.hdf5 Retrosim/
cd Retrosim
#Retrosynthesis Test
python retrosynthesis_test.py --beam_size 10 --num_cores 64
#Retro Star Zero Search
python retro_star_0.py --beam_size 5 --num_cores 64
#Retro Star Search
python get_reaction_cost.py
python get_molecule_cost.py
python value_mlp.py
#We also provide value_mlp.pkl, so you can skip the above commands
python retro_star.py --beam_size 5 --num_cores 64
#Greedy DFS Search
python greedy_dfs.py --beam_size 5 --num_cores 64
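Greedy DFS, as the name suggests, can be read as always following the top-ranked single-step prediction and recursing depth-first until every leaf is a purchasable starting material. The sketch below is one such reading. The `predict_top1` callback, the `stock` set, and the depth limit are hypothetical stand-ins, not the interface of `greedy_dfs.py`, and how `--beam_size` enters the actual script is not covered here.

```python
from typing import Callable, List, Optional, Set, Tuple

Reaction = Tuple[str, List[str]]  # (product, reactants)


def greedy_dfs(
    target: str,
    stock: Set[str],
    predict_top1: Callable[[str], List[str]],
    max_depth: int = 5,
) -> Optional[List[Reaction]]:
    """Return a list of reactions ending in stock molecules, or None on failure."""
    if target in stock:
        return []  # already purchasable, nothing to synthesize
    if max_depth == 0:
        return None  # depth budget exhausted
    reactants = predict_top1(target)
    if not reactants:
        return None  # the model proposes no reaction for this molecule
    route: List[Reaction] = [(target, reactants)]
    for reactant in reactants:
        sub_route = greedy_dfs(reactant, stock, predict_top1, max_depth - 1)
        if sub_route is None:
            return None  # greedy search does not backtrack to other predictions
        route.extend(sub_route)
    return route


# Toy single-step "model": a lookup table with placeholder molecule names.
toy_model = {"T": ["A", "I"], "I": ["B", "C"]}


def toy_predict(molecule: str) -> List[str]:
    return toy_model.get(molecule, [])


found = greedy_dfs("T", {"A", "B", "C"}, toy_predict)
failed = greedy_dfs("T", {"A", "B"}, toy_predict)
```

Because the search commits to the first prediction at every step, a single unsolvable intermediate makes the whole attempt fail, which is the main difference from the Retro* searches listed alongside it.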
Neuralsym
cp train_dataset.json valid_dataset.json test_dataset.json zinc_stock_17_04_20.hdf5 Neuralsym/
cd Neuralsym
#Data Process
python prepare_data.py
#Train
bash train.sh
# We select the model according to the original code's setting
#Retrosynthesis Test
python retrosynthesis_test.py --beam_size 10 --num_cores 64
#Retro Star Zero Search
python retro_star_0.py --beam_size 5 --num_cores 64
#Retro Star Search
python get_reaction_cost.py
python get_molecule_cost.py
python value_mlp.py
#We also provide value_mlp.pkl, so you can skip the above commands
python retro_star.py --beam_size 5 --num_cores 64
#Greedy DFS Search
python greedy_dfs.py --beam_size 5 --num_cores 64
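In Retro*-style search, the cost of a reaction is commonly taken as the negative log of the probability assigned by the single-step model. Costs then add up along a route, and minimizing the total cost maximizes the product of the reaction probabilities. The sketch below shows only that arithmetic. Whether `get_reaction_cost.py` uses exactly this definition is an assumption, and the function names are hypothetical.

```python
import math
from typing import Iterable


def reaction_cost(probability: float) -> float:
    """Negative log-likelihood cost of a single predicted reaction."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(probability)


def route_cost(probabilities: Iterable[float]) -> float:
    """Total cost of a route, given the probability of each of its reactions."""
    return sum(reaction_cost(p) for p in probabilities)


# A two-step route with probabilities 0.5 and 0.25 costs -log(0.5 * 0.25).
example_cost = route_cost([0.5, 0.25])
```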
GLN
cp train_dataset.json valid_dataset.json test_dataset.json zinc_stock_17_04_20.hdf5 GLN/gln/
cd GLN
pip install -e .
cd gln
#Data Process
python process_data_stage_1.py -save_dir data
python process_data_stage_2.py -save_dir data -num_cores 12 -num_parts 1 -fp_degree 2 -f_atoms data/atom_list.txt -retro_during_train False $@
python process_data_stage_2.py -save_dir data -num_cores 12 -num_parts 1 -fp_degree 2 -f_atoms data/atom_list.txt -retro_during_train True $@
#Train
bash run_mf.sh schneider
# We select the model by its performance on all routes in the validation dataset
#Retrosynthesis Test
python retrosynthesis_test.py -save_dir data -f_atoms data/atom_list.txt -gpu 0 -seed 42 -beam_size 10 -epoch_for_test 100
#Retro Star Zero Search
python retro_star_0.py -save_dir data -f_atoms data/atom_list.txt -gpu 0 -seed 42 -beam_size 5 -epoch_for_search 100
#Retro Star Search
python get_reaction_cost.py -save_dir data -f_atoms data/atom_list.txt -gpu 0 -seed 42 -beam_size 10 -epoch_for_search 100
python get_molecule_cost.py
python value_mlp.py
#We also provide value_mlp.pkl, so you can skip the above commands
python retro_star.py -save_dir data -f_atoms data/atom_list.txt -gpu 0 -seed 42 -beam_size 5 -epoch_for_search 100
#Greedy DFS Search
python greedy_dfs.py -save_dir data -f_atoms data/atom_list.txt -gpu 0 -seed 42 -beam_size 5 -epoch_for_search 100
Megan
cp train_dataset.json valid_dataset.json test_dataset.json zinc_stock_17_04_20.hdf5 Megan/data/
mv Megan/data/valid_dataset.json Megan/data/val_dataset.json
cd Megan
source env.sh
#Data Process
python json2csv.py
python acquire.py uspto_50k
python featurize.py uspto_50k megan_16_bfs_randat
#Train
python bin/train.py uspto_50k models/uspto_50k
# We select the model according to the original code's setting
#Retrosynthesis Test
python bin/retrosynthesis_test.py models/uspto_50k --beam-size 10
#Retro Star Search
python bin/get_reaction_cost.py models/uspto_50k --beam-size 10
python bin/get_molecule_cost.py
python bin/value_mlp.py
#We also provide value_mlp.pkl, so you can skip the above commands
python bin/retro_star.py models/uspto_50k --beam-size 5
#Retro Star Zero Search
python bin/retro_star_0.py models/uspto_50k --beam-size 5
#Greedy DFS Search
python bin/greedy_dfs.py models/uspto_50k --beam-size 5
GraphRetro
cp train_dataset.json valid_dataset.json test_dataset.json zinc_stock_17_04_20.hdf5 GraphRetro/datasets/uspto-50k
cd GraphRetro
export SEQ_GRAPH_RETRO=$(pwd)
python setup.py develop
#Data Process
mv datasets/uspto-50k/valid_dataset.json datasets/uspto-50k/eval_dataset.json
python json2csv.py
python data_process/canonicalize_prod.py --filename train.csv
python data_process/canonicalize_prod.py --filename eval.csv
python data_process/canonicalize_prod.py --filename test.csv
python data_process/parse_info.py --mode train
python data_process/parse_info.py --mode eval
python data_process/parse_info.py --mode test
python data_process/core_edits/bond_edits.py
python data_process/lg_edits/lg_classifier.py
python data_process/lg_edits/lg_tensors.py
#Train
python scripts/benchmarks/run_model.py --config_file configs/single_edit/defaults.yaml
python scripts/benchmarks/run_model.py --config_file configs/lg_ind/defaults.yaml
# We select the model according to the original code's setting
#We also provide model files, so you can skip the above commands
#Retrosynthesis Test
python scripts/eval/retrosynthesis_test.py --beam_size 10 --edits_exp SingleEdit_20220823_044246 --lg_exp LGIndEmbed_20220823_04432 --edits_step best_model --lg_step best_model --exp_dir models
#Retro Star Search
python scripts/eval/get_reaction_cost.py --beam_size 10 --edits_exp SingleEdit_20220823_044246 --lg_exp LGIndEmbed_20220823_04432 --edits_step best_model --lg_step best_model --exp_dir models
python scripts/eval/get_molecule_cost.py
python scripts/eval/value_mlp.py
#We also provide value_mlp.pkl, so you can skip the above commands
python scripts/eval/retro_star.py --beam_size 5 --edits_exp SingleEdit_20220823_044246 --lg_exp LGIndEmbed_20220823_04432 --edits_step best_model --lg_step best_model --exp_dir models
#Retro Star Zero Search
python scripts/eval/retro_star_0.py --beam_size 5 --edits_exp SingleEdit_20220823_044246 --lg_exp LGIndEmbed_20220823_04432 --edits_step best_model --lg_step best_model --exp_dir models
#Greedy DFS Search
python scripts/eval/greedy_dfs.py --beam_size 5 --edits_exp SingleEdit_20220823_044246 --lg_exp LGIndEmbed_20220823_04432 --edits_step best_model --lg_step best_model --exp_dir models
G2Gs
cp train_dataset.json valid_dataset.json test_dataset.json zinc_stock_17_04_20.hdf5 G2Gs/datasets/
cd G2Gs
#Train
python script/train.py -g [0]
# We select the model according to the original code's setting
#Retrosynthesis Test
python script/retrosynthesis_test.py -g [0] -k 10 -b 1
#Retro Star Search
python script/get_reaction_cost.py -g [0] -k 10 -b 1
python get_molecule_cost.py
python value_mlp.py
#We also provide value_mlp.pkl, so you can skip the above commands
python script/retro_star.py -g [0] -k 5 -b 1
#Retro Star Zero Search
python script/retro_star_0.py -g [0] -k 5 -b 1
#Greedy DFS Search
python script/greedy_dfs.py -g [0] -k 5 -b 1
Acknowledgement
My deepest thanks to Binghong Chen and Samuel Genheden for very helpful discussions on their benchmarks (Retro* and PaRoutes)!
Reference
Retrosim: https://github.com/connorcoley/retrosim
Neuralsym: https://github.com/linminhtoo/neuralsym
GLN: https://github.com/Hanjun-Dai/GLN
G2Gs: https://torchdrug.ai/docs/tutorials/retrosynthesis
GraphRetro: https://github.com/vsomnath/graphretro
Transformer: https://github.com/bigchem/synthesis
Megan: https://github.com/molecule-one/megan
Citation
@inproceedings{liu2023fusionretro,
title={FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning},
author={Liu, Songtao and Tu, Zhengkai and Xu, Minkai and Zhang, Zuobai and Lin, Lu and Ying, Rex and Tang, Jian and Zhao, Peilin and Wu, Dinghao},
booktitle={International Conference on Machine Learning},
year={2023}
}