MolGPT

In this work, we train a small custom GPT on the MOSES and GuacaMol datasets with a next-token prediction task. The model is then used for unconditional and conditional molecular generation. We compare our model with previous approaches on both datasets. Saliency maps for interpretability are obtained using the Ecco library.
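
To illustrate what the next-token prediction task looks like on molecular strings, here is a minimal sketch. It assumes molecules are given as SMILES strings (as in MOSES and GuacaMol) and uses a simple character-level vocabulary. The helper names `build_vocab` and `next_token_pairs` are hypothetical and not taken from this repository, and the tokenizer used by the actual training code may differ (for example, by treating multi-character atoms such as `Cl` or `Br` as single tokens).

```python
from typing import Dict, List, Tuple


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map each distinct character to an integer id (assumed character-level vocab)."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars)}


def next_token_pairs(smiles: str, stoi: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Return (inputs, targets), where targets are the inputs shifted left by one token."""
    ids = [stoi[ch] for ch in smiles]
    return ids[:-1], ids[1:]


example = "CCO"  # ethanol, used purely as an illustration
stoi = build_vocab([example])
inputs, targets = next_token_pairs(example, stoi)
print(inputs, targets)
```

At each position the model sees the tokens in `inputs` up to that point and is trained to predict the corresponding entry of `targets`.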

https://drive.google.com/drive/folders/1LrtGru7Srj_62WMR4Zcfs7xJ3GZr9N4E?usp=sharing

https://github.com/BenevolentAI/guacamol

https://github.com/molecularsets/moses

https://www.kaggle.com/virajbagal/ligflow-final-weights

To train the model, make sure the datasets' CSV files are in the same directory as the code files.
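
Before launching training, it can help to confirm that the CSV files are actually present. The snippet below is a small sketch of such a check. The function name `find_csv_files` is hypothetical, and the expected dataset file names are not specified here, so the check simply lists whatever `.csv` files it finds in the given directory.

```python
from pathlib import Path
from typing import List


def find_csv_files(directory: str = ".") -> List[str]:
    """Return the sorted names of all .csv files directly inside `directory`."""
    return sorted(p.name for p in Path(directory).glob("*.csv"))


if __name__ == "__main__":
    csv_files = find_csv_files(".")
    if csv_files:
        print("Found CSV files:", ", ".join(csv_files))
    else:
        print("No CSV files found; place the dataset files next to the code files.")
```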

Training

./train_moses.sh
./train_guacamol.sh

Generation

./generate_guacamol_prop.sh
./generate_moses_prop_scaf.sh

If you find this work useful, please cite:

Bagal, Viraj; Aggarwal, Rishal; Vinod, P. K.; Priyakumar, U. Deva (2021): MolGPT: Molecular Generation using a Transformer-Decoder Model. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.14561901.v1