<h1 align="center"> ⚗️ MolGen </h1> <h3 align="center"> Domain-Agnostic Molecular Generation with Chemical Feedback </h3> <p align="center"> 📃 <a href="https://arxiv.org/abs/2301.11259" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/zjunlp/MolGen-large" target="_blank">Model</a> • 🔬 <a href="https://huggingface.co/spaces/zjunlp/MolGen" target="_blank">Space</a> <br> </p>

<div align=center><img src="molgen.png" width="100%" height="100%" /></div>

🔔 News

📕 Requirements

To run the code, first configure the dependencies by restoring our environment:

conda env create -f environment.yaml

and then:

conda activate my_env

📚 Resource Download

You can download the pre-trained and fine-tuned models from Hugging Face: MolGen-large and MolGen-large-opt.

You can also download the model using the following link: https://drive.google.com/drive/folders/1Eelk_RX1I26qLa9c4SZq6Tv-AAbDXgrW?usp=sharing
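For the Hugging Face route, the following is a minimal sketch of fetching the pre-trained checkpoint with the `transformers` library. It assumes `transformers` and PyTorch are installed and that the checkpoint loads through the generic sequence-to-sequence auto class; the helper name `load_molgen` is hypothetical and not part of this repository.

```python
# Repository name taken from the Model link at the top of this README.
MODEL_ID = "zjunlp/MolGen-large"


def load_molgen(model_id=MODEL_ID):
    """Download (or reuse the local cache of) the tokenizer and model weights.

    Hypothetical helper for illustration; it is not part of the MolGen code base.
    """
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Assumption: the checkpoint is compatible with the seq2seq auto class.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_molgen()
    print(type(model).__name__)
```

Passing a different repository name as `model_id` should work the same way for the fine-tuned checkpoint, provided it is published under the name you supply.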

Moreover, the dataset used for downstream tasks can be found here.

The expected structure of files is:

moldata
├── checkpoint
│   ├── molgen.pkl              # pre-trained model
│   ├── syn_qed_model.pkl       # fine-tuned model for QED optimization on synthetic data
│   ├── syn_plogp_model.pkl     # fine-tuned model for p-logP optimization on synthetic data
│   ├── np_qed_model.pkl        # fine-tuned model for QED optimization on natural product data
│   └── np_plogp_model.pkl      # fine-tuned model for p-logP optimization on natural product data
├── finetune
│   ├── np_test.csv             # natural product test data
│   ├── np_train.csv            # natural product training data
│   ├── plogp_test.csv          # synthetic test data for p-logP optimization
│   ├── qed_test.csv            # synthetic test data for QED optimization
│   └── zinc250k.csv            # synthetic training data
├── generate                    # generate molecules
├── output                      # molecule candidates
└── vocab_list
    └── zinc.npy                # SELFIES alphabet
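Before launching any script, it can help to confirm that the downloads landed in the layout shown above. The following is a small sketch using only the Python standard library; the function name `missing_entries` is hypothetical, and the script assumes `moldata` sits in the current working directory.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_FILES = [
    "checkpoint/molgen.pkl",
    "checkpoint/syn_qed_model.pkl",
    "checkpoint/syn_plogp_model.pkl",
    "checkpoint/np_qed_model.pkl",
    "checkpoint/np_plogp_model.pkl",
    "finetune/np_test.csv",
    "finetune/np_train.csv",
    "finetune/plogp_test.csv",
    "finetune/qed_test.csv",
    "finetune/zinc250k.csv",
    "vocab_list/zinc.npy",
]
EXPECTED_DIRS = ["generate", "output"]


def missing_entries(root):
    """Return the expected files and directories that are absent under root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    # Assumption: "moldata" is located in the current working directory.
    for entry in missing_entries("moldata"):
        print("missing:", entry)
```

An empty result means every file and directory listed in the tree was found; any printed path points to a download that still needs to be placed.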

🚀 How to run

🥽 Experiments

We conduct experiments on well-known benchmarks to confirm MolGen's optimization capabilities, encompassing penalized logP, QED, and molecular docking properties. For detailed experimental settings and analysis, please refer to our paper.
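For readers unfamiliar with the first property, penalized logP (p-logP) is commonly defined in the molecular optimization literature as the octanol-water partition coefficient reduced by a synthetic accessibility score and a penalty for large rings. The form below is that general convention, stated here as an assumption; the exact variant used in our experiments is specified in the paper.

```latex
\text{p-logP}(m) = \log P(m) - \mathrm{SA}(m) - \mathrm{ring}(m)
```

Here $m$ is a molecule, $\log P(m)$ its partition coefficient, $\mathrm{SA}(m)$ its synthetic accessibility score, and $\mathrm{ring}(m)$ a penalty for rings with more than six atoms. Higher values are better.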

<img width="950" alt="image" src="https://github.com/zjunlp/MolGen/assets/61076726/c32bf106-d43c-4d1d-af48-8caed03305bc">

Targeted molecule discovery

<img width="480" alt="image" src="https://github.com/zjunlp/MolGen/assets/61076726/51533e08-e465-44c8-9e78-858775b59b4f"> <img width="595" alt="image" src="https://github.com/zjunlp/MolGen/assets/61076726/6f17a630-88e4-46f6-9cb1-9c3637a264fc"> <img width="376" alt="image" src="https://github.com/zjunlp/MolGen/assets/61076726/4b934314-5f23-4046-a771-60cdfe9b572d">

Constrained molecular optimization

<img width="350" alt="image" src="https://github.com/zjunlp/MolGen/assets/61076726/bca038cc-637a-41fd-9b53-48ac67c4f182">

Citation

If you use or extend our work, please cite the paper as follows:

@inproceedings{fang2023domain,
  author       = {Yin Fang and
                  Ningyu Zhang and
                  Zhuo Chen and
                  Xiaohui Fan and
                  Huajun Chen},
  title        = {Domain-Agnostic Molecular Generation with Chemical Feedback},
  booktitle    = {{ICLR}},
  publisher    = {OpenReview.net},
  year         = {2024},
  url          = {https://openreview.net/pdf?id=9rPyHyjfwP}
}
