Molecule Breakdown Model
Thank you for your interest in this repo complementing the "Expanding Bioactive Fragment Space with the Generated Database GDB-13s" publication.
<img width="722" alt="image" src="https://github.com/Ye-Buehler/Molecule_Breakdown_Model/assets/93389795/6302158e-7e27-4eae-b657-47ea9bd87dcf">
Requirements
Anaconda or Miniconda must be installed, with Python 3.6 to 3.9.
Install conda environment
The requirements for the environment are given in the fragment.yml file:
conda env create -f fragment.yml
Specific packages used are also listed below:
- ipykernel >= 6.4.1
- numpy >= 1.21.2
- pandas >= 1.4.1
- pickleshare >= 0.7.5
- rdkit >= 2021.09.4
- tqdm >= 4.63.0
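After activating the environment, it can be useful to confirm that the listed minimum versions are actually met. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and it assumes each package exposes a `__version__` attribute on its top-level module.

```python
import importlib
import re

# Minimum versions copied from the list above.
MINIMUM_VERSIONS = {
    "ipykernel": "6.4.1",
    "numpy": "1.21.2",
    "pandas": "1.4.1",
    "pickleshare": "0.7.5",
    "rdkit": "2021.09.4",
    "tqdm": "4.63.0",
}


def parse_version(text):
    """Turn '2021.09.4' into (2021, 9, 4), keeping only leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Map each required package to 'ok', 'missing', 'unknown', or 'too old'."""
    report = {}
    for name, minimum in MINIMUM_VERSIONS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        # Assumption: the package exposes its version as __version__.
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[name] = "unknown"
        elif meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = "too old"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(name, status)
```

The comparison is purely numeric, so pre-release suffixes such as `rc1` are ignored.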
Conda environment activation
conda activate env-fragment
Quickstart
Example
The files example1.smi and example2.smi in the folder example are provided to demonstrate the Molecule Breakdown Model.
The model can be executed with the following commands to obtain the ring fragments and the corresponding substituents:
python Molecule_Breakdown_Model_Ring_Fragments.py example1.smi
python Molecule_Breakdown_Model_Substituents.py example1.smi
The model can also be run in Jupyter Notebook using Molecule_Breakdown_Model_Ring_Fragments_Ring_Fragments.ipynb and Molecule_Breakdown_Model_Ring_Fragments_Substituents.ipynb.
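When several input files need to be processed, the two command-line calls above can be batched. The sketch below is an illustration only: `build_commands` and `run_all` are hypothetical helpers, and it assumes the scripts are invoked from the directory that contains them, exactly as in the commands above.

```python
import subprocess
import sys

# Script names taken from the commands above.
SCRIPTS = (
    "Molecule_Breakdown_Model_Ring_Fragments.py",
    "Molecule_Breakdown_Model_Substituents.py",
)


def build_commands(smiles_files):
    """Return one command per (input file, script) pair, using the current interpreter."""
    return [
        [sys.executable, script, str(path)]
        for path in smiles_files
        for script in SCRIPTS
    ]


def run_all(smiles_files):
    """Run both scripts on every input file, stopping at the first failure."""
    for command in build_commands(smiles_files):
        subprocess.run(command, check=True)


# Example call (not executed here):
# run_all(["example1.smi", "example2.smi"])
```

Using `sys.executable` keeps the subprocesses inside the activated conda environment rather than relying on whichever `python` is first on the PATH.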
Two .pickle files containing the ring fragments and substituents for each molecule in the sample pool are produced (see the folder example/results_of_each_example). The final ring fragment and substituent datasets, with duplicates combined and entries sorted by frequency, can be found in the folder results_merging.
An .html file presenting the final fragment results is also produced for each approach.
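Combining duplicates and sorting by frequency, as described above, can be sketched with the standard library. The layout of the .pickle files is not documented here, so this example assumes (hypothetically) that a file holds one list of fragment SMILES per molecule; the function names are illustrative and not taken from the repository.

```python
import pickle
from collections import Counter


def load_pickle(path):
    """Load a results file; the per-molecule list layout is an assumption."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def count_fragments(per_molecule_fragments):
    """Count every occurrence of each fragment SMILES across all molecules."""
    counts = Counter()
    for fragments in per_molecule_fragments:
        counts.update(fragments)
    return counts


def sorted_by_frequency(counts):
    """Most frequent first; ties are broken alphabetically for a reproducible order."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data standing in for the contents of one results file.
toy = [["c1ccccc1", "C1CCNCC1"], ["c1ccccc1"], []]
ranking = sorted_by_frequency(count_fragments(toy))
```

In this sketch a fragment that appears twice in the same molecule is counted twice; wrap each inner list in `set(...)` if each molecule should contribute at most once.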
Results Merging
Very large databases usually have to be split into sub-databases, or even processed in parallel on high-performance computing clusters.
An efficient script for merging the results of all sub-databases is therefore necessary.
The folder results_merging contains the file Molecule_Breakdown_Results_Merge.ipynb, which performs this merge.
All sorted frameworks, or the top 10000 sorted frameworks, for the entire database can then be displayed in an .html file. You can also save the SMILES of the frameworks yourself or use the results_merged.pickle file, and then visualize them with software such as ChemDraw or Marvin.
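The merging idea can be illustrated independently of the notebook. The sketch below is not the code in Molecule_Breakdown_Results_Merge.ipynb: it assumes each sub-database has already been reduced to a mapping from framework SMILES to count, and `merge_counts`, `top_frameworks`, and `write_smiles` are hypothetical names.

```python
from collections import Counter


def merge_counts(partial_counts):
    """Add up the per-sub-database counts into one Counter."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # update() adds counts for mappings
    return total


def top_frameworks(total, limit=10000):
    """Return the `limit` most frequent frameworks, ties broken alphabetically."""
    ranked = sorted(total.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


def write_smiles(ranked, path):
    """Write one SMILES per line followed by its count, separated by a tab."""
    with open(path, "w") as handle:
        for smiles, count in ranked:
            handle.write(f"{smiles}\t{count}\n")


# Toy counts standing in for two sub-databases.
merged = merge_counts([
    {"c1ccccc1": 3, "C1CCCCC1": 1},
    {"c1ccccc1": 2, "c1ccncc1": 4},
])
best = top_frameworks(merged, limit=2)
```

Because only the counts are merged, each sub-database can be processed on a separate node and the per-node results combined afterwards in a single cheap pass.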
Contributing
We welcome contributions, in the form of issues or pull requests.
If you have a question or want to report a bug, please submit an issue.
To contribute code to the project, follow these steps:
- Fork this repository.
- Create a branch:
git checkout -b <branch_name>
- Make your changes and commit them:
git commit -m '<commit_message>'
- Push to the remote branch:
git push
- Create the pull request.
Contributors
The contributors have limited time for support questions, but please do not hesitate to submit an issue (see above).
License
The software is licensed under the MIT license (see LICENSE file), and is free and provided as-is.