Awesome
<div align="center">MISATO - Machine learning dataset of protein-ligand complexes for structure-based drug discovery
</div>:earth_americas: Where we are:
- Quantum Mechanics: 19443 ligands, curated and refined
- Molecular Dynamics: 16972 simulated protein-ligand structures, 10 ns each
- AI: pytorch dataloaders, 3 base line models for MD and QM and binding affinity prediction
:electron: Vision:
We are a drug discovery community project :hugs:
- highest possible accuracy for ligand molecules
- represent the systems dynamics in reasonable timescales
- innovative AI models for drug discovery predictions
Lets crack the 100+ ns MD, 30000+ protein-ligand structures and a whole new world of AI models for drug discovery together.
:purple_heart: Community
Want to get hands-on for drug discovery using AI?
Check out our Hugging Face spaces to run and visualize the adaptability model and to perform QM property predictions.
πΒ Β Introduction
In this repository, we show how to download and apply the Misato database for AI models. You can access the calculated properties of different protein-ligand structures and use them for training in Pytorch based dataloaders. We provide a small sample of the dataset along with the repo.
You can freely download the FULL MISATO dataset from Zenodo using the links below:
- MD (133 GiB)
- QM (0.3 GiB)
- electronic densities (6 GiB)
- MD restart and topology files (55 GiB)
wget -O data/MD/h5_files/MD.hdf5 https://zenodo.org/record/7711953/files/MD.hdf5
wget -O data/QM/h5_files/QM.hdf5 https://zenodo.org/record/7711953/files/QM.hdf5
Start with the notebook src/getting_started.ipynb to :
- Understand the structure of our dataset and how to access each molecule's properties.
- Load the PyTorch Dataloaders of each dataset.
- Load the PyTorch lightning Datamodules of each dataset.
πΒ Β Quickstart
We recommend to pull our MISATO image from DockerHub or to create your own image (see docker/). The images use cuda version 11.8. We recommend to install on your own system a version of CUDA that is a least 11.8 to ensure that the drivers work correctly.
# clone project
git clone https://github.com/t7morgen/misato-dataset.git
cd misato-dataset
For singularity use:
# get the container image
singularity pull docker://sab148/misato-dataset
singularity shell misato.sif
For docker use:
sudo docker pull sab148/misato-dataset:latest
bash docker/run_bash_in_container.sh
<br>
Project Structure
βββ data <- Project data
β βββMD
β β βββ h5_files <- storage of dataset
β β βββ splits <- train, val, test splits
β βββQM
β β βββ h5_files <- storage of dataset
β β βββ splits <- train, val, test splits
β
βββ src <- Source code
β βββ data
β β βββ components <- Datasets and transforms
β β βββ md_datamodule.py <- MD Lightning data module
β β βββ qm_datamodule.py <- QM Lightning data module
β β β
β β βββ processing <- Skripts for preprocessing, inference and conversion
β β βββ...
β βββ getting_started.ipynb <- notebook : how to load data and interact with it
β βββ inference.ipynb <- notebook how to run inference
β
βββ docker <- Dockerfile and execution script
βββ README.md
<br>
<br>
<br>
Installation using your own conda environment
In case you want to use conda for your own installation please create a new misato environment.
In order to install pytorch geometric we recommend to use pip (within conda) for installation and to follow the official installation instructions:pytorch-geometric/install
Depending on your CUDA version the instructions vary. We show an example for the CUDA 11.8.
conda create --name misato python=3
conda activate misato
conda install -c anaconda pandas pip h5py
pip3 install torch --index-url https://download.pytorch.org/whl/cu118 --no-cache
pip install joblib matplotlib
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
pip install pytorch-lightning==1.8.3
pip install torch-geometric
pip install ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7
conda install -c conda-forge nb_conda_kernels
To run inference for MD you have to install ambertools. We recommend to install it in a separate conda environment.
conda create --name ambertools python=3
conda activate ambertools
conda install -c conda-forge ambertools nb_conda_kernels
pip install h5py jupyter ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7
Citation
If you found this work useful please consider citing the article.
@article{siebenmorgen2024misato,
title={MISATO: machine learning dataset of protein--ligand complexes for structure-based drug discovery},
author={Siebenmorgen, Till and Menezes, Filipe and Benassou, Sabrina and Merdivan, Erinc and Didi, Kieran and Mour{\~a}o, Andr{\'e} Santos Dias and Kitel, Rados{\l}aw and Li{\`o}, Pietro and Kesselheim, Stefan and Piraud, Marie and Theis, Fabian J. and Sattler, Michael and Popowicz, Grzegorz M.},
journal={Nature Computational Science},
pages={1--12},
year={2024},
publisher={Nature Publishing Group US New York}
}