<div align="center"><h1>OpenBioMed</h1></div> <h4 align="center"> <p> <b>English</b> | <a href="./README-CN.md">δΈ­ζ–‡</a> <p> </h4>

News πŸŽ‰

Table of contents

- News
- Introduction
- Installation
- Quick Start
- Limitations
- License
- Contact Us
- Cite Us

Introduction

This repository holds OpenBioMed, a Python deep learning toolkit for AI-empowered biomedicine. OpenBioMed provides easy access to multimodal biomedical data, including molecular structures, transcriptomics, knowledge graphs, and biomedical texts for molecules, proteins, and single cells. It supports a wide range of downstream applications, from traditional AI drug discovery tasks to newly emerging multimodal challenges.

OpenBioMed provides researchers with easy-to-use APIs to access these data modalities, load pretrained models, and train and evaluate models on a variety of downstream tasks.

The following table shows the supported tasks, datasets and models in OpenBioMed. This is a continuing effort and we are working on further growing the list.

| Task | Supported Datasets | Supported Models |
| ---- | ------------------ | ---------------- |
| Cross-modal Retrieval | PCdes | KV-PLM, SciBERT, MoMu, GraphMVP, MolFM |
| Molecule Captioning | ChEBI-20 | MolT5, MoMu, GraphMVP, MolFM, BioMedGPT |
| Text-based Molecule Generation | ChEBI-20 | MolT5, SciBERT, MoMu, MolFM |
| Molecule Question Answering | ChEMBL-QA | MolT5, MolFM, BioMedGPT |
| Protein Question Answering | UniProtQA | BioMedGPT |
| Cell Type Classification | Zheng68k, Baron | scBERT, CellLM |
| Single Cell Drug Response Prediction | GDSC | DeepCDR, TGSA, CellLM |
| Molecule Property Prediction | MoleculeNet | MolCLR, GraphMVP, MolFM, DeepEIK, BioMedGPT |
| Drug-target Binding Affinity Prediction | Yamanishi08, BMKG-DTI, DAVIS, KIBA | DeepDTA, MGraphDTA, DeepEIK |
| Protein-protein Interaction Prediction | SHS27k, SHS148k, STRING | PIPR, GNN-PPI, OntoProtein |
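
For a concrete sense of what one of these benchmarks contains, the sketch below inspects the MoleculeNet BBBP dataset through PyTorch Geometric's bundled loader (installed in the next section). This is an illustration only and does not use OpenBioMed's own dataset classes.

```python
# Illustration only: peek at a MoleculeNet benchmark (BBBP, blood-brain barrier penetration)
# via PyTorch Geometric's bundled loader, not OpenBioMed's own dataset API.
from torch_geometric.datasets import MoleculeNet

dataset = MoleculeNet(root="./data/moleculenet", name="BBBP")
print(len(dataset))     # number of molecules in the benchmark

mol = dataset[0]
print(mol.smiles)       # SMILES string of the first molecule
print(mol.x.shape)      # atom (node) feature matrix
print(mol.y)            # binary blood-brain barrier permeability label
```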

Installation

1. (Optional) Create a conda environment:

   ```bash
   conda create -n OpenBioMed python=3.9
   conda activate OpenBioMed
   ```

2. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

3. Install the PyG dependencies:

   ```bash
   pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-(your_torch_version)+(your_cuda_version).html
   pip install torch-geometric
   ```

   If you have trouble installing these PyTorch-related packages, the instructions at https://pytorch.org/get-started/locally/ and https://github.com/pyg-team/pytorch_geometric may help. You may find it convenient to install PyTorch Geometric and its extensions directly from the wheels at https://data.pyg.org/whl/.

Note: additional packages may be required for some downstream tasks.
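
As a quick sanity check after installation, the following minimal sketch (which only assumes that PyTorch and PyTorch Geometric were installed as above, independent of OpenBioMed's own APIs) verifies that the core dependencies import and that a toy graph can be constructed:

```python
# Sanity check for the PyTorch / PyTorch Geometric stack installed above.
import torch
import torch_geometric
from torch_geometric.data import Data

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
print("PyG version:    ", torch_geometric.__version__)

# Build a tiny two-node graph to confirm that the PyG extensions load correctly.
edge_index = torch.tensor([[0, 1], [1, 0]], dtype=torch.long)  # a single bidirectional edge
x = torch.randn(2, 8)                                          # 2 nodes with 8 features each
print(Data(x=x, edge_index=edge_index))                        # e.g. Data(x=[2, 8], edge_index=[2, 2])
```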

Quick Start

Check out our Jupyter notebooks and documentation for a quick start!

| Name | Description |
| ---- | ----------- |
| BioMedGPT-10B Inference | Example of using BioMedGPT-10B to answer questions about molecules and proteins. |
| Cross-modal Retrieval with MolFM | Example of using MolFM to retrieve the text descriptions most relevant to a molecule. |
| Text-based Molecule Generation with MolT5 | Example of using MolT5 to generate the SMILES string of a molecule from a text description (see also the sketch below). |
| Cell Type Classification with CellLM | Example of using a fine-tuned CellLM to classify cell types. |
| Molecule Property Prediction | Training & testing pipeline for the molecule property prediction task. |
| Drug-Response Prediction | Training & testing pipeline for the drug-response prediction task. |
| Drug-Target Binding Affinity Prediction | Training & testing pipeline for the drug-target binding affinity prediction task. |
| Molecule Captioning | Training & testing pipeline for the molecule captioning task. |
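
To give a flavor of text-based molecule generation before opening the notebook above, here is a minimal sketch that queries a publicly released MolT5 checkpoint through Hugging Face Transformers rather than through OpenBioMed's pipeline; the checkpoint id and generation settings are assumptions, so substitute whatever MolT5 weights you actually use.

```python
# Hedged example: text -> SMILES with a public MolT5 checkpoint via Hugging Face Transformers.
# The checkpoint id below is an assumption; replace it with the MolT5 weights you intend to use.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "laituan245/molt5-large-caption2smiles"  # assumed checkpoint id
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

description = "The molecule is an aromatic hydrocarbon consisting of a single benzene ring."
inputs = tokenizer(description, return_tensors="pt")

# Beam search usually yields more chemically plausible SMILES than greedy decoding.
outputs = model.generate(**inputs, num_beams=5, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The notebooks listed above run the same task through OpenBioMed's own configuration and training pipeline.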

Limitations

This repository holds BioMedGPT-LM-7B and BioMedGPT-10B, and we emphasize the responsible and ethical use of these models. BioMedGPT should NOT be used to provide services to the general public. Generating content that violates applicable laws and regulations, such as inciting subversion of state power, endangering national security and interests, propagating terrorism, extremism, ethnic hatred and discrimination, violence, pornography, or false and harmful information, is strictly prohibited. BioMedGPT is not liable for any consequences arising from any content, data, or information provided or published by users.

License

This repository is licensed under the MIT License. Use of the BioMedGPT-LM-7B and BioMedGPT-10B models is additionally subject to the Acceptable Use Policy.

Contact Us

We welcome user feedback to help us improve the framework. If you have any technical questions or suggestions, please feel free to open an issue. For commercial support or collaboration, please contact opensource@pharmolix.com.

Cite Us

If you find our open-source code and models helpful to your research, please consider giving this repository a 🌟 star and πŸ“Ž citing the following articles. Thank you for your support!

To cite OpenBioMed:
@misc{OpenBioMed_code,
      author={Luo, Yizhen and Yang, Kai and Hong, Massimo and Liu, Xing Yi and Zhao, Suyuan and Zhang, Jiahuan and Wu, Yushuai and Nie, Zaiqing},
      title={Code of OpenBioMed},
      year={2023},
      howpublished={\url{https://github.com/BioFM/OpenBioMed.git}}
}
To cite BioMedGPT:
@misc{luo2023biomedgpt,
      title={BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine}, 
      author={Yizhen Luo and Jiahuan Zhang and Siqi Fan and Kai Yang and Yushuai Wu and Mu Qiao and Zaiqing Nie},
      year={2023},
      eprint={2308.09442},
      archivePrefix={arXiv},
      primaryClass={cs.CE}
}
To cite DeepEIK:
@misc{luo2023empowering,
      title={Empowering AI drug discovery with explicit and implicit knowledge}, 
      author={Yizhen Luo and Kui Huang and Massimo Hong and Kai Yang and Jiahuan Zhang and Yushuai Wu and Zaiqing Nie},
      year={2023},
      eprint={2305.01523},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
To cite MolFM:
@misc{luo2023molfm,
      title={MolFM: A Multimodal Molecular Foundation Model}, 
      author={Yizhen Luo and Kai Yang and Massimo Hong and Xing Yi Liu and Zaiqing Nie},
      year={2023},
      eprint={2307.09484},
      archivePrefix={arXiv},
      primaryClass={q-bio.BM}
}
To cite CellLM:
@misc{zhao2023largescale,
      title={Large-Scale Cell Representation Learning via Divide-and-Conquer Contrastive Learning}, 
      author={Suyuan Zhao and Jiahuan Zhang and Zaiqing Nie},
      year={2023},
      eprint={2306.04371},
      archivePrefix={arXiv},
      primaryClass={cs.CE}
}
To cite LangCell:
@misc{zhao2024langcell,
      title={LangCell: Language-Cell Pre-training for Cell Identity Understanding}, 
      author={Suyuan Zhao and Jiahuan Zhang and Yizhen Luo and Yushuai Wu and Zaiqing Nie},
      year={2024},
      eprint={2405.06708},
      archivePrefix={arXiv},
      primaryClass={q-bio.GN}
}