
Awesome-Foundation-Models-for-Advancing-Healthcare

[NEWS.20241115] Our survey paper has been accepted by IEEE Reviews in Biomedical Engineering (IF: 17.2).

[NEWS.20240405] The related survey paper has been released.

[NOTE] If you have any questions, please don't hesitate to contact us.

Foundation models, which are pre-trained on broad data and can adapt to a wide range of tasks, are advancing healthcare. They promote the development of healthcare artificial intelligence (AI) models, easing the mismatch between the limited set of available AI models and the diversity of healthcare practice. A much wider range of healthcare scenarios will benefit from the development of healthcare foundation models (HFMs), which can improve the intelligent healthcare services delivered in those scenarios.

This repository is a collection of AWESOME things about foundation models in healthcare, including language foundation models (LFMs), vision foundation models (VFMs), bioinformatics foundation models (BFMs), and multimodal foundation models (MFMs). Feel free to star and fork.

This repository tracks recent advances in healthcare foundation models, organized according to the following paper:

Foundation Model for Advancing Healthcare: Challenges, Opportunities and Future Directions [Chinese version]
Yuting He, Fuxiang Huang, Xinrui Jiang, Yuxiang Nie, Minghao Wang, Jiguang Wang, Hao Chen
SMART Lab, Hong Kong University of Science and Technology
IEEE Reviews in Biomedical Engineering

If you find our survey beneficial to your work, we would greatly appreciate it if you cite it in your paper:

@ARTICLE{10750441,
  author={He, Yuting and Huang, Fuxiang and Jiang, Xinrui and Nie, Yuxiang and Wang, Minghao and Wang, Jiguang and Chen, Hao},
  journal={IEEE Reviews in Biomedical Engineering}, 
  title={Foundation Model for Advancing Healthcare: Challenges, Opportunities and Future Directions}, 
  year={2024},
  volume={},
  number={},
  pages={1-20},
  doi={10.1109/RBME.2024.3496744}}

Awesome-Foundation-Models-for-Advancing-Healthcare
Related survey
Methods
Datasets
Other Resources

Related survey

2024

[arXiv] Foundation models for biomedical image segmentation: A survey. [Paper]
[arXiv] Progress and opportunities of foundation models in bioinformatics. [Paper]
[arXiv] Large language models in bioinformatics: applications and perspectives. [Paper]
[arXiv] Data-centric foundation models in computational healthcare: A survey. [Paper]
[arXiv] Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review. [Paper]

2023

[ACM Computing Surveys] Pre-trained language models in biomedical domain: A systematic survey. [Paper]
[Nature Medicine] Large language models in medicine. [Paper]
[arXiv] A survey of large language models in medicine: Progress, application, and challenge. [Paper]
[arXiv] A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. [Paper]
[arXiv] Large language models illuminate a progressive pathway to artificial healthcare assistant: A review. [Paper]
[arXiv] Foundational models in medical imaging: A comprehensive survey and future vision. [Paper]
[arXiv] CLIP in medical imaging: A comprehensive survey. [Paper]
[arXiv] Medical vision language pretraining: A survey. [Paper]
[MIR] Pre-training in medical data: A survey. [Paper]
[J-BHI] Large AI models in health informatics: Applications, challenges, and the future. [Paper]
[MedComm–Future Medicine] Accelerating the integration of ChatGPT and other large-scale AI models into biomedical research and healthcare. [Paper]
[Nature] Foundation models for generalist medical artificial intelligence. [Paper]
[MedIA] On the challenges and perspectives of foundation models for medical image analysis. [Paper]

Methods

LFM methods

2024

[AAAI] Zhongjing: Enhancing the Chinese medical capabilities of large language model through expert feedback and real-world multi-turn dialogue. [Paper] [Code]
[NeurIPS] MDAgents: An adaptive collaboration of LLMs for medical decision-making. [Paper] [Code]
[arXiv] Me LLaMA: Foundation large language models for medical applications. [Paper] [Code]
[arXiv] BioMistral: A collection of open-source pretrained large language models for medical domains. [Paper] [Code]
[arXiv] BiMediX: Bilingual medical mixture of experts LLM. [Paper] [Code]
[arXiv] OncoGPT: A medical conversational model tailored with oncology domain expertise on a large language model Meta-AI (LLaMA). [Paper] [Code]
[arXiv] JMLR: Joint medical LLM and retrieval training for enhancing reasoning and professional question answering capability. [Paper]

2023

[Bioinformatics] MedCPT: A method for zero-shot biomedical information retrieval using contrastive learning with PubMedBERT. [Paper] [Code]
[arXiv] PMC-LLaMA: Towards building open-source language models for medicine. [Paper] [Code]
[arXiv] Meditron-70B: Scaling medical pretraining for large language models. [Paper] [Code]
[arXiv] Qilin-Med: Multi-stage knowledge injection advanced medical large language model. [Paper] [Code]
[arXiv] HuatuoGPT-II, one-stage training for medical adaption of LLMs. [Paper] [Code]
[NPJ Digit. Med.] A study of generative large language model for medical research and healthcare. [Paper] [Code]
[arXiv] From beginner to expert: Modeling medical knowledge into general LLMs. [Paper]
[arXiv] HuaTuo: Tuning LLaMA model with Chinese medical knowledge. [Paper] [Code]
[arXiv] ChatDoctor: A medical chat model fine-tuned on a large language model Meta-AI (LLaMA) using medical domain knowledge. [Paper] [Code]
[arXiv] MedAlpaca–an open-source collection of medical conversational AI models and training data. [Paper] [Code]
[arXiv] AlpaCare: Instruction-tuned large language models for medical application. [Paper] [Code]
[arXiv] HuatuoGPT, towards taming language model to be a doctor. [Paper] [Code]
[arXiv] DoctorGLM: Fine-tuning your Chinese doctor is not a herculean task. [Paper] [Code]
[arXiv] BianQue: Balancing the questioning and suggestion ability of health LLMs with multi-turn health conversations polished by ChatGPT. [Paper] [Code]
[arXiv] Taiyi: A bilingual fine-tuned large language model for diverse biomedical tasks. [Paper] [Code]
[GitHub] Visual Med-Alpaca: A parameter-efficient biomedical LLM with visual capabilities. [Code]
[arXiv] OphGLM: Training an ophthalmology large language-and-vision assistant based on instructions and dialogue. [Paper] [Code]
[arXiv] ChatCAD: Interactive computer-aided diagnosis on medical image using large language models. [Paper] [Code]
[arXiv] ChatCAD+: Towards a universal and reliable interactive CAD using LLMs. [Paper] [Code]
[arXiv] DeID-GPT: Zero-shot medical text de-identification by GPT-4. [Paper] [Code]
[arXiv] Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. [Paper] [Code]
[arXiv] MedAgents: Large language models as collaborators for zero-shot medical reasoning. [Paper] [Code]
[AIME] Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes. [Paper] [Code]
[arXiv] Clinical decision transformer: Intended treatment recommendation through goal prompting. [Paper] [Code]
[Nature] Large language models encode clinical knowledge. [Paper]
[arXiv] Towards expert-level medical question answering with large language models. [Paper]
[arXiv] GPT-doctor: Customizing large language models for medical consultation. [Paper]
[arXiv] ClinicalGPT: Large language models finetuned with diverse medical data and comprehensive evaluation. [Paper]
[arXiv] Leveraging a medical knowledge graph into large language models for diagnosis prediction. [Paper]

2022

[NPJ Digit. Med.] A large language model for electronic health records. [Paper] [Code]
[AMIA Annu. Symp. Proc.] HealthPrompt: A zero-shot learning paradigm for clinical natural language processing. [Paper]
[BioNLP] Position-based prompting for health outcome generation. [Paper]

2021

[ACM Trans. Comput. Healthc.] Domain-specific language model pretraining for biomedical natural language processing. [Paper] [Code]

2020

[JMIR Med. Info.] Modified bidirectional encoder representations from transformers extractive summarization model for hospital information systems based on character-level tokens (AlphaBERT): development and performance evaluation. [Paper] [Code]
[Scientific Reports] BEHRT: Transformer for electronic health records. [Paper] [Code]
[BioNLP] BioBART: Pretraining and evaluation of a biomedical generative language model. [Paper] [Code]

2019

[NPJ Digit. Med.] ClinicalBERT: A hybrid learning model for natural language inference in healthcare using BERT. [Paper] [Code]
[Method. Biochem. Anal.] BioBERT: A pre-trained biomedical language representation model for biomedical text mining. [Paper] [Code]

VFM methods

2024

[arXiv] USFM: A universal ultrasound foundation model generalized to tasks and organs towards label efficient image analysis. [paper]
[CVPR] VoCo: A simple-yet-effective volume contrastive learning framework for 3D medical image analysis. [paper] [Code]
[NeurIPS] LVM-Med: Learning large-scale self-supervised vision models for medical imaging via second-order graph matching. [paper] [Code]
[Nature Medicine] Towards a general-purpose foundation model for computational pathology. [paper] [Code]
[arXiv] RudolfV: A foundation model by pathologists for pathologists. [paper] [Code]
[Nature Communications] Segment anything in medical images. [paper] [Code]
[ICASSP] SAM-OCTA: A fine-tuning strategy for applying foundation model to OCTA image segmentation tasks. [paper] [Code]
[WACV] AFTer-SAM: Adapting SAM with axial fusion transformer for medical imaging segmentation. [paper]
[MIDL] AdaptiveSAM: Towards efficient tuning of SAM for surgical scene segmentation. [paper] [Code]
[arXiv] SegmentAnyBone: A universal model that segments any bone at any location on MRI. [paper] [Code]
[SSRN] SwinSAM: Fine-grained polyp segmentation in colonoscopy images via segment anything model integrated with a Swin transformer decoder. [paper]
[AAAI] SurgicalSAM: Efficient class promptable surgical instrument segmentation. [paper] [Code]
[Medical Image Analysis] Prompt tuning for parameter-efficient medical image segmentation. [paper] [Code]

2023

[ICCV] UniverSeg: Universal medical image segmentation. [paper] [Code]
[arXiv] STU-Net: Scalable and transferable medical image segmentation models empowered by large-scale supervised pre-training. [paper] [Code]
[arXiv] SAM-Med3D. [paper] [Code]
[Nature] A foundation model for generalizable disease detection from retinal images. [paper]
[arXiv] VisionFM: a multi-modal multi-task vision foundation model for generalist ophthalmic Artificial Intelligence. [paper]
[arXiv] SegVol: Universal and interactive volumetric medical image segmentation. [paper] [Code]
[MICCAI] Models Genesis: Generic autodidactic models for 3D medical image analysis. [paper] [Code]
[MICCAI] Deblurring masked autoencoder is better recipe for ultrasound image recognition. [paper] [Code]
[arXiv] MIS-FM: 3D medical image segmentation using foundation models pretrained on a large-scale unannotated dataset. [paper] [Code]
[MICCAI] Foundation model for endoscopy video analysis via large-scale self-supervised pre-train. [paper] [Code]
[MIDL] MoCo pretraining improves representation and transferability of chest X-ray models. [paper] [Code]
[arXiv] BROW: Better features for whole slide image based on self-distillation. [paper]
[arXiv] Computational pathology at health system scale--self-supervised foundation models from three billion images. [paper]
[CVPR] Geometric visual similarity learning in 3D medical image self-supervised pre-training. [paper] [Code]
[arXiv] Virchow: A million-slide digital pathology foundation model. [paper] [Code]
[arXiv] MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation. [paper] [Code]
[ICCV] Comprehensive multimodal segmentation in medical imaging: combining YOLOv8 with SAM and HQ-SAM models. [paper]
[arXiv] 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable medical image segmentation. [paper] [Code]
[arXiv] Part to whole: Collaborative prompting for surgical instrument segmentation. [paper] [Code]
[arXiv] Towards general purpose vision foundation models for medical image analysis: An experimental study of DINOv2 on radiology benchmarks. [paper] [Code]
[arXiv] SkinSAM: Empowering skin cancer segmentation with segment anything model. [paper]
[arXiv] Polyp-SAM: Transfer SAM for polyp segmentation. [paper] [Code]
[arXiv] Customized segment anything model for medical image segmentation. [paper] [Code]
[arXiv] Ladder fine-tuning approach for SAM integrating complementary network. [paper] [Code]
[arXiv] Cheap lunch for medical image segmentation by fine-tuning SAM on few exemplars. [paper]
[arXiv] SemiSAM: Exploring SAM for enhancing semi-supervised medical image segmentation with extremely limited annotations. [paper]
[IWMLMI] Mammo-SAM: Adapting foundation segment anything model for automatic breast mass segmentation in whole mammograms. [paper]
[arXiv] ProMISe: Prompt-driven 3D medical image segmentation using pretrained image foundation models. [paper] [Code]
[arXiv] Medical SAM adapter: Adapting segment anything model for medical image segmentation. [paper] [Code]
[arXiv] SAM-Med2D. [paper] [Code]
[arXiv] MediViSTA-SAM: Zero-shot medical video analysis with spatio-temporal SAM adaptation. [paper] [Code]
[arXiv] SAMUS: Adapting segment anything model for clinically-friendly and generalizable ultrasound image segmentation. [paper]
[MICCAI] Input augmentation with SAM: Boosting medical image segmentation with segmentation foundation model. [paper] [Code]
[arXiv] AutoSAM: Adapting SAM to medical images by overloading the prompt encoder. [paper]
[arXiv] DeSAM: Decoupling segment anything model for generalizable medical image segmentation. [paper] [Code]
[bioRxiv] A foundation model for cell segmentation. [paper] [Code]
[MICCAI] SAM-U: Multi-box prompts triggered uncertainty estimation for reliable SAM in medical image. [paper]
[MICCAI] SAM-Path: A segment anything model for semantic segmentation in digital pathology. [paper]
[arXiv] All-in-SAM: From weak annotation to pixel-wise nuclei segmentation with prompt-based finetuning. [paper]
[arXiv] Polyp-SAM++: Can a text guided SAM perform better for polyp segmentation? [paper] [Code]
[arXiv] Segment anything model with uncertainty rectification for auto-prompting medical image segmentation. [paper]
[arXiv] MedLSAM: Localize and segment anything model for 3D medical images. [paper] [Code]
[arXiv] nnSAM: Plug-and-play segment anything model improves nnUNet performance. [paper] [Code]
[arXiv] EviPrompt: A training-free evidential prompt generation method for segment anything model in medical images. [paper]
[arXiv] One-shot localization and segmentation of medical images with foundation models. [paper]
[arXiv] SAMM (Segment Any Medical Model): A 3D Slicer integration to SAM. [paper] [Code]
[arXiv] Task-driven prompt evolution for foundation models. [paper]

2022

[Machine Learning with Applications] Self supervised contrastive learning for digital histopathology. [paper] [Code]
[Medical Image Analysis] Transformer-based unsupervised contrastive learning for histopathological image classification. [paper] [Code]
[arXiv] Self-supervised learning from 100 million medical images. [paper]
[CVPR] Self-supervised pre-training of Swin Transformers for 3D medical image analysis. [paper] [Code]

2021

[Medical Image Analysis] Models genesis. [paper] [Code]
[Medical Imaging with Deep Learning] MoCo pretraining improves representation and transferability of chest X-ray models. [paper]
[IEEE Transactions on Medical Imaging] Transferable visual words: Exploiting the semantics of anatomical patterns for self-supervised learning. [Paper]

2020

[MICCAI] Comparing to learn: Surpassing ImageNet pretraining on radiographs by comparing image representations. [paper] [Code]

2019

[arXiv] Med3D: Transfer learning for 3D medical image analysis. [paper] [Code]

BFM methods

2024

[Nucleic Acids Research] Multiple sequence alignment-based RNA language model and its application to structural inference. [Paper], [Code]
[Nature Methods] scGPT: toward building a foundation model for single-cell multi-omics using generative AI. [Paper], [Code]
[Nature Machine Intelligence] A 5’ UTR language model for decoding untranslated regions of mRNA and function predictions. [Paper], [Code]
[ICLR 2024] CellPLM: Pre-training of Cell Language Model Beyond Single Cells. [Paper], [Code]

2023

[arXiv] DNAGPT: A generalized pre-trained tool for versatile DNA sequence analysis tasks. [Paper], [Code]
[arXiv] HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution. [Paper], [Code]
[Nature Biotechnology] Large language models generate functional protein sequences across diverse families. [Paper], [Code]
[Cell Systems] ProGen2: Exploring the boundaries of protein language models. [Paper], [Code]
[Nature] Transfer learning enables predictions in network biology. [Paper], [Code]
[arXiv] DNABERT-2: Efficient foundation model and benchmark for multi-species genome. [Paper], [Code]
[bioRxiv] The nucleotide transformer: Building and evaluating robust foundation models for human genomics. [Paper], [Code]
[bioRxiv] GENA-LM: A family of open-source foundational models for long DNA sequences. [Paper], [Code]
[bioRxiv] Self-supervised learning on millions of pre-mRNA sequences improves sequence-based RNA splicing prediction. [Paper], [Code]
[bioRxiv] Deciphering 3’ UTR mediated gene regulation using interpretable deep representation learning. [Paper], [Code]
[Science] Evolutionary-scale prediction of atomic-level protein structure with a language model. [Paper], [Code]
[bioRxiv] Universal cell embeddings: A foundation model for cell biology. [Paper], [Code]
[bioRxiv] Large scale foundation model on single-cell transcriptomics. [Paper], [Code]
[arXiv] Large-scale cell representation learning via divide-and-conquer contrastive learning. [Paper], [Code]
[bioRxiv] CodonBERT: Large language models for mRNA design and optimization. [Paper], [Code]
[bioRxiv] xTrimoPGLM: Unified 100B-scale pre-trained transformer for deciphering the language of protein. [Paper]
[bioRxiv] GenePT: A simple but effective foundation model for genes and cells built from ChatGPT. [Paper], [Code]
[bioRxiv] scELMo: Embeddings from language models are good learners for single-cell data analysis. [Paper], [Code]
[bioRxiv] Evaluating the Utilities of Foundation Models in Single-cell Data Analysis. [Paper], [Code]
[bioRxiv] GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model. [Paper], [Code]

2022

[Nature Machine Intelligence] scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. [Paper], [Code]
[bioRxiv] Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions. [Paper], [Code]
[NAR Genomics & Bioinformatics] Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. [Paper], [Code]
[Nature Biotechnology] Single-sequence protein structure prediction using language models and deep learning. [Paper], [Code]

2021

[Bioinformatics] DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. [Paper], [Code]
[IEEE TPAMI] ProtTrans: Toward understanding the language of life through self-supervised learning. [Paper], [Code]
[ICML 2021] MSA Transformer. [Paper], [Code]
[PNAS] Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. [Paper], [Code]
[Nature] Highly accurate protein structure prediction with AlphaFold. [Paper], [Code]
[arXiv] Multi-modal self-supervised pre-training for regulatory genome across cell types. [Paper], [Code]

MFM methods

2024

[ICASSP] ETP: Learning transferable ECG representations via ECG-text pretraining. [Paper]
[NeurIPS] Med-UniC: Unifying cross-lingual medical vision-language pre-training by diminishing bias. [Paper] [Code]
[NeurIPS] Quilt-1M: One million image-text pairs for histopathology. [Paper] [Code]
[Nature Medicine] A visual-language foundation model for computational pathology. [Paper]
[NeurIPS] LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. [Paper] [Code]
[AAAI] PathAsst: Generative foundation AI assistant for pathology. [Paper] [Code]
[WACV] I-AI: A controllable & interpretable AI system for decoding radiologists’ intense focus for accurate CXR diagnoses. [Paper] [Code]
[arXiv] M3D: Advancing 3D medical image analysis with multi-modal large language models. [Paper] [Code]

2023

[ICLR] Advancing radiograph representation learning with masked record modeling. [Paper] [Code]
[arXiv] BiomedGPT: A unified and generalist biomedical generative pre-trained transformer for vision, language, and multimodal Tasks. [Paper] [Code]
[arXiv] BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. [Paper] [Code]
[arXiv] Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data. [Paper] [Code]
[CVPR] Visual language pretrained multiple instance zero-shot transfer for histopathology images. [Paper] [Code]
[ICCV] MedKLIP: Medical knowledge enhanced language-image pre-training. [Paper] [Code]
[arXiv] UniBrain: Universal brain MRI diagnosis with hierarchical knowledge-enhanced pre-training. [Paper] [Code]
[EACL] PubMedCLIP: How much does CLIP benefit visual question answering in the medical domain. [Paper] [Code]
[MICCAI] M-FLAG: Medical vision-language pre-training with frozen language models and latent space geometry optimization. [Paper] [Code]
[arXiv] IMITATE: Clinical prior guided hierarchical vision-language pre-training. [Paper]
[arXiv] CXR-CLIP: Toward large scale chest X-ray language-image pre-training. [Paper] [Code]
[BIBM] UMCL: Unified medical image-text-label contrastive learning with continuous prompt. [Paper]
[Nature Communications] Knowledge-enhanced visual-language pre-training on chest radiology images. [Paper]
[Nature Machine Intelligence] Multi-modal molecule structure–text model for text-based retrieval and editing. [Paper] [Code]
[MICCAI] CLIP-Lung: Textual knowledge-guided lung nodule malignancy prediction. [Paper]
[MICCAI] PMC-CLIP: Contrastive language-image pre-training using biomedical documents. [Paper] [Code]
[arXiv] Enhancing representation in radiography-reports foundation model: A granular alignment algorithm using masked contrastive learning. [Paper] [Code]
[ICCV] PRIOR: Prototype representation joint learning from medical images and reports. [Paper] [Code]
[MICCAI] Masked vision and language pre-training with unimodal and multimodal contrastive losses for medical visual question answering. [Paper] [Code]
[arXiv] T3D: Towards 3D medical image understanding through vision-language pre-training. [Paper]
[MICCAI] Gene-induced multimodal pre-training for imageomic classification. [Paper] [Code]
[arXiv] A text-guided protein design framework. [Paper] [Code]
[Nature Medicine] A visual–language foundation model for pathology image analysis using medical Twitter. [Paper] [Code]
[arXiv] Towards generalist biomedical AI. [Paper] [Code]
[ML4H] Med-Flamingo: A multimodal medical few-shot learner. [Paper] [Code]
[MLMIW] Exploring the transfer learning capabilities of CLIP on domain generalization for diabetic retinopathy. [Paper] [Code]
[MICCAI] Open-ended medical visual question answering through prefix tuning of language models. [Paper] [Code]
[arXiv] Qilin-Med-VL: Towards Chinese large vision-language model for general healthcare. [Paper] [Code]
[arXiv] A foundational multimodal vision language AI assistant for human pathology. [Paper]
[arXiv] Effectively fine-tune to improve large multimodal models for radiology report generation. [Paper]
[MLMIW] Multi-modal adapter for medical vision-and-language learning. [Paper]
[arXiv] Text-guided foundation model adaptation for pathological image classification. [Paper] [Code]
[arXiv] XrayGPT: Chest radiographs summarization using medical vision-language models. [Paper] [Code]
[MICCAI] Xplainer: From X-Ray observations to explainable zero-shot diagnosis. [Paper] [Code]
[MICCAI] Multiple prompt fusion for zero-shot lesion detection using vision-language models. [Paper]

2022

[JMLR] Contrastive learning of medical visual representations from paired images and text. [Paper] [Code]
[ECCV] Joint learning of localized representations from medical images and reports. [Paper]
[NeurIPS] Multi-granularity cross-modal alignment for generalized medical visual representation learning. [Paper] [Code]
[AAAI] Clinical-BERT: Vision-language pre-training for radiograph diagnosis and reports generation. [Paper]
[MICCAI] Multi-modal masked autoencoders for medical vision-and-language pre-training. [Paper] [Code]
[JBHI] Multi-modal understanding and generation for medical images and text via vision-language pre-training. [Paper] [Code]
[ACM MM] Align, reason and learn: Enhancing medical vision-and-language pre-training with knowledge. [Paper] [Code]
[ECCV] Making the most of text semantics to improve biomedical vision–language processing. [Paper]
[Nature Biomedical Engineering] Expert-level detection of pathologies from unannotated chest x-ray images via self-supervised learning. [Paper] [Code]
[arXiv] RoentGen: Vision-language foundation model for chest X-ray generation. [Paper]
[arXiv] Adapting pretrained vision-language foundational models to medical imaging domains. [Paper]
[arXiv] Medical image understanding with pretrained vision language models: A comprehensive study. [Paper]
[EMNLP] MedCLIP: Contrastive learning from unpaired medical images and text. [Paper] [Code]
[MICCAI] Breaking with fixed set pathology recognition through report-guided contrastive training. [Paper]

2021

[arXiv] MMBERT: Multimodal BERT pretraining for improved medical VQA. [Paper] [Code]
[ICCV] GLoRIA: A multimodal global-local representation learning framework for label-efficient medical image recognition. [Paper] [Code]

Datasets

LFM datasets

Dataset Name	Text Types	Scale	Task	Link
PubMed	Literature	18B tokens	Language modeling	*
MedC-I	Literature	79.2B tokens	Dialogue	*
Guidelines	Literature	47K instances	Language modeling	*
PMC-Patients	Literature	167K instances	Information retrieval	*
MIMIC-III	Health records	122K instances	Language modeling	*
MIMIC-IV	Health records	299K instances	Language modeling	*
eICU-CRDv2.0	Health records	200K instances	Language modeling	*
EHRs	Health records	82B tokens	Named entity recognition, Relation extraction, Semantic textual similarity, Natural language inference, Dialogue	-
MD-HER	Health records	96K instances	Dialogue, Question answering	-
IMCS-21	Dialogue	4K instances	Dialogue	*
Huatuo-26M	Dialogue	26M instances	Question answering	*
MedInstruct-52k	Dialogue	52K instances	Dialogue	*
MASH-QA	Dialogue	35K instances	Dialogue	*
MedQuAD	Dialogue	47K instances	Dialogue	*
MedDG	Dialogue	17K instances	Dialogue	*
CMExam	Dialogue	68K instances	Dialogue	*
cMedQA2	Dialogue	108K instances	Dialogue	*
CMtMedQA	Dialogue	70K instances	Dialogue	*
CliCR	Dialogue	100K instances	Dialogue	*
webMedQA	Dialogue	63K instances	Dialogue	*
ChiMed	Dialogue	1.59B tokens	Dialogue	*
MedDialog	Dialogue	20K instances	Dialogue	*
CMD	Dialogue	882K instances	Dialogue	*
BianqueCorpus	Dialogue	2.4M instances	Dialogue	*
MedQA	Dialogue	4K instances	Dialogue	*
HealthcareMagic	Dialogue	100K instances	Dialogue	*
iCliniq	Dialogue	10K instances	Dialogue	*
CMeKG-8K	Dialogue	8K instances	Dialogue	*
Hybrid SFT	Dialogue	226K instances	Dialogue	*
VariousMedQA	Dialogue	54K instances	Dialogue	*
Medical Meadow	Dialogue	160K instances	Dialogue	*
MultiMedQA	Dialogue	193K instances	Dialogue	-
BiMed1.3M	Dialogue	250K instances	Dialogue	*
OncoGPT	Dialogue	180K instances	Dialogue	*

VFM datasets

Dataset Name	Modality	Scale	Task	Link
LIMUC	Endoscopy	1,043 videos (11,276 frames)	Detection	*
SUN	Endoscopy	1018 videos (158,690 frames)	Detection	*
Kvasir-Capsule	Endoscopy	117 videos (4,741,504 frames)	Detection	*
EndoSLAM	Endoscopy	1020 videos (158,690 frames)	Detection, Registration	*
LDPolypVideo	Endoscopy	263 videos (895,284 frames)	Detection	*
HyperKvasir	Endoscopy	374 videos (1,059,519 frames)	Detection	*
CholecT45	Endoscopy	45 videos (90,489 frames)	Segmentation, Detection	*
DeepLesion	CT slices (2D)	32,735 images	Segmentation, Registration	*
LIDC-IDRI	3D CT	1,018 volumes	Segmentation	*
TotalSegmentator	3D CT	1,204 volumes	Segmentation	*
TotalSegmentatorv2	3D CT	1,228 volumes	Segmentation	*
AutoPET	3D CT, 3D PET	1,214 PET-CT pairs	Segmentation	*
ULS	3D CT	38,842 volumes	Segmentation	*
FLARE 2022	3D CT	2,300 volumes	Segmentation	*
FLARE 2023	3D CT	4,500 volumes	Segmentation	*
AbdomenCT-1K	3D CT	1,112 volumes	Segmentation	*
CTSpine1K	3D CT	1,005 volumes	Segmentation	*
CTPelvic1K	3D CT	1,184 volumes	Segmentation	*
MSD	3D CT, 3D MRI	1,411 CT, 1,222 MRI	Segmentation	*
BraTS21	3D MRI	2,040 volumes	Segmentation	*
BraTS2023-MEN	3D MRI	1,650 volumes	Segmentation	*
ADNI	3D MRI	-	Clinical study	*
PPMI	3D MRI	-	Clinical study	*
ATLAS v2.0	3D MRI	1,271 volumes	Segmentation	*
PI-CAI	3D MRI	1,500 volumes	Segmentation	*
MRNet	3D MRI	1,370 volumes	Segmentation	*
Retinal OCT-C8	2D OCT	24,000 volumes	Classification	*
Ultrasound Nerve Segmentation	US	11,143 images	Segmentation	*
Fetal Planes	US	12,400 images	Classification	*
EchoNet-LVH	US	12,000 videos	Detection, Clinical study	*
EchoNet-Dynamic	US	10,030 videos	Function assessment	*
AIROGS	CFP	113,893 images	Classification	*
ISIC 2020	Dermoscopy	33,126 images	Classification	*
LC25000	Pathology	25,000 images	Classification	*
DeepLIIF	Pathology	1,667 WSIs	Classification	*
PAIP	Pathology	2,457 WSIs	Segmentation	*
TissueNet	Pathology	1,016 WSIs	Classification	*
NLST	3D CT, Pathology	26,254 CT, 451 WSIs	Clinical study	*
CRC	Pathology	100K images	Classification	*
MURA	X-ray	40,895 images	Detection	*
ChestX-ray14	X-ray	112,120 images	Detection	*
SNOW	Synthetic pathology	20K image tiles	Segmentation	*

BFM datasets

Dataset Name	Modality	Scale	Task	Link
CellxGene Corpus	scRNA-seq	over 72M scRNA-seq data	Single cell omics study	*
NCBI GenBank	DNA	3.7B sequences	Genomics study	*
SCP	scRNA-seq	over 40M scRNA-seq data	Single cell omics study	*
GENCODE	DNA	-	Genomics study	*
10x Genomics	scRNA-seq, DNA	-	Single cell omics and genomics study	*
ABC Atlas	scRNA-seq	over 15M scRNA-seq data	Single cell omics study	*
Human Cell Atlas	scRNA-seq	over 50M scRNA-seq data	Single cell omics study	*
UCSC Genome Browser	DNA	-	Genomics study	*
CPTAC	DNA, RNA, protein	-	Genomics and proteomics study	*
Ensembl Project	Protein	-	Proteomics study	*
RNAcentral database	RNA	36M sequences	Transcriptomics study	*
AlphaFold DB	Protein	214M structures	Proteomics study	*
PDBe	Protein	-	Proteomics study	*
UniProt	Protein	over 250M sequences	Proteomics study	*
LINCS L1000	Small molecules	1,000 genes with 41K small molecules	Disease research, drug response	*
GDSC	Small molecules	1,000 cancer cells with 400 compounds	Disease research, drug response	*
CCLE	-	-	Bioinformatics study	*

MFM datasets

Dataset Name	Modalities	Scale	Task	Link
MIMIC-CXR	X-ray, Medical report	377K images, 227K texts	Vision-Language learning	*
PadChest	X-ray, Medical report	160K images, 109K texts	Vision-Language learning	*
CheXpert	X-ray, Medical report	224K images, 224K texts	Vision-Language learning	*
ImageCLEF2018	Multimodal, Captions	232K images, 232K texts	Image captioning	*
OpenPath	Pathology, Tweets	208K images, 208K texts	Vision-Language learning	*
PathVQA	Pathology, QA	4K images, 32K QA pairs	VQA	*
Quilt-1M	Pathology Images, Mixed-source text	1M images, 1M texts	Vision-Language learning	*
PatchGastricADC22	Pathology, Captions	991 WSIs, 991 texts	Image captioning	*
PTB-XL	ECG, Medical report	21K records, 21K texts	Vision-Language learning	*
ROCO	Multimodal, Captions	87K images, 87K texts	Vision-Language learning	*
MedICaT	Multimodal, Captions	217K images, 217K texts	Vision-Language learning	*
PMC-OA	Multimodal, Captions	1.6M images, 1.6M texts	Vision-Language learning	*
ChiMed-VL	Multimodal, Medical report	580K images, 580K texts	Vision-Language learning	*
PMC-VQA	Multimodal, QA	149K images, 227K QA pairs	VQA	*
SwissProtCLAP	Protein Sequence, Text	441K protein sequences, 441K texts	Protein-Language learning	*
Duke Breast Cancer MRI	Genomic, MRI images, Clinical data	922 patients	Multimodal learning	*
I-SPY2	MRI images, Clinical data	719 patients	Multimodal learning	*

Large-scale comprehensive databases

Database	Description	Link
CGGA	Chinese Glioma Genome Atlas (CGGA) database contains clinical and sequencing data of over 2,000 brain tumor samples from Chinese cohorts.	*
UK Biobank	UK Biobank is a large-scale biomedical database and research resource containing de-identified genetic, lifestyle and health information and biological samples from half a million UK participants.	*
TCGA	The Cancer Genome Atlas (TCGA) program molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types, generating over 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data.	*
TCIA	The Cancer Imaging Archive (TCIA) is a service that de-identifies and hosts a large, publicly available archive of medical images of cancer.	*
