Awesome-Healthcare-Foundation-Models

A curated list of awesome large AI models (LAMs), or foundation models, in healthcare. We organize the current LAMs into four categories: large language models (LLMs), large vision models (LVMs), large audio models, and large multi-modal models (LMMs). The areas to which these LAMs are applied include, but are not limited to, bioinformatics, medical diagnosis, medical imaging, medical informatics, medical education, public health, and medical robotics.

We welcome contributions to this repository to add more resources. Please submit a pull request if you want to contribute!

News

We are excited to announce an IEEE J-BHI special issue on Biomedical and Health Foundation Models. Please refer to the call for papers for more details.

Topics of interest include, but are not limited to:

Basic research on new theories, principles, and structures of biomedical and health foundation models
Basic research on the interpretability and explainability of biomedical and health foundation models
Prompt engineering in biomedical and health foundation models
Data engineering in biomedical and health foundation models
Large-scale biomedical and health datasets
Multi-modal learning and alignment for biomedical and health foundation models
Efficient computing for biomedical and health foundation models
Adversarial robustness of biomedical and health foundation models
Applications of foundation models in biomedical and health informatics
New evaluation paradigms for biomedical and health foundation models
New computer systems for biomedical and health foundation models
Decentralised methods for developing and deploying biomedical and health foundation models
Foundation model ethics, safety, privacy, and regulations in biomedicine and healthcare

Please help spread the word and contribute if you are interested or already working on these topics!

Survey

This repository is largely based on the following paper:

Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu, Lin Li, Jiankai Sun, Jiachuan Peng, Peilun Shi, Ruiyang Zhang, Yinzhao Dong, Kyle Lam, Frank P.-W. Lo, Bo Xiao, Wu Yuan, Ningli Wang, Dong Xu, and Benny Lo

If you find this repository helpful, please consider citing:

@article{qiu2023large,
  title={Large ai models in health informatics: Applications, challenges, and the future},
  author={Qiu, Jianing and Li, Lin and Sun, Jiankai and Peng, Jiachuan and Shi, Peilun and Zhang, Ruiyang and Dong, Yinzhao and Lam, Kyle and Lo, Frank P-W and Xiao, Bo and others},
  journal={IEEE Journal of Biomedical and Health Informatics},
  year={2023},
  publisher={IEEE}
}

Large Language Models

Healthcare Domain

ClinicalMamba: A Generative Clinical Language Model on Longitudinal Clinical Notes [Paper]
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [Paper] [Code]
Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models [Paper]
KeBioLM: Improving Biomedical Pretrained Language Models with Knowledge [Paper]
BioELMo: Probing Biomedical Embeddings from Language Models [Paper]
BioBART: Pretraining and Evaluation of A Biomedical Generative Language Model [Paper]
ClinicalT5: A Generative Language Model for Clinical Text [Paper]
GatorTron: A Large Clinical Language Model to Unlock Patient Information from Unstructured Electronic Health Records [Paper]
ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [Paper] [Code]
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [Paper]
Capabilities of GPT-4 on Medical Challenge Problems [Paper]
BioBERT: a pre-trained biomedical language representation model for biomedical text mining [Paper]
Publicly Available Clinical BERT Embeddings [Paper]
BioMegatron: Larger Biomedical Domain Language Model [Paper]
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks [Paper]
Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction [Paper]
CPLLM: Clinical Prediction with Large Language Models [Paper] [Code]
DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task [Paper] [Code]
HuatuoGPT, Towards Taming Language Models To Be a Doctor [Paper] [Code]
BioELECTRA: Pretrained Biomedical text Encoder using Discriminators [Paper]
LinkBERT: Pretraining Language Models with Document Links [Paper]
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [Paper]
Large Language Models Encode Clinical Knowledge [Paper]
A large language model for electronic health records [Paper]
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [Paper]
BEHRT: Transformer for Electronic Health Records [Paper]
Federated Learning of Medical Concepts Embedding using BEHRT [Paper] [Code]
RadBERT: Adapting Transformer-based Language Models to Radiology [Paper] [HuggingFace]
Highly accurate protein structure prediction with AlphaFold [Paper] [Code]
Accurate prediction of protein structures and interactions using a three-track neural network [Paper]
Protein complex prediction with AlphaFold-Multimer [Paper]
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours [Paper] [Code]
HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle [Paper] [Code]
Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold [Paper] [Code]
OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization [Paper] [Code]
ManyFold: an efficient and flexible library for training and validating protein folding models [Paper] [Code]
ColabFold: making protein folding accessible to all [Paper] [Code]
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences [Paper] [Code]
ProGen: Language Modeling for Protein Generation [Paper] [Code]
ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing [Paper] [Code]
Evolutionary-scale prediction of atomic level protein structure with a language model [Paper]
High-resolution de novo structure prediction from primary sequence [Paper] [Code]
Single-sequence protein structure prediction using a language model and deep learning [Paper]
Improved the Protein Complex Prediction with Protein Language Models [Paper]
MSA Transformer [Paper] [Code]
Deciphering antibody affinity maturation with language models and weakly supervised learning [Paper]
xTrimoABFold: De novo Antibody Structure Prediction without MSA [Paper]
scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data [Paper] [Code]
Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions [Paper] [Code]
E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [Paper] [Code]
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [Paper] [Code]

General Domain

ChatGPT: Optimizing language models for dialogue [Blog]
LLaMA: Open and Efficient Foundation Language Models [Paper]
Scaling Instruction-Finetuned Language Models [Paper]
PaLM: Scaling Language Modeling with Pathways [Paper]
Training Compute-Optimal Large Language Models [Paper]
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model [Paper]
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model [Paper]
LaMDA: Language Models for Dialog Applications [Paper]
OPT: Open Pre-trained Transformer Language Models [Paper]
Training language models to follow instructions with human feedback [Paper]
Scaling Language Models: Methods, Analysis & Insights from Training Gopher [Paper]
Multitask prompted training enables zero-shot task generalization [Paper]
Language Models are Few-Shot Learners [Paper]
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [Paper]
RoBERTa: A Robustly Optimized BERT Pretraining Approach [Paper]
Language Models are Unsupervised Multitask Learners [Paper]
Improving language models by retrieving from trillions of tokens [Paper]
WebGPT: Browser-assisted question-answering with human feedback [Paper]
Improving alignment of dialogue agents via targeted human judgements [Paper]
Improving Language Understanding by Generative Pre-Training [Paper]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [Paper]

Large Vision Models

Healthcare Domain

VisionFM: A Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [Paper]
RETFound: A foundation model for generalizable disease detection from retinal images [Paper]
EndoFM: Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train [Paper] [Code]
STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training [Paper]
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [Paper] [Code]
Med3d: Transfer learning for 3d medical image analysis [Paper] [Code]
Models genesis: Generic autodidactic models for 3d medical image analysis [Paper] [Code]
MICLe: Big self-supervised models advance medical image classifications [Paper] [Code]
C2l: Comparing to Learn: Surpassing ImageNet Pretraining on Radiographs By Comparing Image Representations [Paper] [Code]
MoCo-CXR: MoCo Pretraining Improves Representation and Transferability of Chest X-ray Models [Paper] [Code]
Transunet: Transformers make strong encoders for medical image segmentation [Paper] [Code]
Transfuse: Fusing transformers and cnns for medical image segmentation [Paper] [Code]
Medical transformer: Gated axial-attention for medical image segmentation [Paper] [Code]
UNETR: Transformers for 3D Medical Image Segmentation [Paper] [Code]
Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [Paper] [Code]
Swin-unet: Unet-like pure transformer for medical image segmentation [Paper] [Code]
SAM4Med: Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation [Paper]
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures [Paper] [Code]

General Domain

CNNs:

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [paper]
Big Transfer (BiT): General Visual Representation Learning [paper]
Designing Network Design Spaces [paper]
Self-supervised Pretraining of Visual Features in the Wild [paper]
EfficientNetV2: Smaller Models and Faster Training [paper]
A ConvNet for the 2020s [paper]
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [paper]

Vision Transformers:

Generative Pretraining From Pixels [paper]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [paper]
Transformer in Transformer [paper]
Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows [paper]
Training data-efficient image transformers & distillation through attention [paper]
Self-supervised Models are Good Teaching Assistants for Vision Transformers [paper]
Scaling Vision with Sparse Mixture of Experts [paper]
Going Deeper With Image Transformers [paper]
Masked Autoencoders Are Scalable Vision Learners [paper]
Swin Transformer V2: Scaling Up Capacity and Resolution [paper]
Scaling Vision Transformers [paper]
Efficient Self-supervised Vision Transformers for Representation Learning [paper]
Scaling Vision Transformers to 22 Billion Parameters [paper]

CNNs + ViTs:

CoAtNet: Marrying Convolution and Attention for All Data Sizes [paper]
LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference [paper]
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases [paper]

Large Audio Models

Healthcare Domain

General Domain

wav2vec: Unsupervised Pre-training for Speech Recognition [Paper] [Blog]
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training [Paper]
AudioLM: a Language Modeling Approach to Audio Generation [Paper] [Project] [Blog]
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units [Paper] [HuggingFace]
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale [Paper] [Blog] [HuggingFace]
MusicLM: Generating Music From Text [Paper] [Project] [Code]
Diffsound: Discrete Diffusion Model for Text-to-sound Generation [Paper] [Project] [Code]
AudioGen: Textually Guided Audio Generation [Paper] [Project]
Whisper: Robust Speech Recognition via Large-Scale Weak Supervision [Paper] [Code] [HuggingFace]
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages [Paper] [Blog]

Large Multi-modal Models

Healthcare Domain

The application of multimodal large language models in medicine [Paper]
Foundation models: the future of surgical artificial intelligence? [Paper]
Bootstrapping Large Language Models for Radiology Report Generation [Paper] [Code]
Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis [Paper]
PLIP: A visual–language foundation model for pathology image analysis using medical Twitter [Paper]
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [Paper]
GPT-4 Technical Report [Paper]
Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [Paper]
Contrastive Learning of Medical Visual Representations from Paired Images and Text [Paper] [Code]
Gloria: A multimodal global-local representation learning framework for label-efficient medical image recognition [Paper] [Code]
RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [Paper]
PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain? [Paper]
SurgVLP: Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures [Paper] [Code]

General Domain

Multi-modal Chatbot:

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [Paper]

Representation learning:

Learning Transferable Visual Models From Natural Language Supervision [paper]
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [paper]
Florence: A New Foundation Model for Computer Vision [paper]
Grounded Language-Image Pre-Training [paper]
WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training [paper]
FLAVA: A Foundational Language and Vision Alignment Model [paper]
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision [paper]
FILIP: Fine-grained Interactive Language-Image Pre-Training [paper]
Combined Scaling for Open-Vocabulary Image Classification [paper]
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation [paper]
PaLI: A Jointly-Scaled Multilingual Language-Image Model [paper]
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [paper]
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models [paper]
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm [paper]
Language Is Not All You Need: Aligning Perception with Language Models [paper]
PaLM-E: An Embodied Multimodal Language Model [paper]
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models [paper]

Text-to-image generation:

Zero-Shot Text-to-Image Generation [paper]
High-Resolution Image Synthesis With Latent Diffusion Models [paper]
Hierarchical Text-Conditional Image Generation with CLIP Latents [paper]
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [paper]
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [paper]
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [paper]

Applications of Large AI Models in Healthcare

Note that some of the following models were not targeted at healthcare applications initially but may have the potential to be transferred to the healthcare domain or inspire future development.

Bioinformatics

GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information [Paper]
Highly accurate protein structure prediction with AlphaFold [Paper] [Code]
Accurate prediction of protein structures and interactions using a three-track neural network [Paper]
Protein complex prediction with AlphaFold-Multimer [Paper]
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours [Paper] [Code]
HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle [Paper] [Code]
Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold [Paper] [Code]
OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization [Paper] [Code]
ManyFold: an efficient and flexible library for training and validating protein folding models [Paper] [Code]
ColabFold: making protein folding accessible to all [Paper] [Code]
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences [Paper] [Code]
ProGen: Language Modeling for Protein Generation [Paper] [Code]
ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing [Paper] [Code]
Evolutionary-scale prediction of atomic level protein structure with a language model [Paper]
High-resolution de novo structure prediction from primary sequence [Paper] [Code]
Single-sequence protein structure prediction using a language model and deep learning [Paper]
Improved the Protein Complex Prediction with Protein Language Models [Paper]
MSA Transformer [Paper] [Code]
Deciphering antibody affinity maturation with language models and weakly supervised learning [Paper]
xTrimoABFold: De novo Antibody Structure Prediction without MSA [Paper]
scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data [Paper] [Code]
Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions [Paper] [Code]
E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [Paper] [Code]
SMILES-BERT: large scale unsupervised pre-training for molecular property prediction [Paper] [Code]
SMILES Transformer: Pre-trained molecular fingerprint for low data drug discovery [Paper] [Code]
MolBert: Molecular representation learning with language models and domain-relevant auxiliary tasks [Paper] [Code]
AGBT: Algebraic graph-assisted bidirectional transformers for molecular property prediction [Paper] [Code]
GROVER: Self-supervised graph transformer on large-scale molecular data [Paper] [Code]
Molgpt: molecular generation using a transformer-decoder model [Paper] [Code]
A Model to Search for Synthesizable Molecules [Paper] [Code]
Transformer neural network for protein-specific de novo drug generation as a machine translation problem [Paper]
Deepconv-dti: Prediction of drug-target interactions via deep learning with convolution on protein sequences [Paper] [Code]
Graphdta: predicting drug–target binding affinity with graph neural networks [Paper] [Code]
Moltrans: molecular interaction transformer for drug–target interaction prediction [Paper] [Code]
Extracting Predictive Representations from Hundreds of Millions of Molecules [Paper] [Code]
ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties [Project] [Paper]
MPG: Learn molecular representations from large-scale unlabeled molecules for drug discovery [Paper]
MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction [Paper] [Code]
PanGu Drug Model: Learn a Molecule Like a Human [Project] [Paper]
DrugBAN: Interpretable bilinear attention network with domain adaptation improves drug–target prediction [Paper] [Code]
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery [Paper] [Code]

Medical Diagnosis

VisionFM: A Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [Paper]
RETFound: A foundation model for generalizable disease detection from retinal images [Paper]
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [Paper]
Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [Paper]
ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [Paper] [Code]
BEHRT: Transformer for Electronic Health Records [Paper]
Federated Learning of Medical Concepts Embedding using BEHRT [Paper] [Code]
Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction [Paper]
CPLLM: Clinical Prediction with Large Language Models [Paper] [Code]
RadBERT: Adapting Transformer-based Language Models to Radiology [Paper] [HuggingFace]
ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs [Paper] [Code]

Medical Imaging

VisionFM: A Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [Paper]
RETFound: A foundation model for generalizable disease detection from retinal images [Paper]
Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [Paper]
Med3d: Transfer learning for 3d medical image analysis [Paper] [Code]
Models genesis: Generic autodidactic models for 3d medical image analysis [Paper] [Code]
MICLe: Big self-supervised models advance medical image classifications [Paper] [Code]
C2l: Comparing to Learn: Surpassing ImageNet Pretraining on Radiographs By Comparing Image Representations [Paper] [Code]
ConVIRT: Contrastive learning of medical visual representations from paired images and text [Paper] [Code]
Gloria: A multimodal global-local representation learning framework for label-efficient medical image recognition [Paper] [Code]
MoCo-CXR: MoCo Pretraining Improves Representation and Transferability of Chest X-ray Models [Paper] [Code]
Transunet: Transformers make strong encoders for medical image segmentation [Paper] [Code]
Transfuse: Fusing transformers and cnns for medical image segmentation [Paper] [Code]
Medical transformer: Gated axial-attention for medical image segmentation [Paper] [Code]
UNETR: Transformers for 3D Medical Image Segmentation [Paper] [Code]
Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [Paper] [Code]
Swin-unet: Unet-like pure transformer for medical image segmentation [Paper] [Code]
SAM4Med: Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation [Paper]

Medical Informatics

Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models [Paper]
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [Paper]
Capabilities of GPT-4 on Medical Challenge Problems [Paper]
BioBERT: a pre-trained biomedical language representation model for biomedical text mining [Paper]
Publicly Available Clinical BERT Embeddings [Paper]
BioMegatron: Larger Biomedical Domain Language Model [Paper]
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks [Paper]
Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction [Paper]
CPLLM: Clinical Prediction with Large Language Models [Paper] [Code]
BioELECTRA: Pretrained Biomedical text Encoder using Discriminators [Paper]
LinkBERT: Pretraining Language Models with Document Links [Paper]
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [Paper]
Large Language Models Encode Clinical Knowledge [Paper]
A large language model for electronic health records [Paper]
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [Paper]
BEHRT: Transformer for Electronic Health Records [Paper]
Federated Learning of Medical Concepts Embedding using BEHRT [Paper] [Code]

Medical Education

GPT-4 Technical Report [Paper]
Empowering Beginners in Bioinformatics with ChatGPT [Paper]

Public Health

Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis [Paper]
Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [Paper]
Clustering Egocentric Images in Passive Dietary Monitoring with Self-Supervised Learning [Paper]
ClimaX: A foundation model for weather and climate [Paper]

Medical Robotics

EndoFM: Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train [Paper] [Code]
Decision Transformer: Reinforcement Learning via Sequence Modeling [Paper] [Code]
R3M: A Universal Visual Representation for Robot Manipulation [Paper] [Project] [Code]
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play [Paper] [Project]
PaLM-E: An Embodied Multimodal Language Model [Paper] [Project] [Blog]
A Generalist Agent [Paper] [Blog]
CLIPort: What and Where Pathways for Robotic Manipulation [Paper] [Project] [Code]
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation [Paper] [Project] [Code]
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [Paper] [Project] [Code]
VIMA: General Robot Manipulation with Multimodal Prompts [Paper] [Project] [Code]
RT-1: Robotics Transformer for Real-World Control at Scale [Paper] [Project] [Code]
ChatGPT for Robotics: Design Principles and Model Abilities [Paper] [Blog] [Code]

AI Legislation

AI Act (EU) [Source]
A pro-innovation approach to AI regulation (UK) [Source]
Blueprint for an AI Bill of Rights (USA) [Source]
AI Risk Management Framework (USA) [Source]
Provisions on the Administration of Deep Synthesis Internet Information Services (China) [Source]
Interim Measures for the Management of Generative Artificial Intelligence Services (China) [Source]

Large-scale Datasets in Biomedical and Health Informatics

Open Source

Dataset	Description
Big Fantastic Database	2.1 B protein sequences, 393 B amino acids
Observed Antibody Space	558 M antibody sequences
RNAcentral	34 M ncRNA sequences, 22 M secondary structures
ZINC20	1.4 B compounds from 310 catalogs from 150 companies
MIMIC-CXR	65 K patients, 337 K chest X-ray images, and 227 K radiology reports
MedMNIST v2	708 K 2D medical images, 10 K 3D medical images
Medical Meadow	1.5 M data points covering a wide range of medical language processing tasks
Endo-FM database	33 K endoscopic videos, up to 5 M frames
SurgVLP database	25 K laparoscopic video-text pairs from 1 K surgical lecture videos

Private or Upon Approval

Dataset	Description
Mount Sinai ECG Data	2.1 M patients, containing 8.5 M discrete ECG recordings
Google DR Dev. Dataset	239 K unique individuals, 1.6 M fundus images
UF Health IDR Clinical Note Database	290 M clinical notes, with up to 82 B medical words
Clinical Practice Research Datalink	11.3 M patients, covering data on demographics, symptoms, diagnoses, etc.
