# CITE: Connecting Image and Text Embeddings


Updated on 2023.12.26

## Key Features

This repository provides the official implementation of *Text-guided Foundation Model Adaptation for Pathological Image Classification* (MICCAI 2023).

## Links


## Details

The recent surge of foundation models in computer vision and natural language processing opens up new perspectives for utilizing multi-modal clinical data to train large models with strong generalizability. Yet pathological image datasets often lack biomedical text annotation and enrichment, so guiding data-efficient image diagnosis with biomedical text knowledge is of substantial interest. In this paper, we propose to Connect Image and Text Embeddings (CITE) to enhance pathological image classification. CITE injects text insights gained from language models pre-trained on a broad range of biomedical texts, adapting foundation models towards pathological image understanding. Through extensive experiments on the PatchGastric stomach tumor pathological image dataset, we demonstrate that CITE achieves leading performance compared with various baselines, especially when training data is scarce. CITE offers insights into leveraging in-domain text knowledge to reinforce data-efficient pathological image classification.

An overview of CITE:

<div align="center">
  <img width="1000px" height="auto" src="assets/method.png">
</div>
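
To make the idea concrete, here is a minimal PyTorch sketch of how the two embedding spaces can be connected. It is an illustration under stated assumptions, not the repository's training code: `image_encoder` stands in for a frozen vision foundation model, `class_text_embeds` for per-class embeddings produced by a biomedical language model (one prompt per class), and a learnable linear projection aligns image features with the text space so that cosine similarity yields class logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CITESketch(nn.Module):
    """Minimal sketch: classify images by similarity to class text embeddings.

    `image_encoder` and `class_text_embeds` are placeholders for a frozen
    vision foundation model and for embeddings of one biomedical text prompt
    per class; neither name comes from the CITE codebase.
    """

    def __init__(self, image_encoder: nn.Module, image_dim: int,
                 class_text_embeds: torch.Tensor):
        super().__init__()
        self.image_encoder = image_encoder
        for p in self.image_encoder.parameters():
            p.requires_grad_(False)  # keep the foundation model frozen
        # Learnable projection connecting the image space to the text space.
        self.proj = nn.Linear(image_dim, class_text_embeds.shape[1])
        # Fixed, L2-normalized class text embeddings, shape (num_classes, text_dim).
        self.register_buffer("text_embeds",
                             F.normalize(class_text_embeds, dim=-1))

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = F.normalize(self.proj(self.image_encoder(images)), dim=-1)
        return feats @ self.text_embeds.t()  # cosine-similarity logits


# Toy usage with random stand-ins for both encoders.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
model = CITESketch(encoder, image_dim=512, class_text_embeds=torch.randn(3, 768))
logits = model(torch.randn(2, 3, 224, 224))  # shape (2, 3): one score per class
```

In this sketch only the projection is trained while both pre-trained models stay frozen, which is what keeps the approach data-efficient; the actual adaptation method is described in the paper and implemented in the configs.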

## Dataset

The PatchGastric dataset includes histopathological image patches extracted from H&E-stained whole slide images (WSIs) of stomach adenocarcinoma endoscopic biopsy specimens. The dataset covers 9 subtypes of gastric adenocarcinoma. We choose 3 major subtypes, “well differentiated tubular adenocarcinoma”, “moderately differentiated tubular adenocarcinoma”, and “poorly differentiated adenocarcinoma”, to form a 3-class grading-like classification task with 179,285 patches of size 300×300 from 693 WSIs.

To prepare the PatchGastric dataset:

  1. Download `captions.csv` and `patches_captions.zip` from PatchGastricADC22.
  2. Put them in `data/` and unzip `patches_captions.zip`.
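
After downloading, a quick sanity check like the one below can confirm the files are in place. It is a hedged sketch: the caption columns and the patch file extensions are assumptions that may differ from the actual release.

```python
import zipfile
from pathlib import Path

import pandas as pd

data_dir = Path("data")

# One caption row per WSI; the exact columns depend on the release.
captions = pd.read_csv(data_dir / "captions.csv")
print(f"{len(captions)} caption rows; columns: {list(captions.columns)}")

# List the archive contents without relying on the unzipped folder name.
with zipfile.ZipFile(data_dir / "patches_captions.zip") as zf:
    n_images = sum(name.endswith((".jpg", ".png")) for name in zf.namelist())
print(f"{n_images} patch images in the archive (expected 179,285)")
```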

## Get Started

### Main Requirements

```text
torch==1.13.0
mmcls==0.25.0
transformers
clip
```

### Installation

```shell
conda create -n CITE python=3.9
conda activate CITE
conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install openmim
mim install mmcls==0.25.0
pip install -r requirements.txt
```
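
After installation, a short Python check can confirm the pinned versions are importable (a convenience sketch, not part of the repository):

```python
import torch
import torchvision
import mmcls
import transformers
import clip  # OpenAI CLIP package

print("torch:", torch.__version__)              # expect 1.13.0
print("torchvision:", torchvision.__version__)  # expect 0.14.0
print("mmcls:", mmcls.__version__)              # expect 0.25.0
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```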

### Preprocess

To follow our split of the dataset, please generate the annotation files by running:

```shell
python tools/ann.py
```

Or you can generate your own split following the mmcls annotation format, one sample per line:

```text
filename label
```
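
Each line pairs a patch filename with an integer class label, separated by a single space. The sketch below writes such a file; the label mapping follows the three subtypes above, while the sample list and output path are placeholders for your own split:

```python
# Map the three PatchGastric subtypes to integer labels (0, 1, 2).
label_map = {
    "well differentiated tubular adenocarcinoma": 0,
    "moderately differentiated tubular adenocarcinoma": 1,
    "poorly differentiated adenocarcinoma": 2,
}

# Placeholder split: (patch filename, subtype) pairs from your own selection.
samples = [
    ("patch_0001.jpg", "well differentiated tubular adenocarcinoma"),
    ("patch_0002.jpg", "poorly differentiated adenocarcinoma"),
]

# Hypothetical output path; point your config's annotation file at whatever you use.
with open("data/train.txt", "w") as f:
    for filename, subtype in samples:
        f.write(f"{filename} {label_map[subtype]}\n")
```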

### Training

The config files follow the mmcls style.

```shell
PYTHONPATH=.:$PYTHONPATH mim train mmcls <config>
```

### Testing

```shell
PYTHONPATH=.:$PYTHONPATH mim test mmcls <config> --checkpoint <checkpoint> --metrics <metrics>
```

## 🙋‍♀️ Feedback and Contact

## 📝 Citation

```bibtex
@inproceedings{zhang2023text,
  title={Text-guided Foundation Model Adaptation for Pathological Image Classification},
  author={Zhang, Yunkun and Gao, Jin and Zhou, Mu and Wang, Xiaosong and Qiao, Yu and Zhang, Shaoting and Wang, Dequan},
  booktitle={MICCAI},
  year={2023}
}
```

## 🗃️ Materials

We provide a comprehensive overview of current open-source medical language models, vision foundation models, and vision-language models, illustrating their applicability to our approach (CITE). For BERT-based language models, you may directly replace `model->head->text_encoder->model` and `model->neck->out_features` with your preferred Huggingface 🤗 model in the config file to run CITE.
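
For instance, swapping in PubMedBERT might look like the fragment below. This is a sketch of just the two fields named above; the rest of the config is elided, and the only constraint is that `out_features` must match the chosen model's hidden size (768 for BERT-base variants):

```python
# Fragment of an mmcls-style config: only the two swapped fields are shown.
model = dict(
    head=dict(
        text_encoder=dict(
            # Any BERT-based Huggingface model id should work here.
            model="microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
        ),
    ),
    neck=dict(
        out_features=768,  # must equal the text encoder's hidden size
    ),
)
```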

### Medical Language Models

| Model | Subfield | Paper | Code | Base |
| --- | --- | --- | --- | --- |
| Meditron | Medicine | Meditron-70B: Scaling Medical Pretraining for Large Language Models | Github | LLaMA 2 |
| RadFM | Radiology | Towards Generalist Foundation Model for Radiology | Github | LLaMA |
| BioMedGPT | Biomedicine | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | Github | LLaMA 2 |
| Med-PaLM 2 | Clinic | Towards Expert-Level Medical Question Answering with Large Language Models | Google | PaLM 2 |
| PMC-LLaMA | Medicine | PMC-LLaMA: Towards Building Open-source Language Models for Medicine | Github | LLaMA |
| BenTsao (HuaTuo) | Biomedicine | HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge | Github | LLaMA |
| ChatDoctor | Medicine | ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge | Github | LLaMA |
| Clinical-T5 | Clinic | Clinical-T5: Large Language Models Built Using MIMIC Clinical Text | PhysioNet | T5 |
| Med-PaLM | Clinic | Large Language Models Encode Clinical Knowledge | Google | PaLM |
| BioGPT | Biomedicine | BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining | Github | GPT-2 |
| BioLinkBERT | Biomedicine | LinkBERT: Pretraining Language Models with Document Links | Github | BERT |
| PubMedBERT | Biomedicine | Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing | Microsoft | BERT |
| BioBERT | Biomedicine | BioBERT: A Pre-Trained Biomedical Language Representation Model for Biomedical Text Mining | Github | BERT |
| BlueBERT | Biomedicine | An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining | Github | BERT |
| Clinical BERT | Clinic | Publicly Available Clinical BERT Embeddings | Github | BERT |
| SciBERT | Biomedicine | SciBERT: A Pretrained Language Model for Scientific Text | Github | BERT |

### Vision Models

| Model | Subfield | Paper | Code | Base |
| --- | --- | --- | --- | --- |
| REMEDIS | Radiology | Robust and Data-Efficient Generalization of Self-Supervised Machine Learning for Diagnostic Imaging | Github | SimCLR |
| RETFound | Retinopathy | A Foundation Model for Generalizable Disease Detection from Retinal Images | Github | MAE |
| CTransPath | Pathology | Transformer-Based Unsupervised Contrastive Learning for Histopathological Image Classification | Github | - |
| HIPT | Pathology | Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning | Github | DINO |
| INTERN-2.5 | General | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | Github | - |
| DINOv2 | General | DINOv2: Learning Robust Visual Features without Supervision | Github | - |
| MAE | General | Masked Autoencoders Are Scalable Vision Learners | Github | - |
| ViT (ImageNet) | General | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Huggingface | - |

### Vision-Language Models

| Model | Subfield | Paper | Code | Base |
| --- | --- | --- | --- | --- |
| Qilin-Med-VL | Radiology | Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare | Github | LLaVA |
| RadFM | Radiology | Towards Generalist Foundation Model for Radiology | Github | - |
| KAD | Radiology | Knowledge-Enhanced Visual-Language Pre-Training on Chest Radiology Images | Github | CLIP |
| Med-Flamingo | Medicine | Med-Flamingo: A Multimodal Medical Few-Shot Learner | Github | Flamingo |
| QuiltNet | Pathology | Quilt-1M: One Million Image-Text Pairs for Histopathology | Github | CLIP |
| PLIP | Pathology | A Visual-Language Foundation Model for Pathology Image Analysis Using Medical Twitter | Huggingface | CLIP |
| MI-Zero | Pathology | Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images | Github | CLIP |
| LLaVA-Med | Biomedicine | LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Github | LLaVA |
| MedVInT | Biomedicine | PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | Github | - |
| PMC-CLIP | Biomedicine | PMC-CLIP: Contrastive Language-Image Pre-Training Using Biomedical Documents | Github | CLIP |
| BiomedCLIP | Biomedicine | Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing | Huggingface | CLIP |
| MedCLIP | Medicine | MedCLIP: Contrastive Learning from Unpaired Medical Images and Text | Github | CLIP |
| CheXzero | Radiology | Expert-Level Detection of Pathologies from Unannotated Chest X-ray Images via Self-Supervised Learning | Github | CLIP |
| PubMedCLIP | Radiology | Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? | Github | CLIP |
| LLaVA | General | Visual Instruction Tuning | Github | - |
| Flamingo | General | Flamingo: a Visual Language Model for Few-Shot Learning | OpenFlamingo | - |
| CLIP | General | Learning Transferable Visual Models From Natural Language Supervision | Github | - |