[Model/Code] PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains

<div align="center"> <a href="https://"><img width="1000px" height="auto" src="https://github.com/openmedlab/PathoDuet/blob/main/banner.png"></a> </div>

Updated on 2023.12.15. We have revolutionized PathoDuet! The p2/p3 models are now named the HE/IHC models, and a more detailed figure of our work has been added. The paper is now available on arXiv.

Updated on 2023.08.04. Sorry for the late release. The p3 model is now available! The paper link will be added in the next update.

Key Features

This repository provides the official implementation of PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains.

- A self-supervised pretraining framework tailored to histopathological images, built around a pretext token and task raisers.
- Two pretext tasks: cross-scale positioning (H&E) and cross-stain transferring (H&E to IHC).
- Pretrained foundation models for both H&E and IHC stains, ready for downstream tasks.

Links


Details

Our model is based on a new self-supervised learning (SSL) framework. The framework exploits characteristics of histopathological images by introducing a pretext token, and a subsequent task raiser, during training. The pretext token is only a small piece of image, but it carries special knowledge for the pretext tasks.

In task 1, cross-scale positioning, the pretext token is a small patch contained in a larger region. This containment relation allows us to locate the patch within the region and use the region's features to generate a global-view feature for the patch. The patch is also fed to the encoder alone to obtain a local-view feature. The two features are then pulled together to strengthen the H&E model.

In task 2, cross-stain transferring, the pretext token is a small patch cropped from an image of one stain (H&E), while the main input is an image of the other stain (IHC). The two images are roughly registered, so the H&E image can be style-transferred to mimic the features of the IHC image. The pseudo and real features are pulled together to build an IHC model on top of the existing H&E model.
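The "pull together" objective in both tasks can be sketched as minimizing the distance between two feature vectors. Below is our own minimal illustration using cosine similarity, not the exact loss used in the paper:

```python
import math

def align_loss(feat_a, feat_b):
    # Toy alignment objective: 1 - cosine similarity.
    # Minimizing it pulls the two feature vectors together, e.g. a
    # patch's local-view and global-view features (task 1), or the
    # pseudo-IHC and real IHC features (task 2).
    dot = sum(a * b for a, b in zip(feat_a, feat_b))
    norm_a = math.sqrt(sum(a * a for a in feat_a))
    norm_b = math.sqrt(sum(b * b for b in feat_b))
    return 1.0 - dot / (norm_a * norm_b)
```

Identical directions give a loss of 0; orthogonal features give a loss of 1.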

<div align="center"> <a href="https://"><img width="1000px" height="auto" src="https://github.com/openmedlab/PathoDuet/blob/main/overall.png"></a> </div>

Dataset Links

Get Started

Main Requirements

```
torch==1.12.1
torchvision==0.13.1
timm==0.6.7
tensorboard
pandas
```

Installation

```bash
git clone https://github.com/openmedlab/PathoDuet
cd PathoDuet
```

Download Model

If you only need a pretrained model for your own task, you can find our pretrained model weights here. We currently provide two versions of the model: HE and IHC.

You can try our model with the following code.

```python
import torch
import torch.nn as nn

from vits import VisionTransformerMoCo

# init the model
model = VisionTransformerMoCo(pretext_token=True, global_pool='avg')
# init the fc layer
model.head = nn.Linear(768, args.num_classes)
# load checkpoint
checkpoint = torch.load(your_checkpoint_path, map_location="cpu")
model.load_state_dict(checkpoint, strict=False)
# Your own tasks
```

Please note that, considering the gap between pathological and natural images, we do not use a normalization step in data augmentation.

Prepare Dataset

If you want to go through the whole pretraining process, you first need to prepare the training dataset. The H&E training dataset is cropped from TCGA and should be arranged as

```
TCGA
├── TCGA-ACC
│   ├── patch
│   │   ├── 0_0_1.png
│   │   ├── 0_0_2.png
│   │   └── ...
│   └── region
│       ├── 0_0.png
│       ├── 0_1.png
│       └── ...
├── TCGA-BRCA
│   ├── patch
│   │   └── ...
│   └── region
│       └── ...
└── ...
```

To apply our data generating code, we recommend installing

```
openslide
```
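OpenSlide reads patches directly from whole-slide images. The grid of non-overlapping crop coordinates is simple to compute; the actual read then goes through `OpenSlide.read_region`. A sketch follows, where the tile size and pyramid level are our assumptions, not the repo's exact settings:

```python
def tile_grid(width, height, tile=224, stride=224):
    # Top-left coordinates of tiles covering a slide at a given level.
    return [(x, y)
            for y in range(0, height - tile + 1, stride)
            for x in range(0, width - tile + 1, stride)]

# Hypothetical usage with OpenSlide (requires a real slide file):
# import openslide
# slide = openslide.OpenSlide("slide.svs")
# w, h = slide.level_dimensions[0]
# for x, y in tile_grid(w, h):
#     patch = slide.read_region((x, y), 0, (224, 224)).convert("RGB")
```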

The dataset for task 2 should be arranged as

```
root
├── Dataset1
│   ├── HE
│   │   ├── 001.png
│   │   ├── a.png
│   │   └── ...
│   ├── IHC1
│   │   ├── 001.png
│   │   ├── a.png
│   │   └── ...
│   └── IHC2
│       ├── 001.png
│       ├── a.png
│       └── ...
├── Dataset2
│   ├── HE
│   │   └── ...
│   └── IHC
│       └── ...
└── ...
```

Training

The code is modified from MoCo v3.

For basic MoCo v3 training,

```bash
python main_moco.py \
  --tcga ./used_TCGA.csv \
  -a vit_base -b 2048 --workers 128 \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=100 --warmup-epochs=40 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --dist-backend nccl \
  --dist-url 'tcp://localhost:10001' \
  [your dataset folders]
```

For a further patch positioning pretext task,

```bash
python main_bridge.py \
  --tcga ./used_TCGA.csv \
  -a vit_base -b 2048 --workers 128 \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=20 --warmup-epochs=10 \
  --stop-grad-conv1 --moco-m-cos --moco-t=.2 --bridge-t=0.5 \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --dist-url 'tcp://localhost:10001' \
  --ckp ./phase2 \
  --firstphase ./checkpoint_0099.pth.tar \
  [your dataset folders]
```

For a further multi-stain reconstruction task,

```bash
python main_cross.py \
  -a vit_base -b 2048 --workers 128 \
  --optimizer=adamw --lr=1.5e-4 --weight-decay=.1 \
  --epochs=500 --warmup-epochs=100 \
  --stop-grad-conv1 \
  --multiprocessing-distributed --world-size 1 --rank 0 \
  --dist-url 'tcp://localhost:10001' \
  --ckp ./phase3 \
  --firstphase ./phase2/checkpoint_0099.pth.tar \
  [your dataset folders]
```

Performance on Downstream Tasks

We evaluate our models on several downstream tasks, comparing them with ImageNet-pretrained models (using MoCo v3 weights) and CTransPath. ImageNet pretraining has shown great generalization ability across many tasks, so we choose MoCo v3's ImageNet model as a baseline. CTransPath is a pathology-pretrained model trained on 15 million patches from TCGA and PAIP, and has shown state-of-the-art performance on many pathological tasks across diseases and sites.

Linear Evaluation

We use NCT-CRC-HE to evaluate the basic understanding of H&E images. We first follow the typical linear evaluation protocol used in SimCLR, which freezes all layers in the pretrained model and trains a newly-added linear layer from scratch. The result of CTransPath is copied from the original paper, and we also provide a reproduced one marked with a *.
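Concretely, the linear protocol freezes every pretrained parameter and trains only a newly added linear head. A minimal sketch (our own illustration; `feat_dim` is 768 for ViT-B/16, and NCT-CRC-HE has 9 classes):

```python
import torch.nn as nn

def linear_probe(encoder, feat_dim=768, num_classes=9):
    # Freeze all pretrained weights; only the new head is trainable.
    for p in encoder.parameters():
        p.requires_grad = False
    encoder.eval()
    head = nn.Linear(feat_dim, num_classes)  # trained from scratch
    return nn.Sequential(encoder, head)
```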

| Methods | Backbone | ACC | F1 |
| --- | --- | --- | --- |
| ImageNet-MoCo v3 | ViT-B/16 | 0.935 | 0.908 |
| CTransPath | Modified Swin Transformer | 0.965 | 0.948 |
| CTransPath* | Modified Swin Transformer | 0.956 | 0.932 |
| Ours-HE | ViT-B/16 | 0.964 | 0.950 |

Full Fine-tuning

In practice, pretrained models are not kept frozen. Therefore, we also unfreeze the pretrained encoder and fine-tune all parameters. Note that the performance of CTransPath is obtained with their released model.

| Methods | Backbone | ACC | F1 |
| --- | --- | --- | --- |
| ImageNet-MoCo v3 | ViT-B/16 | 0.958 | 0.945 |
| CTransPath | Modified Swin Transformer | 0.969 | 0.960 |
| Ours-HE | ViT-B/16 | 0.973 | 0.964 |

WSI Classification

For WSI classification, we reproduce the performance of CLAM-SB. CTransPath filtered out some WSIs in TCGA-NSCLC and TCGA-RCC due to image quality considerations, so the CTransPath numbers here are reproduced with their released model on the whole dataset, marked as CTP (Repro).

| Methods | CAMELYON16 ACC | CAMELYON16 AUC | TCGA-NSCLC ACC | TCGA-NSCLC AUC | TCGA-RCC ACC | TCGA-RCC AUC |
| --- | --- | --- | --- | --- | --- | --- |
| CLAM-SB | 0.884 | 0.940 | 0.894 | 0.951 | 0.929 | 0.986 |
| CLAM-SB + CTP (Repro) | 0.868 | 0.940 | 0.904 | 0.956 | 0.928 | 0.987 |
| CLAM-SB + Ours-HE | 0.930 | 0.956 | 0.908 | 0.963 | 0.954 | 0.993 |

PD-L1 Expression Level Assessment (IHC images)

Assessing IHC markers' expression levels is one of the primary tasks for pathologists when evaluating an IHC slide. We formulate this task as a 4-class classification task, with carefully selected thresholds. We again compare our IHC model with ImageNet-MoCo v3 and CTransPath. The metrics are accuracy (ACC), balanced accuracy (bACC), and weighted F1 score (wF1). Here, we report performance with a limited amount of training data.

| Methods | Backbone | ACC | bACC | wF1 |
| --- | --- | --- | --- | --- |
| ImageNet-MoCo v3 | ViT-B/16 | 0.686 | 0.698 | 0.695 |
| CTransPath | Modified SwinT | 0.700 | 0.709 | 0.703 |
| Ours-IHC | ViT-B/16 | 0.726 | 0.721 | 0.732 |

Cross-Site Tumor Identification (IHC images)

Tumor identification is also of great importance. We formulate it as a binary classification task: whether the given patch contains tumor cells. The metrics are accuracy (ACC) and F1 score (F1). We report performance under 1) an in-site setting and 2) an out-of-distribution setting. In the first, we train the models in a linear protocol on a small group of data from site 1 and evaluate on another group of data from site 1. In the second, we train with more data from site 1 and evaluate on data from an unseen site 2.

| Methods | Backbone | ACC | F1 | ACC (OOD) | F1 (OOD) |
| --- | --- | --- | --- | --- | --- |
| ImageNet-MoCo v3 | ViT-B/16 | 0.864 | 0.862 | 0.504 | 0.503 |
| CTransPath | Modified SwinT | 0.872 | 0.870 | 0.677 | 0.657 |
| Ours-IHC | ViT-B/16 | 0.900 | 0.900 | 0.826 | 0.769 |

Comparison with Giant Pathological Models

We also compare our model with giant models pretrained on ultra-large collections of pathological slides, namely UNI and Virchow. We use NCT-CRC-HE and NCT-CRC-HE-NONORM (marked with a *), and copy the results from the Virchow paper. Note that the CTransPath result is also copied from Virchow, so it differs slightly (by 0.001 or 0.002) from our reproduced results above, which is within the range of randomness. The training parameters are similar to Virchow's, except that we use a batch size of 512 with the learning rate rescaled to 0.001/8 = 0.000125, and typical augmentations such as random crop and scale, random flip, and random rotation.

| Methods | Backbone | #WSIs | ACC | bACC | wF1 | ACC* | bACC* | wF1* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ours-H&E | ViT-B | ~11K | 0.964 | 0.952 | 0.964 | 0.888 | 0.875 | 0.894 |
| CTransPath | Modified SwinT | ~32K | 0.958 | 0.931 | 0.955 | 0.879 | 0.852 | 0.883 |
| UNI | ViT-L | ~100K | - | - | - | - | 0.874 | 0.875 |
| Virchow | ViT-H | ~1.5M | 0.968 | 0.956 | 0.968 | 0.948 | 0.938 | 0.950 |

More results can be found in our paper!

šŸ›”ļø License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

šŸ™ Acknowledgement

šŸ“ Citation

If you find this repository useful, please consider citing our arXiv paper.

```bibtex
@misc{hua2023pathoduet,
      title={PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains},
      author={Shengyi Hua and Fang Yan and Tianle Shen and Xiaofan Zhang},
      year={2023},
      eprint={2312.09894},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```