Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology

NOTE: Please see our follow-up work in CVPR 2022, which further extends this repository.

<details> <summary> <b>Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology</b>, LMRL Workshop, NeurIPS 2021. <a href="https://www.lmrl.org" target="blank">[Workshop]</a> <a href="https://arxiv.org/abs/2203.00585" target="blank">[arXiv]</a> <br><em>Richard J. Chen, Rahul G. Krishnan</em></br> </summary>
@article{chen2022self,
  title={Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology},
  author={Chen, Richard J and Krishnan, Rahul G},
  journal={Learning Meaningful Representations of Life, NeurIPS 2021},
  year={2021}
}
</details> <div align="center"> <img width="80%" alt="DINO illustration" src=".github/Pathology_DINO.jpg"> </div>

Summary / Main Findings:

  1. In a head-to-head comparison of SimCLR versus DINO, DINO learns more effective pretrained representations for histopathology - likely because it 1) does not need negative samples (histopathology has substantial potential class imbalance), and 2) captures better inductive biases about the part-whole hierarchies of how cells are spatially organized in tissue.
  2. ImageNet features do lag behind SSL methods in terms of data efficiency, but perform better than you might expect on patch- and slide-level tasks. Transfer learning with ImageNet features (from a ResNet-50 truncated after the 3rd residual block; see the sketch after this list) gives very decent performance using the CLAM package.
  3. SSL may help mitigate domain shift from site-specific H&E staining protocols. With vanilla data augmentations, the global structure of morphological subtypes (within each class) is better preserved than with ImageNet features, as shown in 2D UMAP scatter plots.
  4. Self-supervised ViTs localize cells quite well without any supervision. Our results show that ViTs can localize visual concepts in histopathology by introspecting their attention heads.
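
Below is a minimal sketch of the truncated ResNet-50 baseline mentioned in point 2: keep layers up to and including the 3rd residual block (layer3) and average-pool to a 1024-dim feature per patch. It assumes a recent torchvision (≥ 0.13 for the weights API) and is an illustration, not the exact CLAM feature extractor.

```python
# Sketch: ImageNet-pretrained ResNet-50 truncated after the 3rd residual block.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
truncated = nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1, backbone.layer2, backbone.layer3,   # stop after 3rd residual block
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),               # -> (N, 1024) features
)
truncated.eval()

with torch.no_grad():
    feats = truncated(torch.randn(1, 3, 256, 256))       # one 256x256 patch
print(feats.shape)  # torch.Size([1, 1024])
```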

Updates

Pre-Reqs

We use Git LFS to version-control large files in this repository (e.g., images, embeddings, checkpoints). After installing Git LFS, pull these large files by running:

git lfs pull

Pretrained Models

SimCLR and DINO models were trained for 100 epochs using the vanilla training recipes from their respective papers. These models were developed on 2,055,742 patches (256 x 256 resolution at 20X magnification) extracted from diagnostic slides in the TCGA-BRCA dataset, and evaluated via K-NN on patch-level datasets in histopathology.

Note: Results should be interpreted with respect to the size of the dataset and the duration of training. Longer training with larger batch sizes would likely demonstrate larger gains in SSL performance.

<table> <tr> <th>Arch</th> <th>SSL Method</th> <th>Dataset</th> <th>Epochs</th> <th>Dim</th> <th>K-NN</th> <th>Download</th> </tr> <tr> <td>ResNet-50</td> <td>Transfer</td> <td>ImageNet</td> <td>N/A</td> <td>1024</td> <td>0.935</td> <td>N/A</td> </tr> <tr> <td>ResNet-50</td> <td><a href="https://github.com/google-research/simclr">SimCLR</a></td> <td>TCGA-BRCA</td> <td>100</td> <td>2048</td> <td>0.938</td> <td><a href="https://github.com/Richarizardd/Self-Supervised-ViT-Path/blob/master/ckpts/resnet50_tcga_brca_simclr.pt">Backbone</a></td> </tr> <tr> <td>ViT-S/16</td> <td><a href="https://github.com/facebookresearch/dino">DINO</a></td> <td>TCGA-BRCA</td> <td>100</td> <td>384</td> <td>0.941</td> <td><a href="https://github.com/Richarizardd/Self-Supervised-ViT-Path/blob/master/ckpts/vits_tcga_brca_dino.pt">Backbone</a></td> </tr> </table>
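
As a rough illustration of how the ViT-S/16 DINO backbone above might be loaded for feature extraction, here is a hedged sketch using timm. The checkpoint key handling and preprocessing are assumptions and may need adjusting for your download and environment.

```python
# Sketch (not the repository's exact loader): load the DINO ViT-S/16 checkpoint
# and extract a 384-dim embedding for one patch.
import torch
import timm
from torchvision import transforms
from PIL import Image

model = timm.create_model("vit_small_patch16_224", pretrained=False, num_classes=0)
state = torch.load("ckpts/vits_tcga_brca_dino.pt", map_location="cpu")
# DINO-style checkpoints often nest weights under "teacher"/"student" with a
# "backbone." prefix; adjust if your checkpoint is already a flat state dict.
if isinstance(state, dict) and "teacher" in state:
    state = {k.replace("backbone.", ""): v for k, v in state["teacher"].items()}
model.load_state_dict(state, strict=False)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(224),   # assumption: resize 256x256 patches to the model's 224 input
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

with torch.no_grad():
    img = preprocess(Image.open("example_patch.png").convert("RGB")).unsqueeze(0)
    embedding = model(img)    # shape: (1, 384)
```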

Data Download + Data Preprocessing

For CRC-100K and BreastPathQ, pre-extracted embeddings are already available and processed in ./embeddings_patch_library. See patch_extraction_utils.py for how these patch datasets were processed.

Additional Datasets + Custom Implementation: This codebase is flexible for feature extraction on a variety of patch datasets. To extend this work, simply modify patch_extraction_utils.py with a custom Dataset loader for your dataset (a minimal sketch is shown below). As an example, we include BCSS (results not yet updated in this work).
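
The sketch below shows what such a custom Dataset loader could look like; the folder layout, CSV columns, and class name are hypothetical and not part of this repository.

```python
# Hypothetical patch Dataset for plugging a new dataset into patch_extraction_utils.py.
import os
import pandas as pd
from torch.utils.data import Dataset
from PIL import Image

class CustomPatchDataset(Dataset):
    def __init__(self, csv_path, img_dir, transform=None):
        self.df = pd.read_csv(csv_path)      # assumed columns: "filename", "label"
        self.img_dir = img_dir
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img = Image.open(os.path.join(self.img_dir, row["filename"])).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, int(row["label"])
```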

Evaluation: K-NN Patch-Level Classification on CRC-100K + BreastPathQ

Run the notebook patch_extraction.ipynb, followed by patch_evaluation.ipynb. The evaluation notebook should run "out-of-the-box" after pulling with Git LFS.
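
For reference, a K-NN evaluation on pre-extracted embeddings can look roughly like the scikit-learn sketch below; this is not the notebook's exact code, and the file names are placeholders for whatever your extraction step saved.

```python
# Sketch: K-NN classification on precomputed patch embeddings.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import balanced_accuracy_score

train_X = np.load("embeddings_patch_library/crc100k_train_embeddings.npy")  # placeholder path
train_y = np.load("embeddings_patch_library/crc100k_train_labels.npy")      # placeholder path
test_X  = np.load("embeddings_patch_library/crc100k_test_embeddings.npy")   # placeholder path
test_y  = np.load("embeddings_patch_library/crc100k_test_labels.npy")       # placeholder path

knn = KNeighborsClassifier(n_neighbors=20)   # neighborhood size is an assumption
knn.fit(train_X, train_y)
print("Balanced accuracy:", balanced_accuracy_score(test_y, knn.predict(test_X)))
```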

<div align="center"> <img width="80%" alt="table2" src=".github/table2.jpg"> </div>

Evaluation: Slide-Level Classification on TCGA-BRCA (IDC versus ILC)

Install the CLAM package, then use the 10-fold cross-validation splits made available in ./slide_evaluation/10foldcv_subtype/tcga_brca. TensorBoard train + validation logs can be visualized via:

tensorboard --logdir ./slide_evaluation/results/
<div align="center"> <img width="80%" alt="table1" src=".github/table1.jpg"> </div>

Visualization: Creating UMAPs

Install umap-learn (which can be tricky if you have incompatible dependencies), then use the UMAP helper in patch_extraction_utils.py; it is used in patch_extraction.ipynb to create Figure 4.

<div align="center"> <img width="100%" alt="UMAP" src=".github/umap.jpg"> </div>
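
A rough sketch of how such a 2D UMAP of patch embeddings might be produced with umap-learn and matplotlib is shown below; it is not the repository's exact helper, and the file paths are placeholders.

```python
# Sketch: project patch embeddings to 2D with UMAP and color points by class label.
import numpy as np
import umap
import matplotlib.pyplot as plt

embeddings = np.load("embeddings_patch_library/crc100k_test_embeddings.npy")  # placeholder path
labels = np.load("embeddings_patch_library/crc100k_test_labels.npy")          # placeholder path

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0)  # defaults; tune as needed
coords = reducer.fit_transform(embeddings)                         # shape: (N, 2)

plt.figure(figsize=(6, 6))
scatter = plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=2, cmap="tab10")
plt.legend(*scatter.legend_elements(), title="Class", markerscale=3)
plt.title("2D UMAP of patch embeddings")
plt.savefig("umap_patches.png", dpi=300, bbox_inches="tight")
```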

Visualization: Attention Maps

Attention visualizations (reproducing Figure 3) can be produced by walking through the notebook attention_visualization_256.ipynb.

<div align="center"> <img width="90%" alt="Attention Visualization" src=".github/attention_visualization.png"> </div>
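
For reference, below is a hedged sketch of visualizing the [CLS] self-attention of the last ViT block for one patch. It assumes the DINO reference vision_transformer.py (from facebookresearch/dino), which exposes get_last_selfattention(), is importable; the checkpoint handling and preprocessing are assumptions.

```python
# Sketch: per-head [CLS] attention maps from the last block of the DINO ViT-S/16.
import torch
import matplotlib.pyplot as plt
from torchvision import transforms
from PIL import Image
import vision_transformer as vits  # assumed: DINO codebase on the Python path

model = vits.vit_small(patch_size=16)
state = torch.load("ckpts/vits_tcga_brca_dino.pt", map_location="cpu")
if isinstance(state, dict) and "teacher" in state:  # assumption about checkpoint layout
    state = {k.replace("backbone.", ""): v for k, v in state["teacher"].items()}
model.load_state_dict(state, strict=False)
model.eval()

img = Image.open("example_patch.png").convert("RGB")  # hypothetical 256x256 patch
x = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])(img).unsqueeze(0)

with torch.no_grad():
    attn = model.get_last_selfattention(x)            # (1, heads, tokens, tokens)
heads = attn.shape[1]
side = x.shape[-1] // 16                              # patches per side (16 for a 256x256 input)
cls_attn = attn[0, :, 0, 1:].reshape(heads, side, side)  # [CLS] -> patch-token attention

fig, axes = plt.subplots(1, heads, figsize=(3 * heads, 3))
for h, ax in enumerate(axes):
    ax.imshow(cls_attn[h], cmap="inferno")
    ax.set_title(f"Head {h}")
    ax.axis("off")
plt.savefig("attention_heads.png", dpi=300, bbox_inches="tight")
```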

Issues

Acknowledgements, License & Usage

@article{chen2022self,
  title={Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology},
  author={Chen, Richard J and Krishnan, Rahul G},
  journal={Learning Meaningful Representations of Life, NeurIPS 2021},
  year={2021}
}

@inproceedings{chen2022scaling,
  title={Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning},
  author={Chen, Richard J and Chen, Chengkuan and Li, Yicong and Chen, Tiffany Y and Trister, Andrew D and Krishnan, Rahul G and Mahmood, Faisal},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

© This code is made available under the GPLv3 License and is available for non-commercial academic purposes.